openstackgerrit | Merged openstack-infra/system-config master: Change openstack-dev to openstack-discuss https://review.openstack.org/625389 | 00:05 |
*** slaweq has joined #openstack-infra | 00:11 | |
*** slaweq has quit IRC | 00:15 | |
*** wolverineav has joined #openstack-infra | 00:29 | |
*** wolverineav has quit IRC | 00:33 | |
*** dkehn has quit IRC | 01:29 | |
*** dkehn has joined #openstack-infra | 01:36 | |
*** bobh has joined #openstack-infra | 01:44 | |
*** wolverineav has joined #openstack-infra | 01:48 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests https://review.openstack.org/625457 | 01:49 |
*** oanson has quit IRC | 01:59 | |
*** dave-mccowan has joined #openstack-infra | 02:07 | |
*** slaweq has joined #openstack-infra | 02:11 | |
*** hongbin has joined #openstack-infra | 02:11 | |
*** slaweq has quit IRC | 02:16 | |
*** wolverineav has quit IRC | 02:18 | |
*** wolverineav has joined #openstack-infra | 02:25 | |
*** wolverineav has quit IRC | 02:25 | |
*** wolverineav has joined #openstack-infra | 02:25 | |
*** hongbin has quit IRC | 02:28 | |
*** bobh has quit IRC | 02:37 | |
*** dave-mccowan has quit IRC | 02:47 | |
*** bobh has joined #openstack-infra | 02:49 | |
*** hongbin has joined #openstack-infra | 02:50 | |
*** psachin has joined #openstack-infra | 02:58 | |
*** mrsoul has quit IRC | 03:02 | |
openstackgerrit | Ian Wienand proposed openstack-infra/project-config master: Add github dogpile.cache to project list https://review.openstack.org/625467 | 03:06 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests https://review.openstack.org/625457 | 03:07 |
*** dklyle has joined #openstack-infra | 03:20 | |
*** lbragstad has joined #openstack-infra | 03:26 | |
openstackgerrit | Merged openstack-infra/project-config master: add release jobs for git-os-job https://review.openstack.org/625273 | 03:33 |
openstackgerrit | Merged openstack-infra/project-config master: import openstack-summit-counter repository https://review.openstack.org/625292 | 03:33 |
*** bhavikdbavishi has joined #openstack-infra | 03:41 | |
*** ramishra has joined #openstack-infra | 03:42 | |
*** ykarel has joined #openstack-infra | 03:43 | |
*** udesale has joined #openstack-infra | 03:49 | |
*** armax has quit IRC | 03:57 | |
*** slaweq has joined #openstack-infra | 04:11 | |
*** bobh has quit IRC | 04:12 | |
*** slaweq has quit IRC | 04:16 | |
*** hongbin has quit IRC | 04:37 | |
*** ykarel has quit IRC | 04:40 | |
*** eernst has joined #openstack-infra | 04:49 | |
*** chandan_kumar is now known as chandankumar | 04:55 | |
*** ykarel has joined #openstack-infra | 05:01 | |
*** ykarel has quit IRC | 05:10 | |
*** ykarel has joined #openstack-infra | 05:12 | |
*** agopi has quit IRC | 05:15 | |
*** dklyle has quit IRC | 05:16 | |
*** wolverineav has quit IRC | 05:24 | |
*** wolverineav has joined #openstack-infra | 05:25 | |
*** wolverineav has quit IRC | 05:35 | |
*** janki has joined #openstack-infra | 05:36 | |
*** eernst has quit IRC | 05:36 | |
*** rcernin has joined #openstack-infra | 05:46 | |
*** markvoelker has joined #openstack-infra | 05:47 | |
*** rcernin has quit IRC | 05:47 | |
*** markvoelker has quit IRC | 05:51 | |
*** bhavikdbavishi1 has joined #openstack-infra | 05:59 | |
*** bhavikdbavishi has quit IRC | 06:01 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 06:01 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/625487 | 06:08 |
*** rcernin has joined #openstack-infra | 06:09 | |
*** rcernin has quit IRC | 06:09 | |
*** rcernin has joined #openstack-infra | 06:09 | |
*** rcernin has quit IRC | 06:09 | |
*** quiquell|off has quit IRC | 06:09 | |
*** slaweq has joined #openstack-infra | 06:11 | |
*** slaweq has quit IRC | 06:16 | |
openstackgerrit | Rico Lin proposed openstack-infra/irc-meetings master: Change Heat meeting schedule https://review.openstack.org/625493 | 06:20 |
*** wolverineav has joined #openstack-infra | 06:35 | |
*** wolverineav has quit IRC | 06:40 | |
*** AJaeger has quit IRC | 07:00 | |
openstackgerrit | Merged openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/625487 | 07:02 |
*** AJaeger has joined #openstack-infra | 07:09 | |
*** apetrich has joined #openstack-infra | 07:16 | |
openstackgerrit | M V P Nitesh proposed openstack/diskimage-builder master: Adding new dib element https://review.openstack.org/625501 | 07:19 |
*** quiquell has joined #openstack-infra | 07:24 | |
*** dpawlik has joined #openstack-infra | 07:26 | |
*** yboaron_ has quit IRC | 07:28 | |
*** jtomasek has joined #openstack-infra | 07:32 | |
*** jtomasek has quit IRC | 07:33 | |
*** jtomasek has joined #openstack-infra | 07:33 | |
*** Emine has joined #openstack-infra | 07:33 | |
*** oanson has joined #openstack-infra | 07:34 | |
*** e0ne has joined #openstack-infra | 07:35 | |
*** slaweq has joined #openstack-infra | 07:37 | |
*** e0ne has quit IRC | 07:39 | |
*** slaweq has quit IRC | 07:40 | |
*** jpena|off is now known as jpena | 07:42 | |
*** pgaxatte has joined #openstack-infra | 07:44 | |
*** slaweq has joined #openstack-infra | 07:44 | |
*** markvoelker has joined #openstack-infra | 07:48 | |
*** yolanda has joined #openstack-infra | 07:50 | |
*** jbadiapa has joined #openstack-infra | 07:57 | |
*** ykarel is now known as ykarel|lunch | 07:58 | |
*** rpittau has joined #openstack-infra | 07:58 | |
*** pcaruana has joined #openstack-infra | 08:18 | |
*** jpena is now known as jpena|away | 08:21 | |
*** witek_ is now known as witek | 08:27 | |
*** yboaron_ has joined #openstack-infra | 08:29 | |
*** yamamoto has quit IRC | 08:30 | |
*** yamamoto has joined #openstack-infra | 08:32 | |
*** yboaron_ has quit IRC | 08:33 | |
*** yboaron_ has joined #openstack-infra | 08:34 | |
*** witek has quit IRC | 08:35 | |
*** tosky has joined #openstack-infra | 08:36 | |
*** gfidente has joined #openstack-infra | 08:38 | |
*** ykarel|lunch is now known as ykarel | 08:39 | |
*** ccamacho has quit IRC | 08:42 | |
*** yamamoto has quit IRC | 08:46 | |
*** alexchadin has joined #openstack-infra | 08:47 | |
*** gibi has joined #openstack-infra | 08:52 | |
*** agopi has joined #openstack-infra | 09:01 | |
*** jpich has joined #openstack-infra | 09:02 | |
*** shardy has joined #openstack-infra | 09:04 | |
*** lpetrut has joined #openstack-infra | 09:09 | |
openstackgerrit | Wayne Chan proposed openstack/diskimage-builder master: Update mailinglist from dev to discuss https://review.openstack.org/625518 | 09:18 |
*** owalsh_ is now known as owalsh | 09:24 | |
*** yamamoto has joined #openstack-infra | 09:26 | |
*** tosky has quit IRC | 09:29 | |
*** ccamacho has joined #openstack-infra | 09:31 | |
odyssey4me | clarkb mnaser ah, that old chestnut - we did resolve it in the rocky cycle, but have not ported that fix back to older branches.... not sure we should either - what's going on there is that it tries to install the appropriate packages from the local mirror (a container on the host), then falls back to pypi if all the packages aren't on that local mirror | 09:31 |
*** tosky has joined #openstack-infra | 09:33 | |
AJaeger | odyssey4me: if you don't backport it, then stop running the broken jobs... | 09:34 |
odyssey4me | clarkb mnaser Given that our approach in master/rocky seems to be successful - perhaps we can port it back. Let me propose it and see what the cores think about it. | 09:35 |
*** rossella_s has joined #openstack-infra | 09:35 | |
odyssey4me | AJaeger Stopping the jobs running means losing test coverage. The job is not broken, it is working as designed. It's just outputting some logs which appear to be interfering with something which hasn't been expressed. I'm happy to work with clarkb to get that resolved, but not happy to cut test coverage. | 09:36 |
*** bhavikdbavishi has quit IRC | 09:36 | |
AJaeger | odyssey4me: happy to see it addressed ;) | 09:38 |
*** yamamoto has quit IRC | 09:45 | |
*** yamamoto has joined #openstack-infra | 09:45 | |
*** ssbarnea|rover has joined #openstack-infra | 09:47 | |
*** rtjure has quit IRC | 09:52 | |
*** derekh has joined #openstack-infra | 09:58 | |
*** e0ne has joined #openstack-infra | 10:02 | |
*** yamamoto has quit IRC | 10:06 | |
*** ginopc has joined #openstack-infra | 10:08 | |
*** lbragstad has quit IRC | 10:10 | |
*** jpena|away is now known as jpena | 10:12 | |
*** xek has joined #openstack-infra | 10:16 | |
*** yamamoto has joined #openstack-infra | 10:18 | |
*** yboaron_ has quit IRC | 10:21 | |
*** yboaron_ has joined #openstack-infra | 10:22 | |
*** yamamoto has quit IRC | 10:23 | |
*** sambetts_ has joined #openstack-infra | 10:29 | |
*** pbourke has quit IRC | 10:34 | |
*** pbourke has joined #openstack-infra | 10:36 | |
*** aojea has joined #openstack-infra | 10:37 | |
*** markmcd has quit IRC | 10:44 | |
*** derekh has quit IRC | 10:46 | |
*** derekh has joined #openstack-infra | 10:47 | |
*** markmcd has joined #openstack-infra | 10:52 | |
*** electrofelix has joined #openstack-infra | 10:52 | |
*** yamamoto has joined #openstack-infra | 11:00 | |
*** ginopc has quit IRC | 11:01 | |
*** ginopc has joined #openstack-infra | 11:02 | |
*** udesale has quit IRC | 11:10 | |
mgoddard | hello infra team, we are looking at upgrading the version of Docker used in kolla-ansible. Doing so brings in a new constraint on the URL of the Docker registry mirror - it cannot contain a path. The registry mirror provided currently in CI has a path - /registry-1.docker/. How difficult would it be to configure the mirror to also support use without a path? It's also important here to avoid a hard | 11:17 |
mgoddard | break in existing jobs. Docker bug on the topic is at https://github.com/moby/moby/issues/36598. | 11:18 |
*** yboaron_ has quit IRC | 11:23 | |
*** bhavikdbavishi has joined #openstack-infra | 11:30 | |
*** e0ne has quit IRC | 11:31 | |
*** yboaron_ has joined #openstack-infra | 11:33 | |
*** rfolco has joined #openstack-infra | 11:37 | |
*** yamamoto has quit IRC | 11:38 | |
*** yamamoto has joined #openstack-infra | 11:38 | |
*** tpsilva has joined #openstack-infra | 11:42 | |
*** rtjure has joined #openstack-infra | 11:53 | |
*** dkehn has quit IRC | 11:57 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/elastic-recheck master: Identify *POST* timeout failures individually https://review.openstack.org/625573 | 12:03 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/elastic-recheck master: Identify *POST* timeout failures individually https://review.openstack.org/625573 | 12:03 |
*** rpittau is now known as rpittau|lunch | 12:09 | |
*** bhavikdbavishi has quit IRC | 12:09 | |
odyssey4me | clarkb mnaser AJaeger proposed the back ports to OSA's pike & queens branches: https://review.openstack.org/#/q/Ic966bafd04c4c01b3d93851a0e3ec2c1f3312f28 | 12:13 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Remove world writable umask from /src folder https://review.openstack.org/625576 | 12:14 |
*** jpena is now known as jpena|lunch | 12:30 | |
*** janki has quit IRC | 12:31 | |
*** bobh has joined #openstack-infra | 12:34 | |
*** bobh has quit IRC | 12:38 | |
*** udesale has joined #openstack-infra | 12:39 | |
*** e0ne has joined #openstack-infra | 12:47 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Delay Github fileschanges workaround to pipeline processing https://review.openstack.org/625584 | 12:50 |
*** lpetrut has quit IRC | 12:51 | |
*** bobh has joined #openstack-infra | 12:53 | |
*** rlandy has joined #openstack-infra | 12:57 | |
*** janki has joined #openstack-infra | 12:59 | |
*** markvoelker has quit IRC | 13:05 | |
*** boden has joined #openstack-infra | 13:11 | |
*** e0ne has quit IRC | 13:11 | |
*** rpittau|lunch is now known as rpittau | 13:11 | |
*** boden has quit IRC | 13:12 | |
*** bhavikdbavishi has joined #openstack-infra | 13:13 | |
*** e0ne has joined #openstack-infra | 13:14 | |
*** boden has joined #openstack-infra | 13:14 | |
*** Bhujay has joined #openstack-infra | 13:15 | |
*** bobh has quit IRC | 13:16 | |
*** jamesmcarthur has joined #openstack-infra | 13:20 | |
frickler | mgoddard: I don't think that it should be difficult, just tedious. would need dedicated dns records per mirror and an appropriate vhost set up | 13:20 |
frickler | infra-root: FYI this ^^ seems to be what is breaking zuul-quick-start jobs, too. seeing the same error on the node held for this http://logs.openstack.org/55/624855/3/check/zuul-quick-start/00d956c/job-output.txt.gz#_2018-12-17_12_36_58_996620 http://paste.openstack.org/show/737483/ | 13:22 |
*** dave-mccowan has joined #openstack-infra | 13:23 | |
*** smarcet has joined #openstack-infra | 13:23 | |
*** alexchadin has quit IRC | 13:24 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Delay Github fileschanges workaround to pipeline processing https://review.openstack.org/625584 | 13:26 |
openstackgerrit | Merged openstack-infra/system-config master: Stop running unnecessary tests on trusty https://review.openstack.org/625358 | 13:26 |
*** yamamoto has quit IRC | 13:27 | |
*** jpena|lunch is now known as jpena | 13:27 | |
frickler | mgoddard: FYI I added that topic for tomorrow's infra meeting, maybe you want to join us or read up on it afterwards | 13:29 |
*** yamamoto has joined #openstack-infra | 13:32 | |
*** jamesmcarthur has quit IRC | 13:32 | |
*** Bhujay has quit IRC | 13:33 | |
*** rh-jelabarre has joined #openstack-infra | 13:33 | |
frickler | hmm, actually we are using dedicated ports already. so maybe we can just drop the path if we update our mirror config. http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/install-docker/tasks/mirror.yaml | 13:35 |
frickler | there even is a comment about it here, so IIUC it should be possible to simply switch from :8081/path to :8082/ http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/mirror.vhost.erb#n387 | 13:39 |
*** e0ne has quit IRC | 13:42 | |
*** dave-mccowan has quit IRC | 13:42 | |
*** gagehugo has joined #openstack-infra | 13:42 | |
*** kgiusti has joined #openstack-infra | 13:42 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/zuul-jobs master: Alwas use pathless docker mirror URI https://review.openstack.org/625596 | 13:43 |
frickler | infra-root: ^^ I'm probably missing something here, but this might be a simple solution | 13:44 |
*** e0ne has joined #openstack-infra | 13:45 | |
*** dkehn has joined #openstack-infra | 13:45 | |
*** jamesmcarthur has joined #openstack-infra | 13:47 | |
*** bhavikdbavishi has quit IRC | 13:49 | |
*** quiquell is now known as quiquell|lunch | 13:49 | |
*** pcaruana has quit IRC | 13:50 | |
openstackgerrit | Merged openstack/os-testr master: Updated from global requirements https://review.openstack.org/533993 | 13:51 |
fungi | frickler: mgoddard: or a dedicated tcp port? | 13:54 |
fungi | oh, i see that's in fact what we've already set up! ;) | 13:56 |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/nodepool master: Switch devstack jobs to Xenial https://review.openstack.org/624855 | 13:57 |
*** lbragstad has joined #openstack-infra | 13:58 | |
*** lbragstad has quit IRC | 13:59 | |
mgoddard | frickler: thanks for looking into this. So it looks like we can use 8082 without a path already? | 13:59 |
*** adriancz has joined #openstack-infra | 13:59 | |
fungi | yes, that seems to be why we added the 8082 variant | 13:59 |
frickler | mgoddard: at least that's how I'm interpreting the situation currently. the above patch is testing this now | 14:00 |
frickler | hmm, though maybe a depends-on won't work there properly | 14:00 |
mgoddard | frickler: it looks like SamYaple hit this issue based on the comment in that file, so presumably it's working for him | 14:00 |
frickler | mgoddard: yeah, but it still may vary depending on the exact docker version in use. but I think you could give it a try if you want, the mirror setup should be working already | 14:01 |
mgoddard | frickler: this looks promising: http://git.openstack.org/cgit/openstack/airship-maas/tree/tools/gate/playbooks/vars.yaml | 14:02 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 14:02 |
mgoddard | frickler: I'll give it a shot. Thanks for the help | 14:02 |
*** lbragstad has joined #openstack-infra | 14:04 | |
*** pcaruana has joined #openstack-infra | 14:05 | |
*** dave-mccowan has joined #openstack-infra | 14:06 | |
*** smarcet has quit IRC | 14:06 | |
*** mriedem has joined #openstack-infra | 14:08 | |
*** bobh has joined #openstack-infra | 14:09 | |
frickler | fwiw, this whole thing seems to have been triggered by backporting a recent version of docker.io into xenial-updates last thursday https://launchpad.net/ubuntu/+source/docker.io | 14:09 |
*** nhicher has joined #openstack-infra | 14:11 | |
*** dave-mccowan has quit IRC | 14:11 | |
*** bobh has quit IRC | 14:12 | |
*** bobh has joined #openstack-infra | 14:13 | |
*** jamesmcarthur has quit IRC | 14:13 | |
*** smarcet has joined #openstack-infra | 14:14 | |
*** jamesmcarthur has joined #openstack-infra | 14:14 | |
*** quiquell|lunch has quit IRC | 14:15 | |
*** quiquell has joined #openstack-infra | 14:15 | |
dulek | Folks, I'm seeing "kuryr-daemon 2609G" in dstat's top-mem column. | 14:15 |
dulek | Example here: http://logs.openstack.org/27/625327/3/check/kuryr-kubernetes-tempest-daemon-openshift-octavia/cb22439/controller/logs/screen-dstat.txt.gz, around 10:22 | 14:16 |
fungi | woo! kuryr likes it some memory i suppose? | 14:16 |
fungi | at least that's a kuryr-specific job, so it's presumably not impacting more general job configurations | 14:16 |
*** ginopc has quit IRC | 14:16 | |
*** psachin has quit IRC | 14:17 | |
*** ginopc has joined #openstack-infra | 14:17 | |
dulek | How likely is it that this is some dstat quirk? I strongly doubt that my process allocates more than 2 TB of memory without OOM stepping in. | 14:17 |
frickler | dulek: just allocating memory should be no issue as long as it isn't actually used | 14:19 |
openstackgerrit | Merged openstack/ptgbot master: Handle all schedule in a single table https://review.openstack.org/607307 | 14:20 |
dulek | frickler: You mean a problem for OOM; in general I guess it's not a good move to allocate 2 TB of RAM. ;) | 14:21 |
dulek | frickler: It drains 3 GB of swap, but yeah, looks like it stops there. | 14:21 |
*** graphene has joined #openstack-infra | 14:22 | |
openstackgerrit | Merged openstack/ptgbot master: Split up function colorizing non-colored tracks https://review.openstack.org/620036 | 14:22 |
openstackgerrit | Merged openstack/ptgbot master: Load base schedule dynamically https://review.openstack.org/607308 | 14:22 |
openstackgerrit | Merged openstack/ptgbot master: Rename ~reload to ~emptydb https://review.openstack.org/620037 | 14:24 |
openstackgerrit | Merged openstack/ptgbot master: Make 'unbook' available for all https://review.openstack.org/620043 | 14:27 |
openstackgerrit | Merged openstack/ptgbot master: Add emergency messages (~motd and ~cleanmotd) https://review.openstack.org/620047 | 14:27 |
openstackgerrit | Merged openstack/ptgbot master: Give better hints in case of command errors https://review.openstack.org/620059 | 14:27 |
*** yboaron has joined #openstack-infra | 14:27 | |
*** yboaron_ has quit IRC | 14:27 | |
frickler | fungi: https://review.openstack.org/625596 passed and fixed the zuul-quick-start job for https://review.openstack.org/624855 , which in turn is needed to fix nodepool and unblock other things, if you have time for a review yet. other infra-root, too ;) | 14:27 |
openstackgerrit | Merged openstack/ptgbot master: Allow unscheduled tracks to use now/next https://review.openstack.org/620066 | 14:28 |
fungi | frickler: thanks! | 14:28 |
openstackgerrit | Witold Bedyk proposed openstack-infra/irc-meetings master: Add second time for Monasca team meeting https://review.openstack.org/625609 | 14:29 |
*** kiennt26 has joined #openstack-infra | 14:30 | |
fungi | frickler: that's in zuul-jobs and looks like a potential behavior change for downstream consumers. merits discussing in #zuul or on the zuul-discuss ml at a minimum | 14:31 |
*** bobh has quit IRC | 14:31 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Delay Github fileschanges workaround to pipeline processing https://review.openstack.org/625584 | 14:32 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Remove world writable umask from /src folder https://review.openstack.org/625576 | 14:33 |
*** yboaron_ has joined #openstack-infra | 14:36 | |
*** yamamoto has quit IRC | 14:38 | |
*** yboaron has quit IRC | 14:39 | |
*** yamamoto has joined #openstack-infra | 14:39 | |
*** graphene has quit IRC | 14:42 | |
*** graphene has joined #openstack-infra | 14:44 | |
*** yamamoto has quit IRC | 14:45 | |
*** yamamoto has joined #openstack-infra | 14:47 | |
*** bobh has joined #openstack-infra | 14:48 | |
*** bobh has quit IRC | 14:52 | |
*** yamamoto has quit IRC | 14:53 | |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add docker mirror url entries to site variables https://review.openstack.org/625615 | 14:55 |
*** kiennt26 has left #openstack-infra | 14:55 | |
mordred | frickler, fungi, tobiash: ^^ that as a step one | 14:55 |
*** toabctl has quit IRC | 14:57 | |
*** calbers_ has quit IRC | 14:57 | |
*** toabctl has joined #openstack-infra | 14:57 | |
*** calbers has joined #openstack-infra | 14:58 | |
dhellmann | gerrit-admin: when you have a moment, could someone please add me to the osc-summit-counter-core and osc-summit-counter-release groups? https://review.openstack.org/#/admin/groups/1991,members and https://review.openstack.org/#/admin/groups/1992,members | 14:58 |
fungi | dhellmann: done | 15:00 |
dhellmann | fungi : thanks! | 15:00 |
*** beekneemech is now known as bnemec | 15:00 | |
fungi | any time! | 15:00 |
*** e0ne has quit IRC | 15:02 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update install-docker to use docker site variable https://review.openstack.org/625617 | 15:03 |
*** bobh has joined #openstack-infra | 15:03 | |
mordred | EmilienM|off, ssbarnea|rover: do you know if tripleo is using the install-docker role? and if so, are you still using an older docker that needs the old-style docker mirror? | 15:05 |
*** e0ne has joined #openstack-infra | 15:06 | |
*** efried has joined #openstack-infra | 15:06 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Change Heat meeting schedule https://review.openstack.org/625493 | 15:07 |
*** jamesmcarthur has quit IRC | 15:09 | |
*** jamesmcarthur has joined #openstack-infra | 15:10 | |
*** yboaron_ has quit IRC | 15:10 | |
openstackgerrit | Merged openstack-infra/opendev-website master: Add .zuul.yaml https://review.openstack.org/624139 | 15:11 |
*** jamesmcarthur has quit IRC | 15:15 | |
frickler | mordred: fyi, codesearching for "registry-mirrors" also shows kayobe and various airship repos | 15:15 |
*** dpawlik has quit IRC | 15:17 | |
*** dpawlik has joined #openstack-infra | 15:18 | |
*** dpawlik has quit IRC | 15:18 | |
openstackgerrit | Merged openstack-infra/opendev-website master: Add some initial content thoughtso https://review.openstack.org/622624 | 15:18 |
openstackgerrit | Merged openstack-infra/opendev-website master: Convert initial content to html for publication https://review.openstack.org/624149 | 15:18 |
openstackgerrit | Thierry Carrez proposed openstack-infra/puppet-ptgbot master: No longer needs room map in configuration https://review.openstack.org/625619 | 15:19 |
*** derekh has quit IRC | 15:24 | |
ssbarnea|rover | rlandy: one side-effect of running reproducer: https://review.openstack.org/#/c/625621/ | 15:24 |
*** chandankumar is now known as chkumar|out | 15:26 | |
*** ykarel is now known as ykarel|away | 15:26 | |
*** yamamoto has joined #openstack-infra | 15:27 | |
*** derekh has joined #openstack-infra | 15:27 | |
*** pcaruana has quit IRC | 15:29 | |
openstackgerrit | Merged openstack-infra/project-config master: Add github dogpile.cache to project list https://review.openstack.org/625467 | 15:31 |
corvus | fungi: i didn't see a review from you on 625596 -- do you want to hold that change or do you think it's okay to merge (potentially being too openstack-specific for wider use at the moment)? | 15:33 |
corvus | (i'm trying to catch up and don't have everything paged in) | 15:34 |
fungi | i was mostly just making sure the discussion in #zuul had played out first | 15:35 |
*** ykarel|away has quit IRC | 15:35 | |
evrardjp | mordred: it would be easier (by far!) if you have an explicit version of docker you're thinking about | 15:35 |
mordred | evrardjp: it seems that docker 1.6 is where support for v2 registries happened, and that's in Trusty but the other things seem to have newer | 15:37 |
mordred | evrardjp: so I'm starting to think that we don't really need to support the v1 registry format anymore | 15:37 |
mordred | corvus: I've got 2 followup changes up to potentially clean up the non-openstack portions | 15:38 |
corvus | fungi, mordred: okay my read of #zuul is we should merge 596 now, and merge the frickler/tobiash/mordred stuff as soon as it's ready (which might be in a few mins) | 15:38 |
amorin | hey all | 15:38 |
fungi | corvus: i concur and have +2'd it | 15:38 |
mordred | corvus: biggest outstanding question is whether we want to attempt to support the old style v1 registry (other than by url overrides) - which I'm now leaning towards "no" | 15:38 |
mordred | corvus: yes | 15:38 |
*** andreww has joined #openstack-infra | 15:39 | |
*** jamesmcarthur has joined #openstack-infra | 15:40 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: add release job for osc-summit-counter https://review.openstack.org/625627 | 15:41 |
EmilienM|off | mordred: we don't use this role afaik | 15:41 |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests https://review.openstack.org/625457 | 15:42 |
EmilienM|off | But I'll check when back on computer | 15:42 |
*** xarses_ has quit IRC | 15:42 | |
smarcet | fungi: morning could we retrigger https://review.openstack.org/#/c/611936/ ? it's failing on the legacy unit tests | 15:48 |
evrardjp | mordred: oh yeah, I am using v2 by default in all I write | 15:48 |
mgoddard | frickler: just to feed back, using port 8082 worked a charm. Thanks | 15:50 |
fungi | smarcet: wow, that job's so broken it doesn't even make it far enough to generate a console log? | 15:50 |
smarcet | fungi: yeah it seems so | 15:50 |
smarcet | fungi: not sure what's going on there | 15:50 |
mordred | EmilienM|off: cool. I'm guessing you're using docker > 1.6 at this point too yeah? | 15:51 |
*** pcaruana has joined #openstack-infra | 15:51 | |
corvus | smarcet, fungi: best thing to do is retrigger it and then watch the log in the web browser while it's running | 15:52 |
tobiash | fungi: I guess the logs of that job just were deleted given the job run was 6 weeks ago (the successful job also has no logs) | 15:52 |
fungi | smarcet: oh! i see, the job ran 6 weeks ago, so we've already expired those job logs | 15:52 |
corvus | oh :) that | 15:52 |
*** ykarel|away has joined #openstack-infra | 15:52 | |
fungi | smarcet: any review comment starting with the word "recheck" and no accompanying vote will do that, i've added one now | 15:53 |
smarcet | ok will note that | 15:53 |
smarcet | thx u :) | 15:53 |
*** slaweq has quit IRC | 15:54 | |
*** armax has joined #openstack-infra | 15:55 | |
openstackgerrit | sebastian marcet proposed openstack-infra/system-config master: Migrate OpenStackID dev server to php7 https://review.openstack.org/625640 | 15:55 |
*** udesale has quit IRC | 15:57 | |
clarkb | amorin: hello | 15:59 |
*** dklyle has joined #openstack-infra | 15:59 | |
*** slaweq has joined #openstack-infra | 16:01 | |
clarkb | infra-root https://review.openstack.org/#/c/625350/ fixes our ansible base server application | 16:04 |
mordred | EmilienM|off: fwiw - it looks like tripleo is still consuming the v1 proxy (which is fine) | 16:05 |
dmsimard | clarkb: got someone to look at https://github.com/ansible/ansible/issues/49969 | 16:05 |
*** janki has quit IRC | 16:05 | |
clarkb | dmsimard: ya I think they found the reason, but odd that an unhandled exception wouldn't result in failure | 16:05 |
clarkb | could be a couple things that need fixing in ansible | 16:06 |
*** quiquell is now known as quiquell|off | 16:06 | |
mordred | clarkb: maybe, since v2 support seems like a thing for all the docker versions people are using, we should get people transitioned from the :8081 mirror to the :8082 mirror | 16:07 |
clarkb | mordred: if you look at apache usage it's 99% v1 due to tripleo and nothing uses v2 | 16:07 |
mordred | clarkb: there don't seem to be many places it's used: http://codesearch.openstack.org/?q=8081%2Fregistry&i=nope&files=&repos= | 16:07 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Alwas use pathless docker mirror URI https://review.openstack.org/625596 | 16:07 |
*** aojea has quit IRC | 16:07 | |
clarkb | I don't think we care if people use one or the other since it's easy enough to run both? | 16:07 |
mordred | clarkb: actually, I just looked at the 8081 logs and it does seem to be tripleo but it seems to be using v2 now | 16:07 |
mordred | 104.130.139.141 - - [17/Dec/2018:16:03:19 +0000] "GET /cloudflare/registry-v2/docker/registry/v2/blobs/sha256/ec/ecdfdc556e7deea5f905380baaa27c9770625870837e3bfc73e06c353644ab56/data?verify=1545065599-iSYZzie9ooFj%2Bl6t9bPFhVEEd3c%3D HTTP/1.1" 200 72000541 cache hit | 16:08 |
mordred | "http://mirror.dfw.rax.openstack.org:8081/registry-1.docker/v2/tripleomaster/centos-binary-neutron-l3-agent/blobs/sha256:ecdfdc556e7deea5f905380baaa27c9770625870837e3bfc73e06c353644ab56" "docker/1.13.1 go/go1.9.4 kernel/3.10.0-957.1.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(docker-sdk-python/3.5.0)" | 16:08 |
clarkb | v2 through the v1 proxy? neat | 16:09 |
clarkb | I'm guessing that's achieved by not using docker itself | 16:09 |
*** gyee has joined #openstack-infra | 16:10 | |
mordred | according to the docs, the latest version of registry is supposed to support both and is supposed to support them seamlessly | 16:10 |
mordred | clarkb: so maybe the thing that caused us to run two was a bug and has since been fixed? | 16:10 |
clarkb | aiui it was the client side refusing to accept path'd mirrors | 16:11 |
*** graphene has quit IRC | 16:11 | |
clarkb | I think it is/was purely a client issue | 16:11 |
mordred | nod. well - I think we've hit a point where the version of docker everyone is using can handle the v2 mirror | 16:11 |
*** ykarel|away is now known as ykarel | 16:12 | |
clarkb | and with tripleo using podman etc there is a good chance their client tooling isn't so restrictive | 16:12 |
*** graphene has joined #openstack-infra | 16:12 | |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Add check queue labels for relative-priority https://review.openstack.org/625645 | 16:15 |
corvus | clarkb, AJaeger, mordred: ^ | 16:15 |
*** Bhujay has joined #openstack-infra | 16:16 | |
clarkb | corvus: thanks. | 16:17 |
clarkb | it would be much appreciated if we can fix our ansible runs via https://review.openstack.org/#/c/625350/ or something similar before we all disappear for holidays too. This ended up being an ansible bug which I reproduced and filed upstream (link in the comments for that change) | 16:18 |
*** ccamacho has quit IRC | 16:19 | |
*** jamesmcarthur has quit IRC | 16:21 | |
*** ccamacho has joined #openstack-infra | 16:21 | |
dulek | fungi, frickler: Looks like kuryr's "appetite" for memory is just a dstat bug. | 16:23 |
mordred | clarkb: lgtm | 16:23 |
mordred | clarkb: yeah - podman works great with the v2 mirror (I've already tried) | 16:23 |
dulek | fungi, frickler: I think it's splitting /proc/<pid>/stat in the wrong way. | 16:24 |
*** Emine has quit IRC | 16:24 | |
mordred | clarkb: I have been meaning to write an install-podman role that will similarly setup the registry mirror | 16:24 |
mordred | clarkb: incidentally, this list: | 16:26 |
mordred | clarkb: registries = ['docker.io', 'registry.fedoraproject.org', 'quay.io', 'registry.access.redhat.com', 'registry.centos.org'] | 16:26 |
dulek | fungi, frickler: Yup, for kuryr it's multiplying virt-mem * PAGE_SIZE instead of rss. | 16:26 |
fungi | interesting | 16:26 |
mordred | clarkb: is what gets installed into registries.conf by default for fedora | 16:26 |
mrhillsman | following up on a discussion on the openci ml re opendev; are the svcs - in particular ml and git - available right now with custom domain? | 16:27 |
mordred | mrhillsman: ml yes | 16:27 |
fungi | mrhillsman: the plan under discussion for git is to deprecate the custom domanis and use git.opendev.org | 16:27 |
fungi | er, domains | 16:27 |
mordred | mrhillsman: git is currently avail with custom domain- but once the git farm is branded as opendev.org - the plan is to drop custom git domains | 16:27 |
corvus | fungi, mrhillsman: or perhaps even just 'opendev.org' | 16:28 |
corvus | (no 'git.') | 16:28 |
*** pcaruana has quit IRC | 16:28 | |
fungi | oh, right, that was also an option | 16:28 |
mordred | yeah. opendev.org/openstack/nova should work just fine | 16:28 |
mordred | mrhillsman: so we'd be able to offer opendev.org/openci/foo - for instance | 16:28 |
mrhillsman | so no custom domains? | 16:28 |
corvus | mrhillsman: https://review.openstack.org/623033 is the plan under discussion | 16:28 |
mrhillsman | across anything? | 16:28 |
fungi | mrhillsman: for mailman mailing lists and web content we're likely to continue supporting custom domains going forward | 16:29 |
mrhillsman | will the underlying things be available for someone to host to resolve that use case? | 16:29 |
mrhillsman | i guess like how - cannot remember the name right now - the red hat zuul installer | 16:30 |
fungi | so for example the https://opendev.org/openci/openci-website git repo could publish https://openci.org/ content hosted from the files.opendev.org server... something like that | 16:30 |
mrhillsman | if someone wants to use a custom domain to tie into opendev or flat out install everything in a different location | 16:30 |
fungi | mrhillsman: windmill? | 16:31 |
mrhillsman | nah, brain fart, sec | 16:31 |
mrhillsman | i think tristan does the updates for it | 16:31 |
mordred | mrhillsman: software factory | 16:31 |
mrhillsman | yeah | 16:31 |
mordred | mrhillsman: I think right now the thinking is that with a neutral base domain, it's similar to github.com or gitlab.com - and people don't seem to mind that for their git repo hosting, so the complexity of whitelabeling the git domains could be avoided | 16:32 |
clarkb | mordred: re registries yes that is how you specify them now without paths or as a URI. It's just a hostname; the host is then expected to speak the right protocol at the right url for it | 16:32 |
openstackgerrit | Merged openstack-infra/puppet-mediawiki master: Optionally alias to a favicon.ico file if provided https://review.openstack.org/439082 | 16:33 |
mordred | mrhillsman: however, it's an in-discussion topic, so if there is a strong use case for whitelabeled git domains, now would definitely be the time to talk it through | 16:33 |
clarkb | and with the docker client tooling there is no way to say its foo.com/over/here iirc | 16:33 |
smarcet | fungi: where could i get the source code for legacy-laravel-openstackid-unittests? i think that it's failing bc it's trying to use php5 and we need to update to php7.2 | 16:34 |
mordred | clarkb: yah - I was mainly pointing out that rh is starting to ship things that expect to be able to talk to a set of registries, not just dockerhub, so we might want to ponder adding those others to the mirroring infrastructure so that it's possible to plop down a similar file that points to mirrors for all of that content | 16:34 |
clarkb | mordred: ugh | 16:34 |
clarkb | mordred: this is the nodejs don't actually host your packages (pypi before it too) problem all over again | 16:34 |
mordred | clarkb: more of a heads-up than a needs-action at the moment | 16:34 |
mordred | clarkb: actually- in newer tooling you can specify registry as part of container name | 16:35 |
mrhillsman | ok, so to get the ml going what is needed? | 16:35 |
clarkb | mordred: oh tahts good | 16:35 |
mordred | mrhillsman: ml going is just a patch to system-config and making sure dns is set up properly. importing existing archives might be a little more work - I'll defer to fungi on that | 16:36 |
openstackgerrit | Merged openstack-infra/system-config master: Manage the favicon.ico file for the wiki https://review.openstack.org/439083 | 16:36 |
mrhillsman | ok cool | 16:36 |
mordred | mrhillsman: I'm happy to help with the project-config patch if that's the direction you decide you'd like to go | 16:36 |
corvus | mrhillsman: example change to add a new list domain: https://review.openstack.org/569545 | 16:37 |
fungi | mrhillsman: for lists.opendev.org i split it over a couple of changes: https://review.openstack.org/625096 sets up the domain, then https://review.openstack.org/625254 to add a specific ml in it | 16:37 |
mrhillsman | i think everyone according to the discussions on the ml and in meetings over the past few months are ok with the change | 16:37 |
*** bhavikdbavishi has joined #openstack-infra | 16:38 | |
clarkb | note that https://review.openstack.org/#/c/625350/ is necessary to get in first to have that config apply correctly | 16:38 |
mrhillsman | oh wait, fungi, so can there be lists.openci.io or has to be lists.opendev.org; i could have misunderstood | 16:38 |
mrhillsman | git single place, ml custom domain if you want? | 16:39 |
*** aojea has joined #openstack-infra | 16:39 | |
fungi | mrhillsman: we have the ability to do either. what we haven't discussed yet is how we decide what domains we're willing to host white-labeled services for | 16:39 |
fungi | at the ptg we said we'd at least do it for osf projects and pilot projects, but we didn't rule out supporting domains for other projects beyond that | 16:39 |
mordred | yeah - and openci is the type of project I'd imagine being ok supporting custom domains for - if doing that is in-game | 16:40 |
*** _alastor_ has joined #openstack-infra | 16:40 | |
mrhillsman | ok. is that discussion a priority; has an expected decision date? | 16:40 |
mrhillsman | just to get an idea | 16:41 |
fungi | i think it's a "discussion" i'd be comfortable having via review comments on a simple system-config change in gerrit | 16:41 |
mrhillsman | ++ | 16:41 |
fungi | but i don't know how any other stakeholders feel about it | 16:41 |
mrhillsman | understood | 16:41 |
clarkb | ya related to that I think its in the class of thing we probably will end up tackling once it becomes something someone wants to do | 16:41 |
clarkb | rather than try and write down all the rules before hand | 16:41 |
fungi | this is a nice opportunity to force the conversation to happen | 16:42 |
mrhillsman | i'll push a patch | 16:42 |
clarkb | (the concrete use cases likely help form better opinions too rather than guessing at use cases) | 16:42 |
mrhillsman | ++ | 16:42 |
openstackgerrit | Thierry Carrez proposed openstack-infra/irc-meetings master: Update count_slot_usage for new recurrences https://review.openstack.org/625656 | 16:42 |
openstackgerrit | Thierry Carrez proposed openstack-infra/irc-meetings master: Add count_slot_usage argument for sensitivity https://review.openstack.org/625657 | 16:42 |
*** jamesmcarthur has joined #openstack-infra | 16:42 | |
fungi | mrhillsman: also i'm happy to help with importing your archives if/once the list gets created on our server | 16:43 |
mrhillsman | ty sir | 16:43 |
*** Emine has joined #openstack-infra | 16:48 | |
*** aojea has quit IRC | 16:51 | |
*** jamesmcarthur has quit IRC | 16:54 | |
openstackgerrit | Merged openstack-infra/project-config master: Add check queue labels for relative-priority https://review.openstack.org/625645 | 16:54 |
*** jamesmcarthur has joined #openstack-infra | 16:55 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid master: Migration to PHP 7.x https://review.openstack.org/611936 | 16:55 |
*** pgaxatte has quit IRC | 16:56 | |
*** Bhujay has quit IRC | 16:56 | |
*** graphene has quit IRC | 16:58 | |
*** graphene has joined #openstack-infra | 17:00 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests https://review.openstack.org/625457 | 17:00 |
openstackgerrit | Clark Boylan proposed openstack-infra/opendev-website master: Add publishing of content to opendev-website https://review.openstack.org/625665 | 17:01 |
clarkb | super simple start at publishing opendev website content. Starting with the logs server as I haven't dug into how the afs stuff works yet | 17:01 |
*** efried has quit IRC | 17:01 | |
openstackgerrit | Merged openstack-infra/system-config master: Update favicon for newer OpenStack logo https://review.openstack.org/439045 | 17:01 |
*** graphene has quit IRC | 17:04 | |
*** graphene has joined #openstack-infra | 17:06 | |
*** fuentess has joined #openstack-infra | 17:07 | |
openstackgerrit | James E. Blair proposed openstack-infra/infra-specs master: Add opendev Gerrit spec https://review.openstack.org/623033 | 17:08 |
corvus | clarkb: ^ that should be ready for voting. | 17:10 |
frickler | mordred: do you want to approve https://review.openstack.org/624855 to fix nodepool to unblock sdk? | 17:10 |
corvus | looks like it's already on the agenda | 17:10 |
*** ramishra has quit IRC | 17:12 | |
smarcet | fungi: i am seeing the error now http://logs.openstack.org/36/611936/7/check/legacy-laravel-openstackid-unittests/c3619f5/job-output.txt.gz, says that no matching package php7.0 is available on the dist, but it does exist on xenial | 17:15 |
frickler | 1 | 17:17 |
mordred | frickler: the -src jobs are still failing with that patch | 17:17 |
fungi | smarcet: http://logs.openstack.org/36/611936/7/check/legacy-laravel-openstackid-unittests/c3619f5/zuul-info/inventory.yaml indicates that job ran on an ubuntu-trusty node | 17:17 |
mordred | fungi: although that explains why the sdk job was failing with zypper issues | 17:18 |
smarcet | fungi: ok, where could i change that to run on xenial? | 17:18 |
frickler | mordred: yes, but we need to start somewhere, then fix the dogpile.cache issue, then onwards. step by step, I'd say | 17:21 |
openstackgerrit | Merged openstack-infra/zuul master: Use combined status for Github status checks https://review.openstack.org/623417 | 17:21 |
mordred | frickler: ah - ok. actually - lemme look at something real quick ... | 17:21 |
openstackgerrit | Clark Boylan proposed openstack-infra/opendev-website master: Publish opendev website to afs on merge. https://review.openstack.org/625671 | 17:21 |
clarkb | and that I think should work for publishing to afs | 17:21 |
*** derekh has quit IRC | 17:21 | |
*** jtomasek has quit IRC | 17:21 | |
fungi | smarcet: seems it's set at https://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/zuul-legacy-jobs.yaml#n433 but we should be looking at moving that job definition into the openstackid repo and updating it to be zuulv3-native per https://docs.openstack.org/infra/manual/zuulv3.html | 17:22 |
mordred | frickler: ok. it does seem like we are at least appropriately installing openstacksdk from source (just wanted to make sure) | 17:22 |
fungi | that job was merely auto-converted from the old jenkins job definitions | 17:22 |
clarkb | egonzalez: I've noticed that quite a few hits for http://status.openstack.org/elastic-recheck/index.html#1708704 are for kolla jobs that don't appear to be using our local in cloud region mirrors. Is that something we can help you/kolla clean up? | 17:22 |
frickler | mordred: although this error looks like something new now http://logs.openstack.org/55/624855/4/check/nodepool-functional-py35/32ff9b2/job-output.txt.gz#_2018-12-17_15_16_36_770331 | 17:23 |
*** jpich has quit IRC | 17:23 | |
*** graphene has quit IRC | 17:24 | |
smarcet | fungi: ok got it | 17:24 |
smarcet | thx u | 17:24 |
*** graphene has joined #openstack-infra | 17:25 | |
*** trown is now known as trown|lunch | 17:25 | |
egonzalez | clarkb patch is being merged, was because a package was removed from rpm repos, so image builds were failing | 17:25 |
*** eernst has joined #openstack-infra | 17:27 | |
*** eernst has quit IRC | 17:27 | |
*** eernst has joined #openstack-infra | 17:28 | |
clarkb | egonzalez: have a link to that fix? | 17:29 |
clarkb | egonzalez: looks like opendaylight ? | 17:30 |
egonzalez | clarkb https://review.openstack.org/#/c/623426/ | 17:31 |
egonzalez | vitrage | 17:31 |
clarkb | egonzalez: ok, looking at the logstash hits for that bug on the kolla jobs it seems related to opendaylight somehow (maybe that is the flaky repo?) | 17:32 |
*** eernst has quit IRC | 17:32 | |
egonzalez | hrm, we have an odl repo you maybe not have mirrored | 17:33 |
clarkb | mriedem: Do we want to send one last CI status update before we all disappear? We've aggregated repos by logical group/queue for prioritizing node assignments, we've fixed OVS installation repo setup for multinode bootstrapping on Centos. That's the infra stuff. QA/Devstack have updated the cirros image to version 0.3.6 from 0.3.5 and merged dansmith's direct-io change. I think nova has done some stuff too | 17:33 |
egonzalez | clarkb baseurl=https://cbs.centos.org/repos/nfv7-opendaylight-6-release/x86_64/os but moving to 9th release now | 17:34 |
clarkb | egonzalez: ya, we've found centos.org is quite flaky :( | 17:35 |
clarkb | so we may want to add a mirror/cache for that repo if we don't already have it | 17:35 |
*** smarcet has quit IRC | 17:36 | |
egonzalez | probably, given the odl packages are quite big >300mb | 17:36 |
clarkb | http://mirror.dfw.rax.openstack.org/centos/7/ is what we mirror from centos.org. Which is a different repo than the opendaylight repo. Let me see if we proxy cache it | 17:37 |
*** bhavikdbavishi has quit IRC | 17:37 | |
*** ykarel is now known as ykarel|away | 17:37 | |
fungi | 300mb? wow, no wonder they don't carry it in the distro | 17:37 |
fungi | that's pretty massive for a distro package | 17:38 |
clarkb | egonzalez: we proxy cache https://nexus.opendaylight.org/ at http://$mirror_node:8080/opendaylight | 17:38 |
mriedem | clarkb: sure if you want, i'd just ask you don't tag it with [infra] since then it filters for me to a folder i don't normally read :) | 17:38 |
clarkb | not sure if that location also hosts the same package repos | 17:38 |
mriedem | clarkb: i haven't assessed where we are with the nova-specific gate stuff yet, trying to focus on some other things first this week before i'm out next week | 17:38 |
clarkb | mriedem: no worries. I think the general push towards "people please look at this stuff" has resulted in good results all around | 17:39 |
jrosser | i have a mirror related question, we get this var in a job NODEPOOL_CEPH_MIRROR=http://mirror.mtl01.inap.openstack.org/ceph-deb-hammer which isn't totally helpful for getting at ceph != hammer..... is there something else i should be doing to construct the path to the ceph mirror? | 17:40 |
clarkb | egonzalez: I don't know the mapping between opendaylight release numbers and names but that location does have centos repos for opendaylight releases | 17:41 |
clarkb | jrosser: that is an unfortunate side effect of how every ceph release is a different repository | 17:41 |
clarkb | jrosser: I think we picked the one to advertise based on whatever devstack + nova + cinder are/were testing at the time | 17:42 |
clarkb | jrosser: but if you browse eg http://mirror.dfw.rax.openstack.org/ you'll see we have different versions available | 17:42 |
jrosser | hmm ok | 17:43 |
openstackgerrit | Merged openstack-infra/nodepool master: Switch devstack jobs to Xenial https://review.openstack.org/624855 | 17:44 |
clarkb | jrosser: we can bump the globally advertised version too if we just do a quick check it won't explicitly break everyone and send a note to people that it is happening | 17:44 |
jrosser | how about a proxy for download.ceph.com? is that feasable | 17:45 |
jrosser | because it's hard to write code once that works in CI and in the outside world right now | 17:45 |
clarkb | jrosser: yes, however that won't solve the select different url problem as we don't do transparent proxies. (the issue there being we can't restrict usage of transparent proxies so rather than be bad internet citizens we reverse proxy specific backends) | 17:46 |
clarkb | also the mirrors we have there should be far more reliable than a caching proxy | 17:46 |
*** e0ne has quit IRC | 17:46 | |
clarkb | jrosser: the way we try to handle this generally is have job setup apply the appropriate mirror info then have the job test workload skip over it | 17:48 |
egonzalez | clarkb cannot find odl repos in here http://mirror.dfw.rax.openstack.org | 17:49 |
clarkb | egonzalez: ya I think the only thing is the proxy for nexus.opendaylight.org as noted above | 17:50 |
egonzalez | is there a list of the mirrors i can look at to point the ci jobs? | 17:50 |
clarkb | egonzalez: https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/mirror.vhost.erb is going to be most correct as it is the apache config for the reverse proxies | 17:51 |
egonzalez | clarkb thanks will take a look | 17:51 |
clarkb | navigating eg http://mirror.dfw.rax.openstack.org shows you all of the content we mirror out of AFS | 17:51 |
clarkb | then we reverse proxy cache along side that things that are less practical to mirror on AFS. At this point the AFS mirrors are largely for the main distro mirrors | 17:52 |
*** Emine has quit IRC | 17:54 | |
*** Emine has joined #openstack-infra | 17:54 | |
*** diablo_rojo has joined #openstack-infra | 17:54 | |
jrosser | clarkb: i'd have to think harder about it as there is xenial/bionic/centos vs which ceph release is packaged for those vs. which openstack release is being tested in any particular job - so bumping the release in just changes the shape of the problem | 17:54 |
jrosser | *bumping the release in NODEPOOL_CEPH_MIRROR.... | 17:55 |
*** smarcet has joined #openstack-infra | 17:55 | |
*** _alastor_ has quit IRC | 17:57 | |
*** tosky has quit IRC | 17:57 | |
*** Emine has quit IRC | 17:58 | |
*** _alastor_ has joined #openstack-infra | 17:59 | |
openstackgerrit | Salvador Fuentes Garcia proposed openstack-infra/openstack-zuul-jobs master: kata-containers: add /usr/sbin to the PATH https://review.openstack.org/625679 | 18:04 |
*** smarcet has quit IRC | 18:13 | |
*** jpena is now known as jpena|off | 18:14 | |
*** rpittau has quit IRC | 18:16 | |
clarkb | fungi: I think I tracked down one possible cause of those post failures | 18:17 |
clarkb | The error was: template error while templating string: no filter named \'bool\'. String: {% if zuul_site_upload_logs | default(true) | bool or (zuul_site_upload_logs == \'failure\' and not zuul_success | bool) %} | 18:17 |
clarkb | from the executor logs | 18:17 |
clarkb | I'm working on a fix | 18:17 |
fungi | oh fun | 18:17 |
clarkb | hrm here I thought it was going to be called boolean or similar but docs seem to imply bool is correct | 18:18 |
clarkb | git.openstack.org/openstack-infra/zuul-jobs/roles/upload-logs/tasks/main.yaml is the file it seems unhappy about | 18:19 |
*** jamesmcarthur has quit IRC | 18:19 | |
*** wolverineav has joined #openstack-infra | 18:21 | |
fungi | wrong parenthetical grouping there maybe? | 18:21 |
fungi | should it be (zuul_site_upload_logs == 'failure' and not zuul_success) | bool | 18:21 |
fungi | or is it just zuul_success that needs to be recast there? | 18:22 |
fungi | i guess the == operator already renders a bool? | 18:23 |
clarkb | ya maybe it can't find a valid filter called bool to convert a boolean to a boolean? | 18:23 |
clarkb | in which case it could be a grouping issue | 18:23 |
*** jamesmcarthur has joined #openstack-infra | 18:24 | |
*** imacdonn has quit IRC | 18:24 | |
clarkb | dmsimard: mordred ^ you probably know more about ansible jinja2 filters than we do | 18:24 |
corvus | clarkb: can you point me at what you're looking at? | 18:24 |
*** imacdonn has joined #openstack-infra | 18:24 | |
*** smarcet has joined #openstack-infra | 18:24 | |
clarkb | corvus: I'm looking at the output of `ssh ze07.openstack.org grep 5c245a7825554131aeaabf7f589cc28b /var/log/zuul/executor-debug.log` | 18:24 |
clarkb | which is the executor log for a job that failed POST_FAILURE on my ansible crash fix change | 18:25 |
corvus | clarkb: did we change something related to that recently? | 18:25 |
clarkb | not that I am aware of | 18:25 |
*** ykarel|away has quit IRC | 18:25 | |
fungi | could https://github.com/ansible/ansible/issues/31115 be related? | 18:25 |
clarkb | fungi: maybe? I don't think we override any of the jinja filters | 18:28 |
fungi | k | 18:29 |
*** _alastor_ has quit IRC | 18:31 | |
fungi | seems odd it wouldn't find one of its builtin filters | 18:31 |
*** electrofelix has quit IRC | 18:32 | |
*** _alastor_ has joined #openstack-infra | 18:32 | |
*** dpawlik has joined #openstack-infra | 18:32 | |
clarkb | fungi: ya makes me think you may be on to something with the typing being important | 18:34 |
*** wolverineav has quit IRC | 18:34 | |
clarkb | if jinja2 cares about input types it may be trying to say no valid bool filter for that input type found? | 18:35 |
*** wolverineav has joined #openstack-infra | 18:35 | |
clarkb | looks like jinja2 doesn't ship bool, that must come from ansible | 18:36 |
*** jamesmcarthur has quit IRC | 18:37 | |
clarkb | the ansible code implies any type is valid as input though | 18:38 |
*** wolverineav has quit IRC | 18:38 | |
*** Vadmacs has joined #openstack-infra | 18:38 | |
*** wolverineav has joined #openstack-infra | 18:38 | |
corvus | it ran with "-e zuul_success=True" | 18:39 |
*** smarcet has quit IRC | 18:39 | |
corvus | zuul_site_upload_logs should not be defined currently | 18:40 |
fungi | looks like that line was last altered in https://review.openstack.org/611622 which merged 2018-10-19 so it's been that way a couple months now | 18:40 |
*** trown|lunch is now known as trown | 18:41 | |
corvus | clarkb, fungi: i have no idea what happened, but it doesn't seem normal. | 18:41 |
clarkb | lib/ansible/plugins/filter/core.py is where to_bool is defined | 18:41 |
*** e0ne has joined #openstack-infra | 18:42 | |
clarkb | and we don't seem to redefine that in zuul | 18:42 |
clarkb | zuul does define its own set of filters though | 18:42 |
clarkb | last updated in october as well | 18:43 |
corvus | clarkb, fungi: puppet performed an install_zuul around that time. | 18:44 |
clarkb | oh interesting | 18:44 |
corvus | Dec 17 17:38:25 ze07 puppet-user[21405]: (/Stage[main]/Zuul/File[/opt/graphitejs]/ensure) removed | 18:45 |
corvus | Dec 17 17:40:07 ze07 puppet-user[21405]: (/Stage[main]/Zuul/Exec[install_zuul]) Triggered 'refresh' from 1 events | 18:45 |
corvus | that means install_zuul happened between those 2 timestamps, right? the log error was at 17:39:40,098 | 18:45 |
clarkb | yes puppet logs after it is done performing a task iirc | 18:45 |
fungi | was there a removal/reinstallation of ansible then? | 18:46 |
*** e0ne has quit IRC | 18:46 | |
corvus | the version on the host is from december 13 | 18:46 |
corvus | so i wouldn't expect it... but maybe? | 18:46 |
fungi | just theorizing maybe jinja was looking for the filter plugin while pip was in the process of uninstalling/reinstalling ansible | 18:47 |
corvus | fungi: yeah, i think that theory best fits most of the facts; the only thing missing is a stronger suggestion that pip touched ansible itself | 18:48 |
*** mriedem has quit IRC | 18:48 | |
corvus | i don't know if we have those logs | 18:48 |
clarkb | 2.5.14 was released today | 18:48 |
clarkb | er | 18:48 |
clarkb | I'm bad at dates | 18:48 |
clarkb | on the 13th | 18:48 |
fungi | `stat /usr/local/lib/python3.5/dist-packages/ansible/plugins/filter/core.py` on ze07 says Modify: 2018-12-17 17:39:24.512241363 +0000 | 18:49 |
clarkb | ya ok you found that version already | 18:49 |
egonzalez | clarkb added a patch to use proxies for ODL and percona repos https://review.openstack.org/#/c/625688/, hopefully it fixes the elastic recheck | 18:49 |
clarkb | egonzalez: great, thanks | 18:49 |
clarkb | egonzalez: hopefully makes your jobs more reliable :) | 18:49 |
egonzalez | yeah, we have a lot of timeouts on percona repos | 18:50 |
fungi | clarkb: corvus: so i think that tells us it did at least overwrite that file around the time we hit the error | 18:50 |
*** mriedem has joined #openstack-infra | 18:51 | |
corvus | fungi, clarkb: oh, ha -- i think it just never occurred to me that we would go for 6 days without merging a zuul change, but that has happened | 18:52 |
corvus | so there was no trigger to install the new ansible until now | 18:52 |
fungi | it _was_ a quiet week | 18:52 |
*** jamesmcarthur has joined #openstack-infra | 18:53 | |
clarkb | ah ok so it was the ansible upgrade, but needed zuul repo update to trigger the bits to pull that in | 18:53 |
fungi | so anyway, pretty sure we have evidence to at least say that upgrading ansible in-place while an executor is active can be detrimental to the running jobs ;) | 18:53 |
corvus | i think that ties up all the loose ends. i believe we knew this was a possibility, but figured ansible releases are infrequent enough that we just wouldn't worry too much when it happened. | 18:53 |
clarkb | makes sense (though annoying that it has a race there) | 18:53 |
corvus | long-term, we may be able to make this better with the multi-version work | 18:53 |
fungi | agreed, i'm not too worried about it, chalk this up to continuous deployment failure modes to be on the lookout for | 18:54 |
fungi | honestly, i'm amazed that we upgraded ansible across a dozen busy executors and only saw a handful of post_error results as fallout | 18:54 |
*** smarcet has joined #openstack-infra | 18:55 | |
fungi | clarkb: though as a result, i doubt this explains the post_failure results ssbarnea|rover noticed over the weekend | 18:55 |
clarkb | corvus: could also possibly be made more reliable if ansible didn't lazy load that stuff, but not sure that is desirable (may impact memory or startup time) | 18:55 |
clarkb | fungi: yup agreed | 18:55 |
ssbarnea|rover | fungi: clarkb: i think that I found the root cause for the post timeouts, originating with an ansible security fix: https://github.com/ansible/ansible/pull/42142/files | 18:58 |
*** shardy has quit IRC | 18:58 | |
ssbarnea|rover | if you look in our logs, you will find that almost all tripleo builds present such a warning, because, for some .... reason our /home/zuul/src is made world writable. | 18:59 |
fungi | oh, neat! | 18:59 |
ssbarnea|rover | i cannot blame ansible for that, it is our fault for doing such an insane chmod | 19:00 |
ssbarnea|rover | tripleo contains some ssh arguments that are supposed to prevent frozen tasks at https://github.com/openstack/tripleo-quickstart/blob/457e61fb73eb55153cd4b8105c6090b9730c13be/ansible.cfg#L21 | 19:00 |
ssbarnea|rover | mainly the ServerAliveInterval | 19:01 |
*** dpawlik has quit IRC | 19:01 | |
*** e0ne has joined #openstack-infra | 19:01 | |
ssbarnea|rover | if the config is not loaded, .... i am not sure what ssh_args will end up using, but i have the impression that this may be the cause. | 19:01 |
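The linked security fix amounts to a permission check when Ansible decides which ansible.cfg to honor. A hedged paraphrase of that behavior (illustrative code, not the literal change in the PR):

```python
import os
import stat


def find_ini_config_file():
    """Pick an ansible.cfg, skipping one that lives in a world-writable cwd.

    Paraphrase of the behavior under discussion: if the current working
    directory is writable by "other", the ansible.cfg found there is ignored
    with a warning, so settings such as ssh_args (ServerAliveInterval, etc.)
    silently fall back to defaults.
    """
    cwd = os.getcwd()
    cwd_cfg = os.path.join(cwd, 'ansible.cfg')
    if os.path.exists(cwd_cfg):
        if os.stat(cwd).st_mode & stat.S_IWOTH:
            print('[WARNING]: Ansible is being run in a world writable '
                  'directory, ignoring it as an ansible.cfg source.')
        else:
            return cwd_cfg
    home_cfg = os.path.expanduser('~/.ansible.cfg')
    return home_cfg if os.path.exists(home_cfg) else '/etc/ansible/ansible.cfg'
```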
*** dpawlik has joined #openstack-infra | 19:01 | |
clarkb | fungi: https://review.openstack.org/#/c/625095/1 is the proxy change for pip caching from last week if you want to take a look | 19:04 |
fungi | ssbarnea|rover: yes, that's a good bet. particularly on centos we've witnessed headless ssh invocations hang when trying to close down their socket | 19:06 |
clarkb | fungi: I had that change and the ansible fix on the meeting agenda to bring them up in case we don't get them in today (as I'm going to be afk the latter half of the week and didn't want that getting lost) | 19:07 |
ssbarnea|rover | fungi: clarkb now if you can help me fix this it would be great. My first attempt was to look at https://review.openstack.org/625576 | 19:08 |
ssbarnea|rover | which is supposed to be tested via https://review.openstack.org/#/c/625680/ (still in queue) | 19:08 |
*** _alastor_ has quit IRC | 19:08 | |
ssbarnea|rover | the problem is that my test of a similar change on rdo (forked role) failed to fix the permission as ansible kept complaining about world writable folder. Is u=rwX,g=rX,o=rX wrong? | 19:09 |
*** Emine has joined #openstack-infra | 19:10 | |
ssbarnea|rover | or does this only add new permissions without removing w from o? | 19:10 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Update openstackid jobs https://review.openstack.org/625691 | 19:10 |
openstackgerrit | Merged openstack-infra/system-config master: Copy pasta the debian base server bits, don't include them https://review.openstack.org/625350 | 19:11 |
*** jamesmcarthur has quit IRC | 19:11 | |
clarkb | ssbarnea|rover: I would use a bitmask instead to avoid ambiguity around that | 19:12 |
clarkb | but ya you may need -w to remove the flag using that mode of chmod? | 19:12 |
ssbarnea|rover | clarkb: that runs recursively on both files and directories, so I'm not sure how to ensure directories are still executable. | 19:12 |
ssbarnea|rover | that line is a chmod, i hope you are not trying to tell me that we need two chmods there :D | 19:13 |
clarkb | ssbarnea|rover: a quick test with chmod itself has that working (no need to -w) | 19:14 |
ssbarnea|rover | based on https://serverfault.com/a/35107/10361 it should be correct. | 19:14 |
clarkb | ssbarnea|rover: possibly a bug in ansible if it doesn't work | 19:14 |
clarkb | one way to check that could be to exec chmod? | 19:14 |
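A quick way to confirm the semantics in question, namely whether `u=rwX,g=rX,o=rX` clears an existing `o+w` bit rather than merely adding permissions, is to apply it to a scratch directory and inspect the result. A minimal sketch, assuming a Linux host with GNU chmod on the PATH:

```python
import os
import stat
import subprocess
import tempfile

# Start from a world-writable directory, like the /home/zuul/src tree.
scratch = tempfile.mkdtemp()
os.chmod(scratch, 0o777)

# '=' in symbolic mode replaces the permission bits for that class,
# so o+w should be gone afterwards without needing an explicit o-w.
subprocess.check_call(['chmod', 'u=rwX,g=rX,o=rX', scratch])

mode = stat.S_IMODE(os.stat(scratch).st_mode)
print(oct(mode))                      # expect 0o755 on a directory
assert not (mode & stat.S_IWOTH), 'others still have write access'
```

If Ansible's file module still leaves the tree world-writable with the same symbolic mode, that points at how the module applies the mode (or at files created after the task ran), not at chmod itself.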
ssbarnea|rover | clarkb: this folder is kept between builds, right? | 19:16 |
clarkb | ssbarnea|rover: it is on the test nodes which are single use for us so no | 19:16 |
*** smarcet has quit IRC | 19:16 | |
ssbarnea|rover | clarkb: this means that if I remove the entire chmod, the files should just have their default permissions, which are supposed to be correct. right? | 19:17 |
clarkb | I don't know what the umask is on the different platforms. It is possible it will be too restrictive on some of them | 19:17 |
ssbarnea|rover | i didn't really understand the reasoning behind that task in particular. why do that operation in the first place | 19:18 |
clarkb | oh I see why it is writeable now though, it's so that you can hardlink | 19:18 |
clarkb | ssbarnea|rover: it's so that you can have copies of the repos in other parts of the fs without being the file owner in the "source" | 19:18 |
ssbarnea|rover | clarkb: well, this is not safe, so no more hardlinking. | 19:18 |
clarkb | why isn't it safe? | 19:19 |
clarkb | an alternative approach would be to run your ansible.cfg out of /etc or similar | 19:19 |
clarkb | (and lock it down to make ansible happy, this is how the infra team's integration testing works iirc) | 19:19 |
ssbarnea|rover | because another user could mess with the file, i kinda agree with ansible that o+w is a very bad idea. | 19:19 |
ssbarnea|rover | regardless of what reasons we have behind. | 19:20 |
clarkb | they are single use test nodes though | 19:20 |
fungi | i think this just means we should only clone from those src trees, and not perform operations directly in them | 19:20 |
fungi | the hardlinks in the clone will get appropriate permissions instead | 19:20 |
ssbarnea|rover | do we really need more than one user accessing these files? | 19:21 |
fungi | devstack uses several accounts on the system | 19:21 |
fungi | i think the stack user does the cloning in devstack's case? | 19:21 |
clarkb | also | 19:21 |
clarkb | zuul-cloner is deprecated and has been for about a year now | 19:21 |
clarkb | we should be deleting code that relies on those paths | 19:22 |
amorin | hey fungi frickler clarkb, I worked on the BHS1 aggregate | 19:23 |
clarkb | amorin: hello | 19:23 |
amorin | could you maybe try some io / ram stress test? | 19:23 |
amorin | frickler spawned many instances there | 19:23 |
clarkb | amorin: yes, are those instances distributed to the different hypervisors? | 19:23 |
amorin | almost all of them are on separate hosts | 19:23 |
amorin | yes | 19:23 |
amorin | instance name frickler-test17 is the only one not accessible | 19:24 |
ssbarnea|rover | clarkb: fungi : so it should be fine to remove the entire chmod. | 19:24 |
amorin | the host is under intervention | 19:24 |
clarkb | great. I can look at running some artificial dd based testing as well as devstack + tempest later today | 19:24 |
amorin | sounds great | 19:24 |
amorin | let me know the result, I must leave now but I will read it tomorrow morning | 19:25 |
clarkb | amorin: will do, thank you! | 19:25 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/zuul-jobs master: Remove unsecure chmod which makes src world writable https://review.openstack.org/625694 | 19:27 |
clarkb | ssbarnea|rover: I'm not sure we can remove it until we've moved more jobs off of it | 19:27 |
clarkb | (and depending on how long zuul wants to support those users as well) | 19:28 |
clarkb | I just worry about changing the behavior of a deprecated thing when the focus should be on using other tools instead | 19:28 |
ssbarnea|rover | clarkb: you know better what relies on it, so we can test that change. | 19:28 |
clarkb | ssbarnea|rover: any of the jobs using devstack-gate | 19:28 |
clarkb | (which are slowly going away) | 19:28 |
*** _alastor_ has joined #openstack-infra | 19:28 | |
ssbarnea|rover | clarkb: tripleo is too big/slow to do this, we still need to perform maintenance on deprecated roles. | 19:29 |
ssbarnea|rover | slowly is the word.... | 19:29 |
ssbarnea|rover | regarding hardlink permissions, the workaround for devstack or whoever else wants to hardlink is to put all users in a common group, so they should be able to hardlink | 19:31 |
ssbarnea|rover | the problem is with "o" not with "g". | 19:31 |
clarkb | ssbarnea|rover: alternatively, the fix for ansible is to use ansible config in /etc | 19:32 |
fungi | yes, but those jobs are effectively frozen. altering their behavior significantly now is not a great idea | 19:32 |
clarkb | fungi: ++ exactly | 19:32 |
ssbarnea|rover | clarkb: this makes no sense to me, tested code should be able to run with its internal setup. | 19:33 |
fungi | part of why we made zuul v3 was to reduce the degree to which the infra team had to care about corner cases with widely-shared job definitions | 19:33 |
ssbarnea|rover | we may even want/need to have multiple ansible.cfg files. | 19:33 |
clarkb | ssbarnea|rover: unless that code is using a legacy portion of zuul that we'd like you to stop using | 19:33 |
clarkb | if you are in that situation then you either work around the legacy behavior or get onto the modern tooling | 19:34 |
clarkb | but continuing to update and try to support the legacy behavior doesn't scale | 19:34 |
pabelanger | clarkb: ssbarnea|rover: fungi: https://review.openstack.org/596874/ was how I fixed the world readable directories, it has to do with how we are doing used-cache-repos. | 19:36 |
pabelanger | if somebody wanted to pick it up again | 19:36 |
clarkb | pabelanger: thats different | 19:36 |
clarkb | in this case its the zuul cloner interface we are trying to preserve for eg devstack-gate | 19:37 |
clarkb | which actually does rely on that functionality | 19:37 |
*** jamesmcarthur has joined #openstack-infra | 19:37 | |
*** anteaya has joined #openstack-infra | 19:37 | |
ssbarnea|rover | i think i will go offline now before i say something i cannot take back. | 19:38 |
clarkb | ssbarnea|rover: basically we realized that the way zuul cloner works is a mistake | 19:38 |
clarkb | we fixed it | 19:38 |
clarkb | we fixed it by not using zuul cloner and the bad interface anymore for new jobs and encourage people to convert | 19:38 |
pabelanger | clarkb: agree, even if fixed, I still think /home/zuul/src would be world readable for this reason. But agree, zuul-cloner is legacy and should be avoided. in fact, we should remove it from our base jobs, we are still including it by default | 19:38 |
clarkb | so we agree on that | 19:38 |
clarkb | what we don't agree on is changing the legacy role and possibly breaking many jobs | 19:39 |
clarkb | because then we'd have to update many legacy jobs instead of just replacing them | 19:40 |
ssbarnea|rover | clarkb: how can I see the impact of that change on old jobs? I am just curious to see how it breaks them. | 19:41 |
pabelanger | Remove fetch-zuul-cloner from base job: https://review.openstack.org/513506/ | 19:41 |
pabelanger | I abandoned it by mistake | 19:41 |
pabelanger | that could possibly break jobs, but jobs using it should be parented to legacy-base, not base | 19:41 |
clarkb | ssbarnea|rover: push a change to devstack-gate that depends on your change to update the chmod | 19:42 |
clarkb | ssbarnea|rover: that should still run devstack gate jobs there | 19:42 |
ssbarnea|rover | clarkb: thanks, I will do that now. better to see the extent of the damage. maybe we are lucky. | 19:42 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/devstack-gate master: DNM: test umask correction on src https://review.openstack.org/625697 | 19:45 |
fungi | doesn't guarantee there aren't other more susceptible users of that floating around in the system of course | 19:46 |
*** bobh has quit IRC | 19:47 | |
fungi | but it's a useful high-level check anyway | 19:47 |
clarkb | ya I think it is a useful baseline | 19:47 |
fungi | a lot of projects have copied bits of legacy functionality into the jobs in their respective repos too, ostensibly as a stepping-stone to rewriting them, but a lot are still nowhere near complete with that i suspect | 19:48 |
clarkb | hrm speaking of devstack-gate. Looking at benchmarking these bhs1 nodes and d-g had/has the reproduce.sh script. We don't have that for current jobs so maybe I use d-g for that? Have to figure out how I want to drive this | 19:49 |
*** bobh has joined #openstack-infra | 19:49 | |
fungi | which is going to make it fun when we eventually try to remove legacy pieces | 19:49 |
fungi | hrm, yeah no great ideas for orchestrating devstack+tempest other than old devstack-gate or hard-coding a bunch of configuration swiped from a recent devstack job and then altering the playbooks to install that configuration | 19:51 |
clarkb | ok I think what I'm going to do is boot instances for me to test on, then once I've figured out a procedure apply that to frickler's distributed test nodes? Or maybe write the procedure down, pass that to frickler and make sure it doesn't conflict with any of frickler's plans | 19:53 |
*** Vadmacs has quit IRC | 19:53 | |
*** _alastor_ has quit IRC | 19:53 | |
*** bobh has quit IRC | 19:54 | |
clarkb | hehe we actually run a bunch of zuulv3 native jobs on d-g | 19:54 |
fungi | well, orchestrating some simple i/o testing with dd should be safe and quick to turn out | 19:54 |
*** _alastor_ has joined #openstack-infra | 19:55 | |
clarkb | ya I'll spin up my test server (on xenial since that is what we were looking at in the past), figure out some artificial benchmarking as well as devstack + tempest | 19:55 |
clarkb | then we can apply that to fricklers distributed nodes | 19:56 |
*** bobh has joined #openstack-infra | 19:57 | |
openstackgerrit | Merged openstack-infra/zuul master: Modify some file content errors https://review.openstack.org/624278 | 20:00 |
*** bobh has quit IRC | 20:02 | |
*** jamesmcarthur has quit IRC | 20:05 | |
*** sshnaidm is now known as sshnaidm|off | 20:07 | |
*** graphene has quit IRC | 20:07 | |
*** jamesmcarthur has joined #openstack-infra | 20:08 | |
clarkb | using different fio settings I find that if we do direct io we get about 10MB/s reads and writes, but disabling direct io it's in the 300MB/s range for reads and writes | 20:12 |
clarkb | I think that does lend some weight to the argument that memory is part of the issue here | 20:12 |
clarkb | (I imagine you'd fall back on direct io behavior if you have no memory to cache things in, though dansmith will probably explain how that assumption is wrong) | 20:13 |
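The qualitative gap can be reproduced without fio. A rough sketch (Linux-only because of O_DIRECT; the path, sizes, and block size are arbitrary and this is not the actual fio job that was run):

```python
import mmap
import os
import time

PATH = '/tmp/iotest.bin'      # arbitrary scratch file for illustration
SIZE = 256 * 1024 * 1024      # 256 MiB total
BLOCK = 1024 * 1024           # 1 MiB blocks; O_DIRECT wants aligned sizes/buffers


def write_throughput(direct):
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        flags |= os.O_DIRECT     # Linux-only: bypass the page cache entirely
    buf = mmap.mmap(-1, BLOCK)   # anonymous mmap gives a page-aligned buffer
    buf.write(b'x' * BLOCK)
    fd = os.open(PATH, flags, 0o600)
    start = time.monotonic()
    for _ in range(SIZE // BLOCK):
        os.write(fd, buf)
    os.fsync(fd)
    os.close(fd)
    return SIZE / (time.monotonic() - start) / 1e6


for direct in (False, True):
    print('direct=%s: %.0f MB/s' % (direct, write_throughput(direct)))
os.unlink(PATH)
```

The point is only the shape of the comparison: when the page cache is available, buffered writes land far faster than direct ones, which is consistent with the memory-pressure theory above.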
*** wolverineav has quit IRC | 20:22 | |
openstackgerrit | Merged openstack-infra/system-config master: Set CacheIgnoreCacheControl on pypi proxy cache https://review.openstack.org/625095 | 20:26 |
clarkb | first issue I've found is the bhs1 mirror is not accessible from bhs1 instances | 20:26 |
clarkb | it is accessible from the internet (my laptop at home) | 20:26 |
clarkb | oh wait | 20:27 |
clarkb | did I forget to set config drive on my instance? that may be the cause | 20:27 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/devstack-gate master: DNM: test umask correction on rdo Depends-On: https://review.rdoproject.org/r/17732 Change-Id: Id4e0ecb6cca01570a611c9824c6d821fb1e6d9b0 https://review.openstack.org/625697 | 20:27 |
clarkb | ya my instance has a /19 not a /32 | 20:27 |
clarkb | oh ya I have to set the metadata to fix that | 20:28 |
boden | hi. I was testing out py3.6 for lower-constraints (https://review.openstack.org/#/c/623229/) and ran into "Error: could not determine PostgreSQL version from '10.6'"... has anyone seen this by chance? | 20:28 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/devstack-gate master: DNM: test umask correction on src https://review.openstack.org/625697 | 20:28 |
*** sthussey has joined #openstack-infra | 20:32 | |
clarkb | boden: I think psycopg2 may have a -wheel or -binary package now to install that includes the pre-linked binary stuff? I'm guessing that failure is lack of ability to find those headers? | 20:35 |
clarkb | boden: http://logs.openstack.org/84/625684/1/check/vmware-tox-lower-constraints/3b21419/job-output.txt.gz#_2018-12-17_18_35_45_388284 doesn't install headers just the client and server I think | 20:35 |
boden | clarkb is that something I need to resolve as part of the project? | 20:36 |
*** _alastor_ has quit IRC | 20:37 | |
clarkb | boden: yes, it should be part of your bindep.txt file | 20:37 |
boden | clarkb: ok I will dig; not familiar with that off hand | 20:38 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: Add a note on testing https://review.openstack.org/624578 | 20:39 |
clarkb | boden: basically its a list of binary dependencies for your project. We install them as part of the job setup | 20:39 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: Add a note on testing trusted roles https://review.openstack.org/624578 | 20:39 |
*** _alastor_ has joined #openstack-infra | 20:42 | |
*** wolverineav has joined #openstack-infra | 20:49 | |
*** yboaron has joined #openstack-infra | 20:49 | |
clarkb | ansible log says we are applying iptables role again | 20:50 |
clarkb | fungi: ^ fyi I think the lists server is going to get ansibled exim config now, want to double check the config is as you expect? | 20:51 |
fungi | yep | 20:53 |
fungi | currently last modified friday at 21:17z | 20:54 |
*** wolverineav has quit IRC | 20:54 | |
clarkb | my first ovh bhs1 test node ran devstack in 1216 seconds; this is without cached repos too | 20:56 |
clarkb | I have a second with the networking set up as we have it in nodepool | 20:56 |
clarkb | tempest is running on the first now | 20:56 |
clarkb | so far things are looking good to me, but I think we still want to run all of this on the distributed VMs once we are happy with my rough set of steps | 20:57 |
*** wolverineav has joined #openstack-infra | 21:00 | |
*** jamesmcarthur_ has joined #openstack-infra | 21:01 | |
*** wolverineav has quit IRC | 21:02 | |
*** jamesmcarthur has quit IRC | 21:04 | |
*** wolverineav has joined #openstack-infra | 21:04 | |
*** dmellado has quit IRC | 21:05 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/elastic-recheck master: Identify *POST* timeout failures individually https://review.openstack.org/625717 | 21:05 |
*** wolverineav has quit IRC | 21:05 | |
*** stevebaker has quit IRC | 21:05 | |
*** wolverineav has joined #openstack-infra | 21:06 | |
*** gouthamr has quit IRC | 21:06 | |
*** graphene has joined #openstack-infra | 21:07 | |
*** graphene has quit IRC | 21:09 | |
*** graphene has joined #openstack-infra | 21:10 | |
*** _alastor_ has quit IRC | 21:16 | |
*** graphene has quit IRC | 21:16 | |
*** gouthamr has joined #openstack-infra | 21:17 | |
*** _alastor_ has joined #openstack-infra | 21:18 | |
*** yolanda has quit IRC | 21:18 | |
*** jamesmcarthur has joined #openstack-infra | 21:20 | |
*** Emine has quit IRC | 21:23 | |
*** jamesmcarthur_ has quit IRC | 21:23 | |
clarkb | frickler: amorin I'm collecting the data I'm finding on my test nodes on the etherpad https://etherpad.openstack.org/p/bhs1-test-node-slowness | 21:27 |
*** rcernin has joined #openstack-infra | 21:28 | |
clarkb | at this rate I am half expecting that I won't be able to run the script on all of the frickler* nodes so would be good if frickler can do that with the script I'll be adding to the etherpad? | 21:28 |
clarkb | fwiw so far things are looking good on the two instances I booted | 21:28 |
*** gouthamr_ has joined #openstack-infra | 21:30 | |
*** slaweq has quit IRC | 21:30 | |
*** bobh has joined #openstack-infra | 21:33 | |
*** gouthamr_ has quit IRC | 21:37 | |
openstackgerrit | Merged openstack-infra/git-review master: docs: Misc updates https://review.openstack.org/610574 | 21:38 |
openstackgerrit | Merged openstack-infra/git-review master: docs: Call out use of an agent to store SSH passwords https://review.openstack.org/610616 | 21:38 |
openstackgerrit | Merged openstack-infra/git-review master: tox.ini: add passenv = http_proxy https_proxy # _JAVA_OPTIONS https://review.openstack.org/624496 | 21:38 |
*** gouthamr_ has joined #openstack-infra | 21:41 | |
*** xek has quit IRC | 21:44 | |
*** gouthamr_ has quit IRC | 21:46 | |
*** slaweq has joined #openstack-infra | 21:46 | |
clarkb | http://paste.openstack.org/show/737512/ those look good as benchmarks | 21:47 |
openstackgerrit | Sean McGinnis proposed openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600 https://review.openstack.org/625290 | 21:48 |
clarkb | infra-root ^ any other benchmarking ideas that I can test on my throwaway instance? Everything I'm seeing shows it as happy. I think if frickler agrees we can reenable ovh tomorrow? | 21:49 |
*** dpawlik has quit IRC | 21:50 | |
*** slaweq has quit IRC | 21:51 | |
fungi | nothing comes to mind | 21:51 |
*** gouthamr_ has joined #openstack-infra | 21:51 | |
*** yboaron has quit IRC | 21:52 | |
*** wolverineav has quit IRC | 21:54 | |
*** wolverineav has joined #openstack-infra | 21:54 | |
*** gouthamr_ has quit IRC | 21:56 | |
*** trown is now known as trown|outtypewww | 21:58 | |
clarkb | I'll leave my two instances running in case anyone wants to jump on them and try stuff. I've recorded the devstack and tempest runtimes off of them in that paste above and they look good to me | 21:58 |
*** _alastor_ has quit IRC | 22:04 | |
*** _alastor_ has joined #openstack-infra | 22:05 | |
mriedem | clarkb: good news on http://status.openstack.org/elastic-recheck/#1806912 is that it looks like n-api is no longer a problem | 22:06 |
mriedem | after dan and my fixes are merged | 22:06 |
clarkb | mriedem: looks like there is a spike there, any links to the changes? | 22:06 |
clarkb | let me see if spike is on master | 22:07 |
mriedem | no idk what that's from | 22:07 |
mriedem | logstash says it's all on master | 22:07 |
clarkb | they are also in check | 22:07 |
clarkb | so could just be noise | 22:07 |
*** slaweq has joined #openstack-infra | 22:08 | |
mriedem | something is weird with the g-api check though, | 22:08 |
mriedem | because looking at one of the failures, g-api is ready in 3 seconds | 22:08 |
mriedem | http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/screen-g-api.txt.gz | 22:08 |
fungi | clarkb: excellent fracas update | 22:09 |
mriedem | well i guess g-api is ready, | 22:09 |
mriedem | but glance isn't serving GET /images requests | 22:09 |
clarkb | maybe it isn't ready yet then :P | 22:10 |
clarkb | that might be good feedback to glance though to not mark things ready until they can serve requests? | 22:10 |
*** ndahiwade has joined #openstack-infra | 22:11 | |
mriedem | also do you know why this dropped off? http://status.openstack.org/elastic-recheck/#1807518 | 22:11 |
clarkb | mriedem: I wonder if that maps to when we turned off ovh bhs1 | 22:12 |
fungi | looks like it continued well into friday | 22:12 |
clarkb | bhs1 was turned off on the 7th so that isn't it | 22:12 |
clarkb | mriedem: is that the apache proxy configuration fixes maybe? | 22:13 |
clarkb | hrm no that was the 11th and devstack/tempest move to bionic were the 12th | 22:14 |
clarkb | mriedem: oh actually I wonder if it was the oslo policy infinite recursion bug | 22:14 |
clarkb | mriedem: there was a thread about that on -discuss and cdent helped them sort it out | 22:14 |
mriedem | looking at http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/apache/access_log.txt.gz i see the 503s in there | 22:14 |
mriedem | 10.209.34.117 - - [17/Dec/2018:20:33:15 +0000] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0" | 22:15 |
clarkb | mriedem: https://review.openstack.org/#/c/625086/ I think that explains http://status.openstack.org/elastic-recheck/#1807518 | 22:16 |
mriedem | yeah same | 22:16 |
*** bobh has quit IRC | 22:20 | |
*** slaweq has quit IRC | 22:20 | |
clarkb | mriedem: the 503s imply to me that the backend service isn't actually up and serving requests yet | 22:20 |
*** rh-jelabarre has quit IRC | 22:21 | |
*** bobh has joined #openstack-infra | 22:22 | |
mriedem | this is where devstack is hitting g-api as well http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/apache/tls-proxy_error_log.txt.gz#_2018-12-17_20_33_08_840193 | 22:22 |
clarkb | mriedem: the 503s happen much later in the job than the g-api startup | 22:22 |
mriedem | yeah that's true | 22:22 |
clarkb | devstack stops running at ~18:42 | 22:23 |
clarkb | but 503s are at ~20:33 | 22:23 |
openstackgerrit | Merged openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600 https://review.openstack.org/625290 | 22:23 |
clarkb | mriedem: I wonder if that is the same issue dansmith was looking at on friday | 22:24 |
mriedem | 2018-12-17 20:33:08.714 | + functions-common:_run_under_systemd:1484 : sudo systemctl start devstack@g-api.service | 22:24 |
mriedem | wtf | 22:24 |
clarkb | basically the whole system goes to lunch and rabbitmq times out and so on | 22:24 |
mriedem | yeah | 22:25 |
mriedem | =ERROR REPORT==== 17-Dec-2018::18:42:10 === closing AMQP connection <0.1344.0> (10.209.34.117:41178 -> 10.209.34.117:5672 - uwsgi:9448:b8040f63-7cc7-4025-9957-3626a7000014): missed heartbeats from client, timeout: 60s | 22:25 |
clarkb | and this job did run with direct-io enabled | 22:25 |
*** e0ne has quit IRC | 22:25 | |
clarkb | and this is a different cloud than the last one (this is rax-dfw the friday one was inap) | 22:25 |
clarkb | pointing at maybe a bionic issue | 22:26 |
clarkb | mriedem: does syslog show anything /me looks | 22:26 |
mriedem | oh heh btw infra core https://review.openstack.org/#/q/topic:dump-rabbitmqctl-report+(status:open+OR+status:merged) | 22:28 |
mriedem | i posted those in july of 2017 the last time we had weird rabbit issues in the gate | 22:28 |
clarkb | mriedem: will need to go into devstack now that d-g isn't used much | 22:29 |
clarkb | mriedem: this is totally thinking out loud a bit but it almost looks like we have two devstack runs on the same host | 22:29 |
clarkb | and possibly the second clobbers the first | 22:29 |
SpamapS | tobiash: I'm kind of over Alpine too. the -slim debian images are pretty good and you don't have to trust a bunch of people who aren't great at communicating where they're from, who they are, or why we're supposed to trust them. | 22:29 |
SpamapS | I'd really love for pbrx to switch to Debian or Ubuntu honestly. | 22:30 |
fungi | SpamapS: overflow from discussion in #zuul? | 22:30 |
clarkb | mriedem: looking in syslog you can see we install packages twice about two hours apart from each other | 22:30 |
clarkb | http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/syslog.txt.gz#_Dec_17_18_19_46 and http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/syslog.txt.gz#_Dec_17_20_19_58 | 22:30 |
clarkb | is it possible rax gave us a duplicate IP and somehow everything just worked for two jobs to ssh into the same host? | 22:31 |
mriedem | Dec 17 20:33:08 ubuntu-bionic-rax-dfw-0001240665 sudo[9402]: stack : TTY=unknown ; PWD=/opt/stack/devstack ; USER=root ; COMMAND=/bin/systemctl start devstack@g-api.service | 22:31 |
clarkb | corvus: ^ fyi this may be something we want to guard against | 22:31 |
clarkb | mriedem: I'm going to look in nodepool logs for that IP now | 22:31 |
*** tpsilva has quit IRC | 22:32 | |
mriedem | yeah it starts it twice | 22:32 |
clarkb | nodepool doesn't seem to think the IP was used twice | 22:32 |
clarkb | at least not reused after the 1800ish usage at the start of that job | 22:32 |
corvus | clarkb: when we add the per-build ssh key, do we also remove the global key from authorized_keys? | 22:33 |
tobiash | SpamapS: yes, the -slim images aren't that huge anymore | 22:34 |
*** boden has quit IRC | 22:34 | |
corvus | clarkb: looks like no; so i guess we don't protect against that | 22:34 |
clarkb | corvus: ya we remove it from the ssh agent but not authorized_keys | 22:35 |
SpamapS | Oh whoops, I think I meant for that to go to #zuul | 22:35 |
*** jamesmcarthur has quit IRC | 22:35 | |
clarkb | that said I don't see evidence in nodepool we are interacting that way, something else must be starting devstack twice | 22:35 |
SpamapS | right as I typed those messages I dropped my headset. Must have hit a key to move me here. ;) | 22:35 |
corvus | SpamapS: all keys lead to -infra | 22:35 |
*** tobias-urdin has joined #openstack-infra | 22:38 | |
clarkb | mriedem: I think if we sort out why devstack is run twice we win a prize. The job output file seems to be the first devstack run | 23:38 |
clarkb | mriedem: the devstack log file is the second devstack run | 22:39 |
clarkb | syslog doesn't seem to show a reboot | 22:40 |
ianw | clarkb: it's not confusion over output capture, between the console, the -early log file and the devstacklog.txt file? | 22:40 |
*** yamamoto has quit IRC | 22:40 | |
*** yamamoto has joined #openstack-infra | 22:41 | |
clarkb | ianw: I don't think so, if you look in syslog it's clearly running devstack package installs twice | 23:41 |
clarkb | ianw: and those package installs seem to line up timestamps wise with the two devstack runs we see (one in job output the other in devstack log) | 22:41 |
tobias-urdin | infra-root I've been working on getting access to the "openstack" account on PuppetForge https://forge.puppet.com/openstack and I've got in contact with the Puppet Inc employee that has access to this account and that managed this (manual) process of uploading Puppet module tarballs. I would like to get this account owned by the OpenStack project and then automate the upload of versions there | 22:41 |
tobias-urdin | later on. | 22:41 |
clarkb | and apache seems to corroborate that glance is running at first then after gets sad | 22:41 |
tobias-urdin | Is there any email I can use to transfer that account or should I try to seize control of the account personally first? | 22:41 |
clarkb | tobias-urdin: it may make sense to try and have the puppet openstack team own that and use zuul secrets to provide that data to your jobs | 22:42 |
*** fuentess has quit IRC | 22:42 | |
clarkb | ianw: the two start_time values differ too | 22:44 |
tobias-urdin | clarkb: yeah, that's my end goal here, I need an email address for that account and maybe infra has some email I could use, otherwise I can transfer it to myself temporarily. | 22:44 |
clarkb | tobias-urdin: I don't think the infra team wants to keep being manager of those secrets that don't directly tie to the infra team. Does puppetforge not have the concept of a group owning something? | 22:45 |
ianw | tobias-urdin: you can use infra-root@openstack.org and we could keep the credentials in our store, and provide the secret? | 22:45 |
clarkb | ianw: ya we can do that, though I think we are trying to encourage teams to rely on us less for stuff like that since we don't have to be that roadblock anymore | 22:45 |
smcginnis | clarkb: If you have a moment, I thought after https://review.openstack.org/#/c/620664/ merged, the UI would reflect that after some time. Are there additional steps needed to get Cinder set up like Designate is? | 22:46 |
ianw | true, but as team maintained i think it's probably ok if people want a circuit breaker on one person going missing? | 22:46 |
clarkb | ianw: ya, the way loci has addressed this is to have an org with >1 member on dockerhub then a robot account that is also a member of that org | 22:47 |
mriedem | clarkb: i hope the prize isn't a stuffed animal because i'm up to my knees in those already | 22:47 |
clarkb | so that anyone in the org can update the robot credentials | 22:47 |
clarkb | if puppetforge supports something similar I think that would be a good approach | 22:47 |
tobias-urdin | unfortunately i don't think it does, it's more of a super simple approach where you have an account with a username and publish under that; no organization-like functionality. | 22:48 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add governance document https://review.openstack.org/622439 | 22:49 |
ianw | tobias-urdin: and then you're planning on a centralised release job to publish there? | 22:53 |
clarkb | mriedem: ianw zuul notices that job is a failure after the second devstack completes | 22:53 |
tobias-urdin | ianw: if it's ok with you i'll propose he changes email to infra-root@openstack.org then we can recover a password that way to skip having to send the password, probably unencrypted, somewhere. | 22:53 |
tobias-urdin | ianw: i was hoping being able to put together a job that can push modules there automatically after version bump in openstack/releases | 22:53 |
ianw | tobias-urdin: ok, well tell me when the email comes in, i can setup a random password, store it in our repo and provide the secret for such a job | 22:54 |
tobias-urdin | the process has always been manual, and hasn't been done since Cody, the Puppet Inc employee, discontinued his involvement in Puppet OpenStack, so we haven't had any updated modules there in years. | 22:54 |
tobias-urdin | ianw: thanks, i'll let you know! | 22:55 |
ianw | this is the type of thing that automated jobs do well :) | 22:55 |
tobias-urdin | indeed :) | 22:55 |
*** bobh has quit IRC | 22:56 | |
clarkb | mriedem: ianw ara's failed run devstack task captures the output of the "second" devstack run | 23:00 |
clarkb | based on matching start_time data | 23:00 |
*** Emine has joined #openstack-infra | 23:04 | |
ianw | clarkb: is this the 60bd495 logs you posted above? | 23:04 |
clarkb | ianw: yes | 23:04 |
*** jamesmcarthur has joined #openstack-infra | 23:05 | |
clarkb | if it were a time sync issue I would expect the nested start time values to line up at least since that should be recorded from the same frame of reference (on the test node) | 23:07 |
*** Emine has quit IRC | 23:08 | |
ianw | http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/job-output.txt.gz#_2018-12-17_18_14_43_992810 | 23:09 |
ianw | what's 10.209.34.117 | 23:09 |
clarkb | the rax private ip address for that instance | 23:10 |
clarkb | matches up with http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/zuul-info/inventory.yaml | 23:11 |
clarkb | it's almost like ansible decides to run that task multiple times | 23:13 |
clarkb | but only records the second (though our console logging hacks record the first; also, maybe the console logging hacks not working for the second run points at a possible cause?) | 23:13 |
ianw | $ journalctl --file ./devstack.journal | 23:17 |
ianw | Failed to open files: Bad message | 23:17 |
ianw | i wonder if anyone has actually used that ... | 23:17 |
clarkb | ianw: it's in the export format so you have to reimport it | 23:17 |
clarkb | systemd-journal-remote is the tool iirc | 23:18 |
*** jamesmcarthur has quit IRC | 23:18 | |
clarkb | I had it working when sdague added it | 23:19 |
clarkb | but it's been a long time | 23:19 |
*** _alastor_ has quit IRC | 23:20 | |
clarkb | the keystone service log implies that time is mostly continuous and we don't jump or reset | 23:21 |
ianw | Dec 18 07:29:41 ubuntu-bionic-rax-dfw-0001240665 devstack@keystone.service[16401] | 23:22 |
ianw | 2018-12-17 20:19:54.622 | + ./stack.sh:main:488 : exec | 23:23 |
ianw | that's 12 hours? | 23:24 |
ianw | or is journalctl doing some sort of TZ manipulation ... | 23:25 |
clarkb | "Dec 17 18:19:39 ubuntu-bionic-rax-dfw-0001240665 sudo[2837]: zuul : TTY=unknown ; PWD=/home/zuul ; USER=stack ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-smynvpgdbxuytkbqxjwinxoiuwrzslwf; /usr/bin/python2" then later "Dec 17 20:19:51 ubuntu-bionic-rax-dfw-0001240665 sudo[24797]: zuul : TTY=unknown ; PWD=/home/zuul ; USER=stack ; COMMAND=/bin/sh -c echo | 23:25 |
clarkb | BECOME-SUCCESS-smynvpgdbxuytkbqxjwinxoiuwrzslwf; /usr/bin/python2" | 23:25 |
clarkb | its running the same exact script twice according to syslog | 23:25 |
clarkb | ianw: I think it may be trying to normalize that for you? I dunno | 23:25 |
ianw | yeah i think it is, --utc helps | 23:25 |
clarkb | ianw: I thought the export format was supposed to fix those issues for us :( | 23:25 |
clarkb | based on the above syslog I think this must be an ansible bug | 23:26 |
clarkb | ansible is running the same exact command for the task twice | 23:26 |
clarkb | dmsimard: mordred corvus ^ any ideas | 23:26 |
clarkb | from ara's perspective it isn't though, the time start and end coincide with the two running back to back ish | 23:28 |
clarkb | but ara only has logs for the second in the output portion of ara | 23:29 |
clarkb | I'm guessing this is second day in a row we get to find a big ansible bug :) | 23:29 |
*** gfidente has quit IRC | 23:30 | |
*** jamesmcarthur has joined #openstack-infra | 23:30 | |
clarkb | they are suspiciously almost exactly 2 hours apart from each other | 23:31 |
*** _alastor_ has joined #openstack-infra | 23:31 | |
*** ndahiwade has quit IRC | 23:31 | |
clarkb | keepalive behavior maybe? | 23:31 |
mordred | clarkb: WEIRD | 23:34 |
*** jamesmcarthur has quit IRC | 23:35 | |
*** dkehn has quit IRC | 23:37 | |
*** dkehn has joined #openstack-infra | 23:38 | |
*** mriedem has quit IRC | 23:40 | |
*** kgiusti has left #openstack-infra | 23:41 | |
clarkb | those randbits are defined in make_become_cmd | 23:42 |
dmsimard | I have no idea | 23:42 |
dmsimard | Lacking a bit of context though | 23:43 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: [wip] Add dogpile.cache master to the -src tests https://review.openstack.org/625457 | 23:43 |
dmsimard | What are we troubleshooting ? :D | 23:44 |
*** eernst has joined #openstack-infra | 23:44 | |
clarkb | dmsimard: it appears we have run devstack twice in a job causing the job to fail as the second run interferes with the first. From syslog ansible runs the same BECOME-SUCCESS command with the same randbits twice 2 hours apart | 23:45 |
*** eernst has quit IRC | 23:45 | |
clarkb | dmsimard: its almost as if ansible has a bug that causes it to run the task twice | 23:45 |
clarkb | _low_level_execute_command() calls make_become_cmd and doesn't loop | 23:46 |
clarkb | so for those randbits to remain constant I think we have to be in _low_level_execute_command | 23:46 |
clarkb | though I'm probably looking at devel not 2.5.14 | 23:46 |
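To make the randbits argument concrete: the become wrapper generates a fresh random marker every time the command is prepared, so two syslog entries with an identical marker imply that one prepared command line was executed twice. A hedged sketch of the mechanism (illustrative only, not Ansible's actual make_become_cmd):

```python
import random
import string


def make_become_cmd(cmd, become_user='stack'):
    """Illustrative only: wrap a command so the connection plugin can detect
    that privilege escalation succeeded by watching for a unique marker."""
    randbits = ''.join(random.choice(string.ascii_lowercase) for _ in range(32))
    success_key = 'BECOME-SUCCESS-%s' % randbits
    wrapped = "/bin/sh -c 'echo %s; %s'" % (success_key, cmd)
    return 'sudo -H -S -n -u %s %s' % (become_user, wrapped), success_key
```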
*** eernst_ has joined #openstack-infra | 23:47 | |
*** eernst_ has quit IRC | 23:51 | |
ianw | mordred: http://logs.openstack.org/62/618962/1/check/nodepool-functional-py35-redhat-src/a7a7304/controller/logs/devstacklog.txt.gz ... all still failing with dogpile issues ... investigating | 23:52 |
clarkb | reading the ansible ssh connection _bare_run it does seem to Popen() twice but it should guard against that. Possible that we are seeing that as an issue? I dunno | 23:53 |
ianw | ahhh, we're not installing from constraints as nodepool goes into a virtualenv and we're using pip directly, rather than via devstack madness/magic | 23:53 |
clarkb | ya if there is a Popen that hits an OSError or IOError then ansible catches that and sets p = None, then it checks if p is None and tries again | 23:54 |
* clarkb finds a link | 23:54 | |
*** _alastor_ has quit IRC | 23:54 | |
clarkb | is it possible https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/connection/ssh.py#L711-L731 starts the process which goes out to lunch for a while due to IO issues | 23:55 |
clarkb | then after 2 hours (with it running remotely too) it gets an IOError | 23:56 |
clarkb | then runs it again | 23:56 |
clarkb | mordred: ianw dmsimard ^ | 23:56 |
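The suspicion, then, is the fallback path in the ssh connection plugin: the first process launch is wrapped in a try/except, and on OSError/IOError the very same command line is launched a second time. A condensed, hedged paraphrase of that pattern (not the actual _bare_run code):

```python
import subprocess


def bare_run(cmd, in_data=None):
    """Condensed paraphrase of the two-attempt launch pattern being discussed."""
    p = None
    try:
        # First attempt (in the real plugin this is the pty-backed variant).
        p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except (OSError, IOError):
        p = None

    if p is None:
        # Fallback: the identical command line is executed again. If the
        # first ssh actually reached the remote host before the local error
        # surfaced, the remote command ends up running twice.
        p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    return p.communicate(in_data)
```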
ianw | mordred: oh, i see, the < 0.7 pin wasn't actually merged to openstacksdk ... can we sort something out here? it's blocking glean, nodepool, dib ... | 23:57 |
mordred | ianw: there is a fix from kmalloc up for review ... but it failed things last time - which things do you think we should merge? | 23:58 |
mordred | clarkb: it certainly does look like it's possible for an ioerror on the first command to lead to a second invocation | 23:58 |
mordred | clarkb: due to not p | 23:58 |
*** Emine has joined #openstack-infra | 23:59 | |
mordred | clarkb: and something something ssh timeout something ioerror? | 23:59 |
ianw | mordred: personally my thought would be to merge the < 0.7 pin to give a bit more time to consider kmalloc's fix ... | 23:59 |
ianw | mordred: which also matches since requirements has merged the pin too | 23:59 |
clarkb | mordred: ya ssh timeout or tcp keepalives etc. The 2 hour gap there makes me really suspicious of that | 23:59 |