*** bobh has quit IRC | 00:10 | |
*** martinkennelly has quit IRC | 00:23 | |
*** hwoarang_ has joined #openstack-infra | 01:33 | |
*** hwoarang has quit IRC | 01:34 | |
*** wolverineav has joined #openstack-infra | 01:39 | |
*** bhavikdbavishi has joined #openstack-infra | 01:47 | |
*** wolverineav has quit IRC | 01:57 | |
*** bobh has joined #openstack-infra | 01:57 | |
openstackgerrit | zhouxinyong proposed openstack/diskimage-builder master: Delete the duplicate words in 50-zipl https://review.openstack.org/628815 | 02:02 |
*** bobh has quit IRC | 02:05 | |
*** jamesmcarthur has joined #openstack-infra | 02:24 | |
*** bhavikdbavishi has quit IRC | 02:37 | |
*** hongbin has joined #openstack-infra | 02:47 | |
*** jamesmcarthur has quit IRC | 02:50 | |
*** whoami-rajat has joined #openstack-infra | 02:52 | |
*** psachin has joined #openstack-infra | 02:59 | |
*** hwoarang has joined #openstack-infra | 03:10 | |
*** hwoarang_ has quit IRC | 03:12 | |
*** wolverineav has joined #openstack-infra | 03:19 | |
openstackgerrit | zhurong proposed openstack-infra/project-config master: Retire murano-deployment https://review.openstack.org/628850 | 03:26 |
*** hongbin has quit IRC | 03:39 | |
openstackgerrit | Merged openstack-infra/project-config master: Add 'Review-Priority' for Zaqar repos https://review.openstack.org/628323 | 03:46 |
*** jamesmcarthur has joined #openstack-infra | 03:51 | |
*** bhavikdbavishi has joined #openstack-infra | 03:55 | |
*** ramishra has joined #openstack-infra | 04:01 | |
*** bhavikdbavishi has quit IRC | 04:02 | |
*** bhavikdbavishi has joined #openstack-infra | 04:02 | |
*** jamesmcarthur has quit IRC | 04:18 | |
*** wolverineav has quit IRC | 04:23 | |
*** udesale has joined #openstack-infra | 04:33 | |
*** bobh_ has joined #openstack-infra | 04:38 | |
openstackgerrit | zhurong proposed openstack-infra/project-config master: Retire murano-deployment https://review.openstack.org/628850 | 05:14 |
*** jamesmcarthur has joined #openstack-infra | 05:19 | |
*** jamesmcarthur has quit IRC | 05:23 | |
*** bobh_ has quit IRC | 05:29 | |
*** ykarel has joined #openstack-infra | 05:49 | |
*** diablo_rojo has joined #openstack-infra | 05:50 | |
*** bobh_ has joined #openstack-infra | 05:51 | |
*** bobh_ has quit IRC | 05:56 | |
*** hwoarang_ has joined #openstack-infra | 06:02 | |
*** hwoarang has quit IRC | 06:03 | |
*** bhavikdbavishi has quit IRC | 06:05 | |
*** wolverineav has joined #openstack-infra | 06:07 | |
*** armax has quit IRC | 06:11 | |
*** wolverineav has quit IRC | 06:12 | |
*** bobh_ has joined #openstack-infra | 06:41 | |
*** jtomasek has joined #openstack-infra | 06:53 | |
*** rcernin has quit IRC | 06:56 | |
*** apetrich has joined #openstack-infra | 07:06 | |
*** AJaeger has quit IRC | 07:11 | |
openstackgerrit | zhurong proposed openstack-infra/project-config master: Retire murano-deployment https://review.openstack.org/628850 | 07:11 |
*** AJaeger has joined #openstack-infra | 07:20 | |
*** bhavikdbavishi has joined #openstack-infra | 07:24 | |
*** dpawlik has joined #openstack-infra | 07:29 | |
*** ykarel is now known as ykarel|lunch | 07:29 | |
*** bobh_ has quit IRC | 07:34 | |
*** adriancz has joined #openstack-infra | 07:35 | |
*** agopi_ has joined #openstack-infra | 07:37 | |
*** rpittau has joined #openstack-infra | 07:38 | |
*** agopi_ is now known as agopi | 07:40 | |
*** tosky has joined #openstack-infra | 07:45 | |
*** yamamoto has joined #openstack-infra | 07:56 | |
*** yamamoto has quit IRC | 07:58 | |
*** ginopc has joined #openstack-infra | 07:59 | |
*** diablo_rojo has quit IRC | 08:02 | |
*** aojea has joined #openstack-infra | 08:08 | |
*** bobh_ has joined #openstack-infra | 08:09 | |
*** pcaruana has joined #openstack-infra | 08:14 | |
*** agopi has quit IRC | 08:18 | |
*** kjackal has joined #openstack-infra | 08:22 | |
*** ykarel|lunch is now known as ykarel | 08:23 | |
*** pcaruana has quit IRC | 08:24 | |
*** rascasoft has joined #openstack-infra | 08:25 | |
*** xek has joined #openstack-infra | 08:39 | |
*** yamamoto has joined #openstack-infra | 08:42 | |
*** rcarrillocruz has joined #openstack-infra | 08:45 | |
*** pcaruana has joined #openstack-infra | 08:46 | |
*** bobh_ has quit IRC | 08:55 | |
*** jpich has joined #openstack-infra | 08:57 | |
*** yamamoto has quit IRC | 09:05 | |
*** ykarel has quit IRC | 09:06 | |
*** ykarel has joined #openstack-infra | 09:06 | |
*** bobh_ has joined #openstack-infra | 09:16 | |
*** gfidente has joined #openstack-infra | 09:27 | |
*** shardy has joined #openstack-infra | 09:31 | |
*** owalsh_ is now known as owalsh | 09:35 | |
*** derekh has joined #openstack-infra | 09:36 | |
*** ssbarnea|bkp2 has joined #openstack-infra | 09:37 | |
*** ssbarnea has quit IRC | 09:39 | |
*** wolverineav has joined #openstack-infra | 09:49 | |
*** wolverineav has quit IRC | 09:53 | |
*** yamamoto has joined #openstack-infra | 09:53 | |
*** jaosorior has joined #openstack-infra | 10:03 | |
*** dtantsur|afk is now known as dtantsur | 10:08 | |
*** agopi has joined #openstack-infra | 10:17 | |
*** roman_g has joined #openstack-infra | 10:18 | |
*** bobh_ has quit IRC | 10:19 | |
*** agopi is now known as agopi_ | 10:20 | |
*** agopi_ is now known as agopi | 10:21 | |
*** bhavikdbavishi has quit IRC | 10:21 | |
*** gfidente has quit IRC | 10:27 | |
openstackgerrit | Merged openstack-infra/zuul master: dict_object.keys() is not required for *in* operator https://review.openstack.org/621482 | 10:27 |
*** sshnaidm|off is now known as sshnaidm | 10:30 | |
*** yamamoto has quit IRC | 10:35 | |
*** arxcruz|brb is now known as arxcruz | 10:35 | |
openstackgerrit | Merged openstack/ptgbot master: Pin irc module to 15.1.1 to avoid import error https://review.openstack.org/626906 | 10:36 |
openstackgerrit | Merged openstack/ptgbot master: Generate PTGbot index page dynamically https://review.openstack.org/626907 | 10:37 |
*** mpeterson has quit IRC | 10:40 | |
*** mpeterson has joined #openstack-infra | 10:42 | |
*** mpeterson has quit IRC | 10:42 | |
*** yamamoto has joined #openstack-infra | 10:45 | |
*** mpeterson has joined #openstack-infra | 10:54 | |
*** udesale has quit IRC | 10:54 | |
*** gfidente has joined #openstack-infra | 11:02 | |
*** mpeterson has quit IRC | 11:05 | |
*** mpeterson has joined #openstack-infra | 11:06 | |
*** pbourke has quit IRC | 11:07 | |
*** pbourke has joined #openstack-infra | 11:09 | |
*** pcaruana has quit IRC | 11:11 | |
*** pcaruana has joined #openstack-infra | 11:16 | |
ssbarnea|bkp2 | infra-root: need review on https://review.openstack.org/#/c/625576/ -- to undo the damaging unsafe umask on src folder | 11:19 |
jhesketh | ssbarnea|bkp2: lgtm | 11:24 |
*** kjackal has quit IRC | 11:34 | |
*** tobias-urdin is now known as tobias-urdin_afk | 11:34 | |
*** kjackal has joined #openstack-infra | 11:34 | |
*** rpittau is now known as rpittau|lunch | 11:41 | |
ssbarnea|bkp2 | jhesketh: thanks. this one is more important than it appears as it has some unexpected side effects, also preventing us from fixing the random post timeouts. | 11:45 |
ssbarnea|bkp2 | ... not to mention that the original approach did not make much sense anyway :D | 11:45 |
jhesketh | right, I'm not comfortable pushing it through without a second review though as it's a trusted repo. Additionally, given its potential effects it should probably be babysat, and sadly it's late here | 11:46 |
ssbarnea|bkp2 | jhesketh: totally agree. i am sure we will get others from US. | 11:48 |
*** yamamoto has quit IRC | 11:50 | |
*** yamamoto has joined #openstack-infra | 11:53 | |
*** bhavikdbavishi has joined #openstack-infra | 11:54 | |
*** agopi has quit IRC | 11:56 | |
*** dpawlik has quit IRC | 11:58 | |
*** tobias-urdin_afk is now known as tobias-urdin | 11:59 | |
* SpamapS tries treating his insomnia with updating the shade package in Debian. It's not working... but at least shade will be up to date. :-P | 12:12 | |
openstackgerrit | Slawek Kaplonski proposed openstack-infra/openstack-zuul-jobs master: Remove tempest-dsvm-neutron-scenario-linuxbridge job definition https://review.openstack.org/628942 | 12:13 |
*** bobh_ has joined #openstack-infra | 12:16 | |
*** dkehn has quit IRC | 12:21 | |
*** bobh_ has quit IRC | 12:21 | |
*** agopi has joined #openstack-infra | 12:24 | |
*** dpawlik has joined #openstack-infra | 12:25 | |
frickler | corvus: did you decide when to do the k8s walkthrough? seems like tomorrow is preferred, but it would be good if I could know for sure soon, so I can plan my evenings a bit | 12:26 |
*** yamamoto has quit IRC | 12:26 | |
*** yamamoto has joined #openstack-infra | 12:29 | |
*** ykarel is now known as ykarel|afk | 12:30 | |
*** zigo has quit IRC | 12:31 | |
SpamapS | frickler: IIRC k8s walkthrough is on Tuesday (US/Pacific), not sure the exact time. | 12:32 |
*** bhavikdbavishi has quit IRC | 12:32 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Extract common config parsing for ProviderConfig https://review.openstack.org/625094 | 12:34 |
*** zhangfei has joined #openstack-infra | 12:37 | |
*** rlandy has joined #openstack-infra | 12:42 | |
frickler | SpamapS: after the infra meeting is the first option on the ethercalc, wednesday the second one, so that would match my assumption | 12:51 |
SpamapS | ya | 12:52 |
*** yamamoto has quit IRC | 12:52 | |
frickler | infra-root: system-config-run-base-ansible-devel seems to be failing since friday, "ERROR! Unexpected Exception, this is probably a bug: 'PlaybookCLI' object has no attribute 'options'" http://logs.openstack.org/16/628216/4/check/system-config-run-base-ansible-devel/2fbf1ef/job-output.txt.gz#_2019-01-04_17_21_05_827683 | 12:52 |
*** evrardjp has joined #openstack-infra | 12:54 | |
*** rpittau|lunch is now known as rpittau | 12:56 | |
*** evrardjp has quit IRC | 12:59 | |
*** evrardjp has joined #openstack-infra | 12:59 | |
*** jhesketh has quit IRC | 13:05 | |
*** jhesketh has joined #openstack-infra | 13:06 | |
frickler | looks like this might be the culprit https://github.com/ansible/ansible/commit/afdbb0d9d5bebb91f632f0d4a1364de5393ba17a | 13:08 |
*** kaiokmo has joined #openstack-infra | 13:09 | |
frickler | possibly a genuine upstream bug instead of some bad use of internals on our side | 13:09 |
mordred | frickler: yah - I think so - we don't use the python api for running playbooks | 13:10 |
mordred | frickler: oh - wait - I think we might be using CLI options in callback plugins | 13:11 |
mordred | frickler: ok. I'm going to stop responding until I've had more coffee | 13:12 |
mordred | frickler: that's us running ansible in a job to run base.yaml - so it isn't a zuul-side error, so yeah, I'd say that's most likely to be an ansible bug | 13:13 |
*** ykarel|afk is now known as ykarel | 13:15 | |
*** boden has joined #openstack-infra | 13:18 | |
*** wolverineav has joined #openstack-infra | 13:25 | |
openstackgerrit | Merged openstack-infra/project-config master: Add fetch-output to base job https://review.openstack.org/511851 | 13:26 |
*** trown|outtypewww is now known as trown | 13:27 | |
*** tmorin has joined #openstack-infra | 13:29 | |
*** wolverineav has quit IRC | 13:29 | |
*** boden has quit IRC | 13:30 | |
tmorin | hi infraroot: would someone be available to freeze and open access to a CI devstack VM, to allow me to investigate a failure I can't manage to reproduce locally ? | 13:30 |
*** dave-mccowan has joined #openstack-infra | 13:31 | |
tmorin | (the job is legacy-tempest-dsvm-networking-bgpvpn-bagpipe for change 626895,3) | 13:31 |
*** tmorin has left #openstack-infra | 13:31 | |
*** tmorin has joined #openstack-infra | 13:31 | |
*** ykarel is now known as ykarel|away | 13:32 | |
tmorin | infra-root ^ (perhaps more likely to ping someone than 'infraroot') | 13:32 |
tmorin | thanks in advance | 13:32 |
SpamapS | mordred: ty for the shade +A .. I'm getting 1.30.0 into Debian and finally fixing the RC that kept it out of all releases (usr/bin/shade-inventory in python- and python3-) | 13:32 |
SpamapS | having to remember some incantations though | 13:32 |
SpamapS | mordred: quite nice to drop all of those old build-deps for openstacksdk. | 13:33 |
*** udesale has joined #openstack-infra | 13:36 | |
frickler | tmorin: looking | 13:38 |
mordred | SpamapS: ++ - I enjoyed your tweet about the packaging this morning | 13:38 |
tmorin | frickler: thanks! (I hit 'recheck' minutes ago, after seeing a failure with many jobs) | 13:39 |
SpamapS | mordred: yeah, I just wish I was asleep instead of tweeting to your delight. ;) | 13:40 |
tmorin | "ERROR Unable to find playbook /var/lib/zuul/builds/cf13bfca241e43f890868f4c09ce963c/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base/post-ssh.yaml" -> this seems unusual, although totally unrelated to the issue I'm trying to investigate | 13:41 |
*** jamesmcarthur has joined #openstack-infra | 13:41 | |
tmorin | frickler: the job isn't yet running, it's currently in "queued" status | 13:42 |
mnaser | i just ran into the same error that tmorin ran into for my job | 13:42 |
frickler | tmorin: hmm, it looks like we may have broken project-config | 13:42 |
mnaser | i mean | 13:43 |
mnaser | the last merge into project-config is | 13:43 |
frickler | mordred: this relates to https://review.openstack.org/#/c/511851/ ^^ | 13:43 |
mnaser | yeah | 13:43 |
frickler | networking-bgpvpn-dsvm-functional networking-bgpvpn-dsvm-functional : ERROR Unable to find playbook /var/lib/zuul/builds/551b054a32fb425caa81d8ef5ba4ca2d/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base/post-ssh.yaml | 13:43 |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/project-config master: Revert "Add fetch-output to base job" https://review.openstack.org/628967 | 13:44 |
frickler | infra-root: ^^ probably need this until we have a better fix | 13:44 |
mnaser | we probably missed a post-ssh.yaml somewhere | 13:45 |
mnaser | yep | 13:46 |
mnaser | we missed the one | 13:46 |
mnaser | inside 'base' | 13:46 |
frickler | this rather looks like a bad rebase to me | 13:46 |
*** smarcet has joined #openstack-infra | 13:46 | |
*** jcoufal has joined #openstack-infra | 13:46 | |
frickler | meh, the revert fails zuul, too | 13:47 |
*** ykarel|away has quit IRC | 13:47 | |
mnaser | we might need an infra-root to force merge | 13:47 |
frickler | mnaser: do you think you have an easy fix? otherwise I'd force-merge the revert | 13:47 |
fungi | mordred: pabelanger: you've been working on the fetch-output stuff, right? ^ | 13:47 |
mnaser | i mean | 13:47 |
mnaser | i see a post-ssh.yaml that is referenced in the base job | 13:47 |
mordred | uhoh. yeah - let's revert asap | 13:48 |
frickler | fungi: yes, https://review.openstack.org/511851 merged 20 minutes ago, seems it broke everything | 13:48 |
openstackgerrit | Mohammed Naser proposed openstack-infra/project-config master: fetch-output: switch base to use post.yaml https://review.openstack.org/628970 | 13:48 |
mnaser | mordred: frickler ^ | 13:48 |
mnaser | i think that's the issue | 13:49 |
mordred | mnaser: wow, how did we miss that | 13:49 |
mordred | we could just slam that one in instead - I think we're going to need a force-merge in either case | 13:49 |
mnaser | ill leave the decision of revert OR force-merge that up to whoever | 13:49 |
frickler | it will probably fail in zuul, too, yes | 13:49 |
frickler | mordred: I'll leave it to you to clean up your patch, if you don't mind | 13:50 |
mordred | kk. I'm going to force-merge mnaser's patch | 13:50 |
openstackgerrit | Merged openstack-infra/project-config master: fetch-output: switch base to use post.yaml https://review.openstack.org/628970 | 13:50 |
mnaser | ok lets see if we unbroke things now | 13:50 |
*** kgiusti has joined #openstack-infra | 13:51 | |
mordred | oh. sigh. I only set it on base-minimal before. sorry about that! | 13:51 |
mnaser | we know who's buying first round of drinks next ptg! | 13:51 |
mordred | yes! | 13:52 |
fungi | ibm? ;) | 13:52 |
evrardjp | haha | 13:52 |
mordred | thanks ginny | 13:52 |
fungi | sorry, too soon | 13:53 |
mnaser | i see jobs starting up | 13:53 |
mnaser | i think we're good | 13:53 |
*** boden has joined #openstack-infra | 13:53 | |
fungi | thanks for spotting that! | 13:54 |
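For context on the failure above: a Zuul job references its playbooks by repo-relative path, so renaming playbooks/base/post-ssh.yaml without also updating the job description leaves every job inheriting from base pointing at a file that no longer exists. A minimal sketch of the shape involved (the real definitions in project-config carry many more options):

```yaml
- job:
    name: base
    description: The base job every other job ultimately inherits from.
    pre-run: playbooks/base/pre.yaml
    # Before 628970 this still said playbooks/base/post-ssh.yaml, which is
    # why Zuul reported "Unable to find playbook ... post-ssh.yaml":
    post-run: playbooks/base/post.yaml
```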
mordred | frickler, fungi, mnaser: since you all have fetch-output stuff paged in - there's a patch that adds functional tests too: https://review.openstack.org/#/c/628731/ | 13:56 |
frickler | tmorin: o.k., now your job should be running properly and will be held once/if it fails. which ssh key shall I use for access? | 13:57 |
mnaser | mordred: btw, i would suggest getting in the habit of using 'is' instead of '|' | 13:57 |
mnaser | i think | is being dropped soon, it throws deprecation warnings all over runs in newer versions of ansible | 13:58 |
tmorin | frickler: to be able to troubleshoot the problematic OVS state, I need to prevent tempest cleanup steps from happening | 13:58 |
mordred | mnaser: ah - good call | 13:58 |
tmorin | frickler: I already tweaked the tempest test to do a sleep(10000) at the right place | 13:59 |
mnaser | mordred: it also reads easier sometimes, 'log_directory is not changed' | 13:59 |
mnaser | but anyways, that's just a nit :) | 13:59 |
tmorin | frickler: so we don't need to wait for a failure to freeze it | 13:59 |
mordred | mnaser: totally. I'm going to do a followup that changes those - and also the ones that I cargo-culted from :) | 13:59 |
tmorin | frickler: sending you my pub key by PM | 13:59 |
mordred | mnaser: does | succeeded go to is succeeded too? | 13:59 |
mnaser | yep mordred | 14:00 |
frickler | tmorin: oh, o.k., then I need to dig into finding the correct node before it is being listed as held | 14:00 |
mnaser | https://docs.ansible.com/ansible/latest/user_guide/playbooks_tests.html#test-syntax -- "As of Ansible 2.5, using a jinja test as a filter will generate a warning." | 14:01 |
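A minimal runnable illustration of the syntax change being discussed; the task names and the /tmp path are made up:

```yaml
- hosts: localhost
  tasks:
    - name: Do something whose result we can test
      file:
        path: /tmp/example-log-dir
        state: directory
      register: log_directory

    - name: Old filter form (emits a deprecation warning as of Ansible 2.5)
      debug:
        msg: "directory was created or changed"
      when: log_directory | changed

    - name: Preferred jinja test form
      debug:
        msg: "directory was created or changed"
      when: log_directory is changed
```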
*** whoami-rajat has quit IRC | 14:01 | |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests https://review.openstack.org/628973 | 14:02 |
mordred | mnaser: ^^ | 14:02 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Update laravel legacy jobs for PHP 7.x https://review.openstack.org/628974 | 14:04 |
*** tmorin has quit IRC | 14:04 | |
mnaser | mordred: small tweak :) | 14:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-base-jobs master: Add fetch-output to base jobs https://review.openstack.org/628975 | 14:07 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-base-jobs master: Ignore errors on ssh key removal https://review.openstack.org/628976 | 14:07 |
mordred | mnaser: ah - yes - thanks! | 14:07 |
mordred | SpamapS: ^^ you use the zuul-base-jobs repo I believe? you might be interested in the stack there | 14:07 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests https://review.openstack.org/628973 | 14:10 |
mordred | mnaser: fixed. thanks! that's much better | 14:10 |
*** e0ne has joined #openstack-infra | 14:12 | |
*** ykarel|away has joined #openstack-infra | 14:12 | |
pabelanger | fungi: it doesn't look like 511851 was staged properly via base-test first, which broke things on merge | 14:12 |
smarcet | fungi: mordred: morning, when u have some time please review https://review.openstack.org/#/c/628974/ thx! | 14:13 |
*** irclogbot_1 has quit IRC | 14:14 | |
mordred | pabelanger: nah - we did base-test - it's just when we applied it to base, I only updated the base-minimal job description and not also the base job description | 14:15 |
mordred | silly me | 14:15 |
mordred | it makes me really wish zuul would consider a job definition that references a non-existent playbook as a config error (although it would be a bit expensive for it to do so) | 14:16 |
*** jamesmcarthur has quit IRC | 14:18 | |
dhellmann | config-core: this change to add a release job to the placement repo is a prereq for including it in the stein release, and the deadline for that is this week. Please add it to your review queue for the next couple of days. https://review.openstack.org/#/c/628240/ | 14:19 |
*** e0ne has quit IRC | 14:19 | |
pabelanger | mordred: ah, I see 628780 now | 14:24 |
AJaeger | mordred, mnaser, could you review https://review.openstack.org/#/c/628240/ , please? | 14:25 |
*** xek has quit IRC | 14:27 | |
*** xek has joined #openstack-infra | 14:28 | |
mordred | AJaeger: done | 14:30 |
mordred | pabelanger: yeah - oh well, we can't be perfect :) | 14:30 |
AJaeger | mordred: something still broken, see https://review.openstack.org/#/c/628731/ | 14:30 |
AJaeger | ERROR Unable to find playbook /var/lib/zuul/builds/45e70d41476244b1b1ebdcea184fd3d8/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base-minimal/post.yaml | 14:31 |
mordred | oh ugh | 14:31 |
mordred | yeah - one sec | 14:31 |
mordred | that'll only be affecting those tests and not everybody | 14:31 |
mordred | but still ugh | 14:31 |
dhellmann | config-core: this patch to add new repos to the sahara project is also a prereq for a governance change for which this week's milestone is the deadline. Please add it to your review queue for early this week. https://review.openstack.org/#/c/628209/ | 14:32 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Rename base-minimal/post-ssh to base-minimal/post https://review.openstack.org/628983 | 14:33 |
mordred | pabelanger, AJaeger: ^^ | 14:33 |
mordred | pabelanger: maybe I should have started with getting your base job refactor in first :) | 14:33 |
mordred | AJaeger: wildcards work in gerritbot config? (re: 628209) | 14:34 |
mordred | AJaeger: ah - so they do. neat! | 14:35 |
*** smarcet has quit IRC | 14:36 | |
*** needsleep is now known as TheJulia | 14:36 | |
AJaeger | mordred: yes | 14:36 |
*** smarcet has joined #openstack-infra | 14:37 | |
ssbarnea|bkp2 | mordred: pabelanger : any chance to get https://review.openstack.org/#/c/625576/ merged? | 14:37 |
AJaeger | ssbarnea|bkp2: let's first fix the current breakage, please ;) | 14:38 |
AJaeger | ssbarnea|bkp2: your change has a high risk for failure and is untested... | 14:38 |
pabelanger | ssbarnea|bkp2: I haven't followed but thought talks with clarkb and corvus were that this is expected behavior for legacy reasons | 14:39 |
pabelanger | I'd much rather see jobs stop using zuul-cloner | 14:39 |
pabelanger | and delete that role | 14:39 |
*** irclogbot_1 has joined #openstack-infra | 14:39 | |
mordred | yeah - I'm worried about the fallout from making that change - it's super hard to test or figure out what might break | 14:39 |
*** fuentess has joined #openstack-infra | 14:39 | |
dhellmann | mordred : thank you! | 14:40 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Update laravel legacy jobs for PHP 7.x https://review.openstack.org/628974 | 14:40 |
ssbarnea|bkp2 | mordred: that chmod was evil in the first place, i do understand that we need to be careful about that change, but this does not mean we shouldn't repair the damage just because of potential risk. right? | 14:41 |
*** jamesmcarthur has joined #openstack-infra | 14:42 | |
ssbarnea|bkp2 | i think we should be able to find out in less than 30min if something important is affected and address it (with a job-level fix, a local fix, or even a revert) | 14:43 |
openstackgerrit | Merged openstack-infra/project-config master: Add publish-to-pypi template to placement https://review.openstack.org/628240 | 14:44 |
fungi | ssbarnea|bkp2: is your supposition that the issue https://review.openstack.org/512285 attempted to fix by adding that is no longer present? | 14:45 |
*** ykarel|away is now known as ykarel | 14:45 | |
*** smarcet has quit IRC | 14:45 | |
*** e0ne has joined #openstack-infra | 14:46 | |
ssbarnea|bkp2 | fungi: yep, my impression was that we no longer need the hardlinking support. | 14:46 |
*** e0ne has quit IRC | 14:47 | |
fungi | i think the discussion of the original problem starts at http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-15.log.html#t2017-10-15T12:24:45 | 14:48 |
openstackgerrit | Merged openstack-infra/project-config master: Add new Sahara repositories for split plugins https://review.openstack.org/628209 | 14:49 |
openstackgerrit | Merged openstack-infra/project-config master: Rename base-minimal/post-ssh to base-minimal/post https://review.openstack.org/628983 | 14:50 |
tobias-urdin | fungi: a while ago ianw_pto helped fix the forge.puppet.com credentials for the "openstack" account and iiuc it should be stored as a secret in zuul now but I can't seem to find it after browsing all repos and commit history, do you know where I could find it? | 14:50 |
*** anteaya has joined #openstack-infra | 14:50 | |
AJaeger | tobias-urdin: might be only in the "private" hiera secret store | 14:52 |
anteaya | so some confused third party ci person doesn't yet understand the purpose of an example wiki page: https://wiki.openstack.org/w/index.php?title=ThirdPartySystems/Example&diff=next&oldid=56443 | 14:52 |
anteaya | I'll change the text back and email the account cc'ing the infra email list | 14:53 |
fungi | thanks anteaya | 14:53 |
anteaya | I don't know how successful I'll be, so thought I'd let folks know | 14:53 |
fungi | tobias-urdin: is there a job uploading files to puppetforge? i can likely trace backwards from whatever's using it | 14:54 |
fungi | anteaya: if you're a wiki admin you should be able to roll back the edit | 14:54 |
tobias-urdin | fungi: no, I'm in the process of building that but the missing piece is what the secret was named so I can access it | 14:54 |
*** smarcet has joined #openstack-infra | 14:55 | |
fungi | anteaya: if you see an "undo" link next to it in the list at https://wiki.openstack.org/w/index.php?title=ThirdPartySystems&action=history then that should hopefully do what you need | 14:56 |
*** rlandy is now known as rlandy|rover | 14:56 | |
*** smarcet has quit IRC | 14:56 | |
*** zhangfei has quit IRC | 14:57 | |
fungi | tobias-urdin: i'll look in the usual places and see if we've recorded it somewhere | 14:57 |
tobias-urdin | fungi: thanks! | 14:57 |
fungi | tobias-urdin: i have a password for an openstackinfra user on puppetforge with a comment: This is for the "openstack" namespace. This used to be owned by a single user, but at request of PTL was assigned to infra. user names map 1:1 with emails so we could not reuse above. Note this + email gets filtered into a folder on infra-root imap server | 14:59 |
fungi | oh, sorry, that comment was for the openstack user, the openstackinfra user is noted as unused | 14:59 |
tobias-urdin | that's probably it, we had some issues since there was an openstack-infra namespace already that used the infra-root email | 15:01 |
*** irclogbot_1 has quit IRC | 15:01 | |
anteaya | fungi: seems I am a wiki admin, I found a rollback option | 15:01 |
tobias-urdin | the namespace for modules is the username of the account so openstack/glance (which we upload from git.o.org/openstack/puppet-glance) puppet module is the "openstack" account on forge.puppet.com | 15:02 |
fungi | anteaya: great! i thought i recalled you being one, but didn't have time to check your account perms | 15:02 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Clean up command sockets on stop https://review.openstack.org/628990 | 15:03 |
anteaya | fungi: you have a great memory | 15:04 |
fungi | tobias-urdin: makes sense. so anyway, if the intent is that the credentials for that account are going to be managed centrally (not by the puppet-openstack team), then we likely need the playbook which will use it added to the openstack-infra/project-config repo. if you want to propose that with a placeholder for the zuul secret, i can upload a revision of the change which includes the encrypted | 15:04 |
fungi | password | 15:04 |
mordred | fungi: yeah - I think the idea was to have a central "upload to puppetforge" job sort of like upload-pypi | 15:08 |
mordred | iirc | 15:08 |
* mordred wasn't 100% paying attention | 15:08 | |
tobias-urdin | https://review.openstack.org/#/q/topic:forge-publish+(status:open+OR+status:merged) | 15:09 |
*** irclogbot_1 has joined #openstack-infra | 15:09 | |
tobias-urdin | ^ is what i have so far, where https://review.openstack.org/#/c/627573/ is the one that will use the secret | 15:10 |
anteaya | also looking at that contributor's edits on the wiki, that username is a 9 digit number and it appears their co-worker's is a 10 digit number username | 15:10 |
anteaya | feels weird to me | 15:10 |
anteaya | but we don't have a rule about wiki usernames | 15:10 |
* anteaya is looking at https://wiki.openstack.org/w/index.php?title=ThirdPartySystems&diff=cur&oldid=167461 | 15:11 | |
fungi | anteaya: i've seen so many weird things from our community i've started to question them less and less often ;) | 15:11 |
anteaya | okay, thanks for the sanity check | 15:11 |
fungi | tobias-urdin: great! so i guess we need to add a secret in zuul.d/secrets.yaml and then add it to the secrets list for the release-openstack-puppet job description | 15:12 |
clarkb | ssbarnea|bkp2: pabelanger mordred ya my concern is that that role is for preserving "legacy" behavior | 15:13 |
clarkb | so updating it to not be legacy is potentially dangerous | 15:14 |
ssbarnea|bkp2 | fungi: re chmod on src, if i understand correctly the risk is only around legacy jobs, right? probably something like http://codesearch.openstack.org/?q=cp%20-l&i=nope&files=&repos= is the affected stuff | 15:14 |
clarkb | instead you should stop using zuul cloner | 15:14 |
fungi | what shall we call it? "puppetforge_credentials" looks like it would be most consistent with the other entries, but there is a lot of variation in there | 15:14 |
fungi | tobias-urdin: ^ | 15:14 |
fungi | ssbarnea|bkp2: lots of things perform a hardlinking copy under the hood... virtualenv (via tox or otherwise), git clone file:///... and more | 15:16 |
ssbarnea|bkp2 | fungi: the hardlinking limitation due to file permissions applies only when the original owner is different from the one trying to do the hardlinking. | 15:17 |
ssbarnea|bkp2 | fungi: if everything is run under the same user, there is no need to "hack" the default umask | 15:17 |
fungi | and testing for it becomes challenging because if it's between different filesystems then those tools have fallbacks which will do something other than hardlink (because that's not possible) but we have different filesystem layouts in different providers | 15:17 |
ssbarnea|bkp2 | and AFAIK, anything under ~zuul should be owned by zuul and no other users. | 15:17 |
tobias-urdin | fungi: sounds good to me, but i'll leave it entirely up to infra :) | 15:18 |
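What fungi describes would look roughly like the sketch below in project-config; the secret name follows his suggestion and the ciphertext is a placeholder, not real output of Zuul's encryption tooling:

```yaml
# zuul.d/secrets.yaml
- secret:
    name: puppetforge_credentials
    data:
      username: openstack
      password: !encrypted/pkcs1-oaep
        - placeholder-base64-ciphertext-generated-with-the-project-key

# attach it to the job that uploads to the forge
- job:
    name: release-openstack-puppet
    secrets:
      - puppetforge_credentials
```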
*** dpawlik has quit IRC | 15:18 | |
fungi | ssbarnea|bkp2: sure, top offenders will be jobs which use multiple accounts, such as devstack and devstack-based functional test jobs | 15:18 |
ssbarnea|bkp2 | fungi: even jobs with multiple accounts have workarounds that do not need the umask hack: just adding the stack user to the zuul group fixes the hardlinking issue | 15:18 |
ssbarnea|bkp2 | we only need to avoid o+w | 15:19 |
fungi | ssbarnea|bkp2: that happened? on old branches of devstack too? | 15:19 |
ssbarnea|bkp2 | fungi: i don't know that, i am only trying to eradicate the umask while addressing the risks that come with removing it. | 15:21 |
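A sketch of the group-based workaround ssbarnea|bkp2 mentions, written as Ansible tasks; the 'stack' user name and the src path are assumptions for illustration. The idea is that group membership plus group rw access lets a second account hardlink zuul-owned files without making the tree world-writable:

```yaml
- name: Add the devstack user to the zuul group so hardlinks are permitted
  become: yes
  user:
    name: stack
    groups: zuul
    append: yes

- name: Keep the src tree group-accessible while dropping the unsafe o+w bits
  become: yes
  file:
    path: /home/zuul/src
    state: directory
    recurse: yes
    mode: "u=rwX,g=rwX,o=rX"
```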
*** openstackgerrit has quit IRC | 15:22 | |
*** openstackgerrit has joined #openstack-infra | 15:23 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Ensure command_socket is last thing to close https://review.openstack.org/628995 | 15:23 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/devstack-gate master: Replace cp -l with --reflink=auto https://review.openstack.org/628998 | 15:31 |
dmsimard | btw, tagged ara 0.16.2 for release. No new features -- addresses warnings and a deprecation notice. | 15:34 |
*** e0ne has joined #openstack-infra | 15:42 | |
*** bobh_ has joined #openstack-infra | 15:43 | |
*** bobh_ has quit IRC | 15:47 | |
*** e0ne has quit IRC | 15:50 | |
openstackgerrit | Will Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix https://review.openstack.org/629006 | 15:56 |
smcginnis | dmsimard: Not sure if you saw, but the ara release jobs failed. Well, just the docs publishing. Looks like the readthedocs config isn't fully set up. | 15:57 |
dmsimard | smcginnis: Oh? I'll look -- thanks | 15:57 |
clarkb | rtd changed their api and now we can't remotely trigger updates | 15:58 |
clarkb | iirc | 15:58 |
dmsimard | clarkb: even with the new webhook stuff ? | 15:58 |
dmsimard | smcginnis: do you have a link handy ? | 15:58 |
smcginnis | Yeah, let me track that down. | 15:58 |
smcginnis | dmsimard: http://lists.openstack.org/pipermail/release-job-failures/2019-January/001015.html | 15:59 |
smcginnis | dmsimard: The logs don't actually have much useful info though - http://logs.openstack.org/a3/a31a4f8cbbc84f3d96efb0ffc533621190fdde46/release/trigger-readthedocs-webhook/d500e56/job-output.txt.gz#_2019-01-07_15_25_57_416880 | 16:00 |
clarkb | dmsimard: yes they broke it after the webhook stuff. ianw filed a bug with them | 16:00 |
dmsimard | smcginnis: hmmm, I probably need to put the webhook_id elsewhere than https://github.com/openstack/ara/blob/master/zuul.d/layout.yaml#L3 | 16:00 |
dmsimard | clarkb: ok, I'll trigger it manually for now | 16:01 |
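For reference, the webhook id dmsimard mentions rides along as a project variable in the repo's Zuul config; a rough sketch, with the template and variable names taken from the openstack-zuul-jobs readthedocs setup as I understand it (treat them as assumptions) and a made-up id:

```yaml
# zuul.d/layout.yaml in the project repository
- project:
    templates:
      - docs-on-readthedocs
    vars:
      # integer id shown on the RTD project's integrations/webhook admin page
      rtd_webhook_id: '12345'
```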
*** bnemec is now known as stackymcstackfac | 16:04 | |
*** stackymcstackfac is now known as bnemec | 16:05 | |
*** fuentess has quit IRC | 16:06 | |
*** smarcet has joined #openstack-infra | 16:12 | |
*** jamesmcarthur has quit IRC | 16:17 | |
*** jamesmcarthur_ has joined #openstack-infra | 16:17 | |
*** pcaruana has quit IRC | 16:21 | |
*** jamesmcarthur_ has quit IRC | 16:22 | |
*** jamesmcarthur has joined #openstack-infra | 16:22 | |
*** wolverineav has joined #openstack-infra | 16:24 | |
fungi | heading out to run some lunch errands, but will be back as soon as i can | 16:28 |
*** whoami-rajat has joined #openstack-infra | 16:28 | |
*** wolverineav has quit IRC | 16:29 | |
*** smarcet has quit IRC | 16:29 | |
*** psachin has quit IRC | 16:31 | |
*** smarcet has joined #openstack-infra | 16:32 | |
openstackgerrit | Will Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix https://review.openstack.org/629006 | 16:33 |
*** ramishra has quit IRC | 16:34 | |
*** armax has joined #openstack-infra | 16:39 | |
*** udesale has quit IRC | 16:40 | |
*** bobh_ has joined #openstack-infra | 16:41 | |
clarkb | ssbarnea|bkp2: would it be reasonable for your ansible use case to add a chmod prior to running ansible? or cloning the roles/playbooks to another location first? I'd really like to avoid breaking people in that legacy state by changing the expectations around that. We recognize there were bugs and deficiencies with that setup which is why we've replaced it entirely in zuulv3. | 16:42 |
clarkb | logan-: fwiw host_id: 704a6e4d2ae61ad0bf113de69b52cb6414dadb287241358ebaf1c7b2 shows up in a couple jobs that exhibit weird ipv4 connectivity between test nodes in limestone cloud. http://logs.openstack.org/31/628731/7/check/openstack-infra-multinode-integration-ubuntu-trusty/35c4982/zuul-info/inventory.yaml is one example with ovs vxlan tunnel over ipv4 not working and | 16:46 |
clarkb | http://logs.openstack.org/00/628200/1/gate/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/660080e/job-output.txt is a tripleo job unable to ssh from one node to the other for ansible over ipv4 | 16:46 |
clarkb | (still a small data set so unfortunately don't have much more info than that) | 16:46 |
*** fuentess has joined #openstack-infra | 16:47 | |
clarkb | dmsimard: https://github.com/rtfd/readthedocs.org/issues/4986 is the rtd bug | 16:48 |
clarkb | still open but looks to be accepted and should be fixed in the future | 16:48 |
*** smarcet has quit IRC | 16:51 | |
*** ginopc has quit IRC | 16:53 | |
openstackgerrit | Will Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix https://review.openstack.org/629006 | 16:54 |
*** wolverineav has joined #openstack-infra | 16:54 | |
*** notmyname has quit IRC | 16:55 | |
openstackgerrit | Merged openstack-infra/system-config master: Turn on the future parser for all zuul mergers https://review.openstack.org/616295 | 16:56 |
*** wolverineav has quit IRC | 16:56 | |
*** notmyname has joined #openstack-infra | 16:57 | |
*** smarcet has joined #openstack-infra | 16:58 | |
*** rfolco has quit IRC | 16:58 | |
*** rfolco has joined #openstack-infra | 16:59 | |
*** rpittau has quit IRC | 17:00 | |
*** smarcet has quit IRC | 17:02 | |
*** aojea has quit IRC | 17:04 | |
openstackgerrit | Merged openstack-infra/zuul master: Fix ignored but tracked .keep file https://review.openstack.org/621391 | 17:06 |
openstackgerrit | Merged openstack-infra/system-config master: Turn on the future parser for zuul.openstack.org https://review.openstack.org/616296 | 17:06 |
clarkb | infra-root ^ I'll be watching that | 17:08 |
*** wolverineav has joined #openstack-infra | 17:08 | |
mordred | clarkb: ++ | 17:09 |
clarkb | fungi: the changes after that one are for lists.kata and lists.open, any chance you want to approve and/or babysit those with me today? | 17:09 |
*** dustinc has joined #openstack-infra | 17:11 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul-jobs master: [WIP] upload-pypi: add option to register packages https://review.openstack.org/629018 | 17:19 |
ssbarnea|bkp2 | clarkb: so we're not fixing the broken zuul-cloner role because of the risks and because it is supposed to go away. Still, I believe we include it in 99% of jobs based on https://github.com/openstack-infra/project-config/blob/ab0cb430d130aaed3e6d333384c4d6d8740040fe/playbooks/base/pre.yaml#L38 -- fetch-zuul-cloner does more than fetch it, it also messes with the src folder. | 17:20 |
ssbarnea|bkp2 | clarkb: can we make the role execution conditional? ... a step towards deprecation. | 17:21 |
ssbarnea|bkp2 | or better, to run the "umask" task only on old jobs. do we have a variable we can use to add a condition? | 17:21 |
clarkb | ssbarnea|bkp2: right I don't think we want to fix the frozen deprecated process. Instead we want to convert jobs to the new process. I think the plan for that was to make a different base job for legacy jobs. And the main base job wouldn't run the zuul cloner shim setup | 17:22 |
clarkb | ssbarnea|bkp2: but that process ran into problems because jobs were not marked legacy but had legacy dependencies. Probably what we can do is notify the dev list of the change happening then make the switch in a couple weeks | 17:23 |
clarkb | pabelanger: mordred ^ I think you had a lot more of thatp aged in than I did | 17:23 |
clarkb | corvus: http://paste.openstack.org/show/740460/ shows up on zuul node puppet runs. Can I just clean that up out of band to make the puppet logs quieter? | 17:25 |
clarkb | mordred: ^ you may know too as it seems related to the zuul dashboard hosting | 17:25 |
*** trown is now known as trown|lunch | 17:26 | |
mordred | clarkb: uhoh. what did I do? | 17:26 |
corvus | yeah, i think that's very old status page | 17:26 |
ssbarnea|bkp2 | smells like a chicken-and-egg kind of issue. never fixing "base" because someone is/may be using it. How about having a "base2" base to use. At least this approach allows people to adopt newer base(s) without having to worry about some project being too slow. | 17:26 |
clarkb | ssbarnea|bkp2: yes I'm saying we should fix base, just give people some time to change their jobs if they need to first | 17:26 |
clarkb | ssbarnea|bkp2: basically the thing that has been missing is someone to drive and coordinate that work. There isn't a lack of wanting to do it | 17:27 |
mordred | clarkb: I agree - I think we should fix base to not run fetch-zuul-cloner and have that only run in legacy-base | 17:27 |
ssbarnea|bkp2 | mordred: i like that idea. i was considering using a "when" condition on inclusion of this role. | 17:28 |
ssbarnea|bkp2 | import_role with when works ok, I just don't know what to check for (how to know which job is old/new) | 17:29 |
clarkb | ssbarnea|bkp2: I don't think we want to make it complicated like that. Instead rely on zuul's job inheritance to simplify it for us. Use base if your job is not legacy and legacy-base if it is | 17:29 |
clarkb | (I don't know if legacy-base exists yet) | 17:29 |
clarkb | base won't have the zuul cloner shim setup in it and legacy-base will | 17:30 |
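The split clarkb describes leans on ordinary Zuul job inheritance rather than a runtime 'when'; something like this sketch, with illustrative playbook paths:

```yaml
- job:
    name: base
    # no fetch-zuul-cloner shim here
    pre-run: playbooks/base/pre.yaml

- job:
    name: legacy-base
    parent: base
    # pre-run playbooks accumulate through inheritance, so this runs after
    # base's pre.yaml and is the only place the shim gets set up:
    pre-run: playbooks/legacy-base/pre.yaml
```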
clarkb | corvus: ok I'll clean those dirs up | 17:30 |
*** rf0lc0 has joined #openstack-infra | 17:30 | |
*** jpich has quit IRC | 17:31 | |
*** rf0lc0 has quit IRC | 17:31 | |
mordred | yes - legacy-base exists | 17:32 |
mordred | and it runs fetch-zuul-cloner | 17:32 |
*** rfolco has quit IRC | 17:33 | |
mordred | so I think we should be able to warn people, give it a little time, then remove fetch-zuul-cloner from base | 17:33 |
mordred | all of the autoconverted legacy jobs use legacy-base | 17:33 |
clarkb | ++ | 17:33 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/project-config master: WIP: attempt removal of fetch-zuul-cloner from base job https://review.openstack.org/629019 | 17:34 |
ssbarnea|bkp2 | mordred: clarkb ^^ so my WIP test above could be the future removal. I already created a DNM change that tests its effect on tripleo https://review.openstack.org/#/c/625680/ | 17:38 |
clarkb | ssbarnea|bkp2: I don't think the depends on will work because project-config changes must be merged first before they can be tested | 17:38 |
mordred | ssbarnea|bkp2: that unfortunately won't work ... | 17:38 |
mordred | yeah - what clarkb said | 17:38 |
clarkb | what we can do is troll logstash for zuul-cloner usage | 17:39 |
mordred | (this will get better with pabelanger's base job refactor, but that hasn't landed yet) | 17:39 |
clarkb | and cross check that with people using base and not legacy-base | 17:39 |
*** ykarel has quit IRC | 17:39 | |
clarkb | (I'm not sure how much work that is) | 17:39 |
mordred | clarkb: we could also work through pushing in pabelanger's base refactor so that we could test zuul-cloner removal with depends-on | 17:39 |
clarkb | zuul01 ran with futureparser and seems happy | 17:39 |
ssbarnea|bkp2 | i don't know how you can detect its usage, as the role runs as part of every job. | 17:40 |
clarkb | ssbarnea|bkp2: you'd be looking for jobs that actually run the zuul-cloner command later | 17:40 |
clarkb | if they don't then they don't need the shim | 17:40 |
clarkb | mordred: ya | 17:40 |
mordred | clarkb: https://review.openstack.org/#/q/status:open+topic:base-minimal-jobs - I was going to work on landing that once done with the zuulv3-output topic | 17:40 |
*** rfolco has joined #openstack-infra | 17:42 | |
*** rkukura has joined #openstack-infra | 17:42 | |
*** gyee has joined #openstack-infra | 17:45 | |
*** dtantsur is now known as dtantsur|afk | 17:45 | |
AJaeger | mordred: feel free to ping if you need review for zuulv3-output, I suggest we wrap that up without waiting another 12 months ;) | 17:46 |
mordred | AJaeger: totally. It's my main priority job-wise at the moment | 17:47 |
jrosser | i've seen mirror errors like this a fair few times now http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/job-output.txt.gz#_2019-01-07_17_29_52_855806 | 17:47 |
clarkb | AJaeger: mordred is there much else to do after the base job switch? I guess convert a job or two and point people to that setup? | 17:48 |
*** wolverineav has quit IRC | 17:49 | |
clarkb | jrosser: http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/python-oslo-utils-lang-3.37.1-0.20181012100734.6e0b90b.el7.noarch.rpm exists but the 3.39 rpm your job requests does not. Those are caching proxies so you've tried to install a package that does not exist on the remote I think | 17:49 |
clarkb | jrosser: it's possible we've got cache mismatches between indexes and actual contents? except the index I see shows 3.37 and that rpm exists | 17:50 |
AJaeger | clarkb: https://review.openstack.org/#/q/topic:zuulv3-output+status:open is the list of open reviews - mordred is rebasing one after the other and merging them... | 17:50 |
ssbarnea|bkp2 | http://codesearch.openstack.org/?q=bin%2Fzuul-cloner&i=nope&files=&repos= reports ~444 occurrences, which makes me hopeless about reaching 0 in my lifetime. and using logstash does not make me more optimistic either | 17:50 |
mordred | clarkb: yeah - I think the next bit after that stack would be converting some of our main jobs - like devstack and build-sphinx | 17:50 |
openstackgerrit | Merged openstack-infra/puppet-ptgbot master: No longer needs room map in configuration https://review.openstack.org/625619 | 17:51 |
mordred | as examples - but also to get large portions of our system converted over | 17:51 |
mordred | I think we need to wait a bit before we can convert things like unittests base though | 17:51 |
mordred | we'll need to give deployers a deprecation period to get fetch-output into their base jobs | 17:52 |
clarkb | jrosser: https://trunk.rdoproject.org/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/ seems to only have 3.37 too. Any idea where 3.39 is coming from? | 17:52 |
mordred | but I think converting openstack-only base jobs should be straightforward | 17:52 |
clarkb | ssbarnea|bkp2: ya that's why I suggest we notify people via the list. Maybe include that codesearch link and a link to a logstash query | 17:52 |
clarkb | ssbarnea|bkp2: but let them fix it themselves | 17:52 |
clarkb | ssbarnea|bkp2: most of them should be using legacy-base if they came from the job conversion we did from jjb | 17:53 |
mordred | yah. I'd say using zuul-cloner and not using legacy-base should be an unsupported config | 17:54 |
ssbarnea|bkp2 | somehow i find the concept of a full switch from any v1 to v2 very hard to achieve. Can't we find a more progressive adoption approach? maybe we can enable/disable features/changes using a versioned variable, "zuul_job_version", which is implicitly 1 but we could define a bigger value in our jobs. And we can have tasks that run based on the value of this. | 17:56 |
jrosser | clarkb: i think that is a dependency of some other package, it's easier to parse here but there are a bunch it can't find http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/logs/ara-report/result/f84f0c97-d4d7-416b-8972-b3abcaa08833/ | 17:56 |
clarkb | ssbarnea|bkp2: we crossed that bridge when zuul v3 decided to be incompatible with < 3 | 17:56 |
clarkb | ssbarnea|bkp2: I think for the future we should be more careful, but zuul-cloner is an artifact of zuul v2 which we do not run anymore and we should stop using it entirely | 17:57 |
clarkb | jrosser: I wonder if RDO is updating packages before putting all of the dependencies in place | 17:57 |
clarkb | jrosser: dmsimard may know | 17:57 |
openstackgerrit | Merged openstack-infra/puppet-ptgbot master: No longer build index page in puppet-ptgbot https://review.openstack.org/626911 | 17:57 |
AJaeger | http://zuul.openstack.org/status/change/628731,7 has been waiting 3 hours for Debian nodes ;( Any problems with Debian nodes? | 17:58 |
dmsimard | clarkb, jrosser: having lunch right now, I'll be able to check in a few | 17:59 |
jrosser | no worries - i'm travelling home soon but i've seen a fair few of these so it's worth a bit of a dig later | 17:59 |
clarkb | AJaeger: | 0001544620 | ovh-bhs1 | debian-stretch | a3145fd4-7ce5-4a5a-be55-6b2407f00cac | 158.69.65.196 | 2607:5300:201:2000::335 | ready | 00:02:24:55 | locked | | 18:00 |
clarkb | AJaeger: it is odd that a ready node would be locked for 2.5 hours | 18:00 |
*** ykarel has joined #openstack-infra | 18:00 | |
*** diablo_rojo has joined #openstack-infra | 18:00 | |
*** derekh has quit IRC | 18:00 | |
clarkb | | 0001547894 | rax-dfw | debian-stretch | 2f0bd8e3-135e-47d5-b28c-8cde74f3af85 | None | None | building | 00:00:00:17 | locked | is a new node building | 18:00 |
clarkb | and | 0001547866 | inap-mtl01 | debian-stretch | 15b690bd-0005-46c1-b47a-4047db6ed536 | 198.72.124.91 | | in-use | 00:00:00:49 | locked | is a recently used node | 18:00 |
clarkb | AJaeger: my guess is that locked ready node is tied to that job | 18:01 |
*** jcoufal has quit IRC | 18:01 | |
clarkb | Shrews: ^ is that something you might want to look at? | 18:01 |
logan- | interesting log on that first job you linked clarkb. it looks like it has connectivity across the vxlan but only in one direction? | 18:01 |
clarkb | logan-: no I think it is broken in both directions. We ping the local IP locally and remotely. The pings that succeed are for the local IP | 18:02 |
AJaeger | clarkb: thanks | 18:02 |
clarkb | logan-: that helps us to know if the interface itself is broken or if it is the tunnel. The local IP pinging implies the tunnel is the issue | 18:03 |
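The check clarkb outlines can be expressed as a couple of ad-hoc tasks; the 172.24.4.x addresses come from the log, the host group name is an assumption:

```yaml
- hosts: primary
  tasks:
    - name: Ping our own tunnel address (proves the local interface is up)
      command: ping -c 3 172.24.4.1

    - name: Ping the subnode across the vxlan (proves the tunnel path works)
      command: ping -c 3 172.24.4.2
```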
*** jcoufal has joined #openstack-infra | 18:04 | |
ssbarnea|bkp2 | clarkb: regarding rdo, if i remember correctly updating deps doesn't happen beforehand, due to its slow speed. but overall i think the logic has changed over time. | 18:04 |
logan- | ah ok, so 172.24.4.1 is on the 'primary' side of the tunnel, and 172.24.4.2 is on secondary | 18:05 |
clarkb | logan-: yup | 18:05 |
*** diablo_rojo has quit IRC | 18:05 | |
Shrews | clarkb: i can look, but it's not uncommon for zuul to hold ready node locks for that long | 18:06 |
Shrews | will see if i can track that one down though | 18:06 |
*** ykarel has quit IRC | 18:06 | |
clarkb | Shrews: would it be waiting on an executor to be available to run the job? iirc we get nodes before an executor | 18:07 |
clarkb | basically what prevents that job from starting if it has a node and is queued. Executor availability is all I can think of right now | 18:08 |
clarkb | ssbarnea|bkp2: re cloner shim. Basically most jobs that use it will be carried over from our JJB conversion and use legacy-base. There is potential for some jobs to use the shim and parent to base and not legacy-base. If they do this they will break and we can fix them pretty easily by converting to legacy-base. So there might be a short period of brokenness, but it comes with a straightforward fix. If we | 18:11 |
clarkb | communicate that people can check things beforehand (and fix them beforehand) then I think we've done a good job there and anyone broken should have a relatively easy path forward for fixing too | 18:11 |
clarkb | ssbarnea|bkp2: if we want to wait on the base job refactoring that allows us to test more of the base jobs we can do that too which will make the step of auditing whether or not it will break you easier | 18:11 |
*** agopi has quit IRC | 18:14 | |
Shrews | clarkb: looks like ovh-bhs1 is at quota, so it's paused trying to handle the most recent request, but frequently getting launch errors. the active request queue is slowly shrinking, but node requests piled up in that queue are delayed (that node is part of one of the requests, but that one still needs 1 more node) | 18:15 |
*** agopi has joined #openstack-infra | 18:15 | |
clarkb | ah so this is multinode requests when at quota | 18:16 |
Shrews | clarkb: correct | 18:16 |
clarkb | amorin: ^ hello not sure if you are back from holidays, but we've noticed our quota in bhs1 is lower than we had previously | 18:17 |
clarkb | amorin: do we need to update our configs or is that a bug? | 18:17 |
*** rkukura has quit IRC | 18:18 | |
*** agopi has quit IRC | 18:18 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:19 | |
*** agopi has joined #openstack-infra | 18:19 | |
*** jamesmcarthur has quit IRC | 18:19 | |
*** trown|lunch is now known as trown | 18:20 | |
Shrews | clarkb: i'm confused by that since we have max-servers as 150 but i count only 36 ovh-bhs1 nodes | 18:22 |
Shrews | something out of sync? | 18:22 |
clarkb | Shrews: yes our quota in bhs1 has been lowered | 18:22 |
clarkb | Shrews: don't know why or how yet, but basically nodepool is operating under the lower quota numbers | 18:22 |
clarkb | Shrews: part of it is we've kept frickler's test nodes around in that cloud so that's ~20 instances | 18:23 |
*** jamesmcarthur_ has quit IRC | 18:23 | |
clarkb | but that still doesn't account for the full decrease. | 18:23 |
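For context, nodepool launches at most max-servers instances per pool but also respects whatever quota the cloud reports, whichever is lower, which is why a quietly lowered tenant quota shows up as far fewer than the configured 150 nodes. The relevant knob, with surrounding structure abbreviated:

```yaml
# nodepool.yaml (sketch)
providers:
  - name: ovh-bhs1
    cloud: ovh
    pools:
      - name: main
        # effective limit is min(max-servers, quota the cloud grants us)
        max-servers: 150
```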
*** jamesmcarthur has joined #openstack-infra | 18:23 | |
Shrews | hrm | 18:23 |
clarkb | frickler: ^ maybe we should clean those up now that it has been almost a month? | 18:23 |
*** jamesmcarthur has quit IRC | 18:24 | |
*** jamesmcarthur has joined #openstack-infra | 18:24 | |
*** jamesmcarthur has quit IRC | 18:24 | |
*** jamesmcarthur has joined #openstack-infra | 18:25 | |
*** jamesmcarthur has quit IRC | 18:26 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:26 | |
*** agopi_ has joined #openstack-infra | 18:26 | |
*** jamesmcarthur_ has quit IRC | 18:27 | |
*** agopi has quit IRC | 18:29 | |
*** dkehn has joined #openstack-infra | 18:29 | |
*** jamesmcarthur has joined #openstack-infra | 18:32 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:34 | |
*** jamesmcarthur has quit IRC | 18:34 | |
fungi | clarkb: sure, once i'm caught up, happy to keep an eye on puppeting of the listservers | 18:37 |
clarkb | fungi: great, want to approve the first one when you are in a good spot for that? I'll be around all day so can switch to that when you are ready | 18:37 |
*** jamesmcarthur_ has quit IRC | 18:40 | |
fungi | clarkb: "first one" being 628216? | 18:43 |
clarkb | fungi: yup | 18:43 |
fungi | i have it queued up in gertty now, will approve shortly, sure. thanks! | 18:44 |
*** rkukura has joined #openstack-infra | 18:45 | |
*** wolverineav has joined #openstack-infra | 18:48 | |
clarkb | I'll pop out for a bit now then should be back by the time it can be merged and applied | 18:50 |
*** wolverineav has quit IRC | 18:52 | |
dmsimard | infra-root: apache on mirror02.regionone.limestone.openstack.org is complaining about a read-only filesystem when trying to write cache header files, ex: http://paste.openstack.org/raw/740463/ | 18:54 |
clarkb | dmsimard: any errors in dmesg or the kernel log indicating why the fs is ro? | 18:55 |
fungi | dmesg ring buffer has been spammed by filesystem errors | 18:55 |
dmsimard | clarkb: there are ext4-fs errors in dmesg but I'm trying to find when it started or if there are any afs errors | 18:56 |
dmsimard | also, yes, what fungi said -- actual non-fs errors have been cycled out | 18:56 |
clarkb | dmsimard: that cache isn't on afs, it is "local" | 18:56 |
logan- | verified hv is not full /dev/mapper/SYSVG-ROOT 465G 251G 191G 57% / | 18:56 |
dmsimard | clarkb: oh, oops | 18:57 |
fungi | yeah, we have a lvm2 logical volume for it, from a vg on top of pv /dev/vdb1 | 18:57 |
dmsimard | I'm working my way up the apache logs, so far this has been ongoing since at least jan 2 | 18:58 |
dmsimard | apache logs go as far back as dec 31st and there were read-only errors already | 18:59 |
fungi | syslog only has a one-week retention there, yeah | 18:59 |
dmsimard | [Mon Dec 31 06:35:17.172980 2018] [cache_disk:warn] [pid 2883:tid 140135773492992] (30)Read-only file system | 18:59 |
fungi | oldest syslog entry is: | 18:59 |
fungi | Dec 31 06:25:09 mirror02 kernel: [2073564.770768] EXT4-fs error (device dm-0): ext4_lookup:1606: inode #3367: comm updatedb.mlocat: deleted inode referenced: 8251 | 18:59 |
clarkb | fwiw I don't think that is the source of jrosser's 404 as we seem to proxy without caching | 19:00 |
fungi | right, i think this would account for proxy performance degradation | 19:00 |
fungi | not 404 | 19:00 |
jrosser | It may be worth looking in logstash for similar because I have a hunch all of these I’ve seen were in limestone | 19:01 |
dmsimard | yes and no | 19:01 |
dmsimard | I found that issue as part of troubleshooting the 404 :) | 19:01 |
fungi | right, we ought to fix it regardless | 19:02 |
clarkb | ya shoudl be fixed | 19:02 |
dmsimard | mnaser pointed out that we may have been pulling a stale .repo file but it doesn't appear to be that way | 19:02 |
clarkb | dmsimard: no I checked directly and it seems to line up with our mirror | 19:02 |
clarkb | my hunch is the rdo repo updating packages with new deps before adding the new deps | 19:03 |
dmsimard | I was in #openstack-ansible attempting to help, tl;dr is that OSA is setting up a recent repo (built today) which contains the packages that are 404'ing but for some reason yum is looking at this month-old repo | 19:03 |
dmsimard | There is a 404 on http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/python2-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.noarch.rpm | 19:03 |
dmsimard | because that package is actually at https://trunk.rdoproject.org/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm | 19:04 |
fungi | looks like that pv is via cinder volume f18e717d-8981-4134-8fe1-57596f7481e4 | 19:04 |
dmsimard | or http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm | 19:04 |
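The mismatch is easy to confirm by comparing the caching proxy against the origin it fronts, using the URLs quoted above (the proxy's /rdo prefix maps to trunk.rdoproject.org):

    # HEAD request through the mirror's caching proxy
    curl -sI http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm
    # same object fetched directly from the origin
    curl -sI https://trunk.rdoproject.org/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm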
*** tosky has quit IRC | 19:05 | |
fungi | logan-: possible we briefly lost contact with the cinder backend sometime >1 week ago? that's usually sufficient for active volumes to go read-only on us | 19:05 |
dmsimard | that /90/44 repository is set up properly by OSA as far as we can tell http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/logs/ara-report/result/fd68c556-12ed-4096-9d07-697951a4b3cf/ | 19:05 |
dmsimard | I'm not exactly sure what is going on, would like to address the caching issue and see if we can reproduce | 19:06 |
fungi | if i stop apache2 and openafs services on mirror02.regionone.limestone i should be able to remount those filesystems read-write, but it probably makes more sense to reboot the instance anyway | 19:07 |
fungi | infra-root: objections to an emergency reboot of mirror02.regionone.limestone? ^ | 19:07 |
logan- | fungi: maybe when we rebooted hosts to update the kernel for nested virt fixes, ceph hung io long enough to kick it into ro.. i can't remember exactly when that was, a couple weeks ago though | 19:07 |
*** smarcet has joined #openstack-infra | 19:07 | |
fungi | logan-: that could easily be it. we don't have syslog back that far unfortunately to get a more exact timestamp | 19:08 |
dmsimard | fungi: I think we'd want to do a stop/start to ensure we're on a new process with a brand new volume connection | 19:08 |
clarkb | fungi: considering it broken anyway, seems fine. We can disable in nodepool too if we want to be more graceful about it | 19:08 |
fungi | dmsimard: yeah, rebooting the instance will certainly stop and start the processes running in it ;) | 19:08 |
clarkb | dmsimard: can we stop/start as users of nova? | 19:08 |
dmsimard | fungi: I mean the KVM process | 19:08 |
clarkb | oh I read it as stop/start the qemu process | 19:08 |
fungi | ohhh | 19:08 |
pabelanger | clarkb: mordred: https://review.openstack.org/513506/ removed zuul-cloner from base, I think the issue there was that legacy tox jobs still depend on it. Maybe we just reparent them to legacy base? | 19:09 |
pabelanger | clarkb: mordred: but agree, a heads up to ML about fallout might be a good idea | 19:09 |
fungi | yeah, i can `sudo poweroff` in the instance and then `openstack server start` it fresh | 19:09 |
dmsimard | I'm not sure to what extent it applies to ceph, I remember iscsi issues that could only be resolved by spinning up a new process and a soft reboot wasn't enough | 19:09 |
fungi | though when we've seen this sort of thing in other clouds in the past, just remounting the filesystems read-write after a good fsck is usually sufficient | 19:10 |
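That remount-without-reboot path would look roughly like this (a sketch: the logical volume name appears later in this log, while the mountpoint and service names are assumptions about this host):

    # stop the services writing to the affected filesystem
    sudo systemctl stop apache2 openafs-client
    # unmount, repair, and remount (mountpoint is illustrative)
    sudo umount /var/cache/apache2
    sudo fsck -y /dev/main/mirror02-cache-apache2
    sudo mount /var/cache/apache2
    sudo systemctl start openafs-client apache2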
*** wolverineav has joined #openstack-infra | 19:10 | |
*** wolverineav has quit IRC | 19:10 | |
*** wolverineav has joined #openstack-infra | 19:10 | |
*** jamesmcarthur has joined #openstack-infra | 19:11 | |
dmsimard | I suppose we should first attempt to remount before considering rebooting | 19:11 |
fungi | well, the outage from a reboot should be brief | 19:15 |
fungi | but we can dial down max-servers first if we want | 19:15 |
clarkb | that is the safest way | 19:17 |
clarkb | but I agree should be short and if we think jobs are broken anyway... | 19:17 |
*** jamesmcarthur has quit IRC | 19:18 | |
clarkb | ok I have to pop out for a few minutes. back in a few | 19:18 |
dmsimard | fungi: any of these solutions sound good to me -- I can send a patch for max-servers or we can put nl02 in the emergency file temporarily | 19:22 |
*** shardy has quit IRC | 19:24 | |
*** shardy has joined #openstack-infra | 19:25 | |
fungi | probably simplest to just do nl02 in emergency and then manually update max-servers on it, then wait for the used count to empty, then we can poweroff the mirror instance and boot it again | 19:30 |
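Draining a region in nodepool is a one-line change per provider pool; an illustrative snippet (not the exact production file):

    providers:
      - name: limestone-regionone
        # ... cloud/region settings elided ...
        pools:
          - name: main
            max-servers: 0  # drain the region; restore the previous value afterwards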
dmsimard | Was going to add it in the emergency file but saw a .swp file from a minute ago :p | 19:31 |
fungi | that was me | 19:31 |
dmsimard | ok | 19:31 |
*** gfidente has quit IRC | 19:32 | |
fungi | #status log temporarily lowered max-servers to 0 in limestone-regionone in preparation for a mirror instance reboot to clear a cinder volume issue | 19:32 |
openstackstatus | fungi: finished logging | 19:32 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo https://review.openstack.org/629035 | 19:32 |
fungi | i'll keep an eye on that to make sure the current puppet/ansible pulse doesn't re-up it | 19:32 |
fungi | that was the only provider on nl02 booting nodes, btw. the others (citycloud, packethost) were already set to max-servers 0 | 19:34 |
*** agopi__ has joined #openstack-infra | 19:34 | |
*** smarcet has quit IRC | 19:36 | |
*** agopi_ has quit IRC | 19:36 | |
clarkb | pabelanger: ^ re packethost have you been able to follow up on using some osa there? | 19:36 |
clarkb | I'm back now too | 19:37 |
pabelanger | clarkb: Yah, they are keen. We are just finishing up the POC with ansible-network. | 19:37 |
clarkb | NICE | 19:37 |
clarkb | er didn't mean the caps there but still nice :) | 19:37 |
pabelanger | clarkb: I think we actually can start pushing up code to review.o.o next week, and start deploying it from zuul | 19:38 |
cloudnull | ^ nice | 19:38 |
*** smarcet has joined #openstack-infra | 19:38 | |
pabelanger | but yah, OSA works well on stable/rocky | 19:38 |
smarcet | fungi: mordred:clarkb: please review https://review.openstack.org/#/c/629035/ thx | 19:40 |
dmsimard | #status log mirror02.regionone.limestone.openstack.org's filesystem on the additional cinder volume went read only for >1 week (total duration unknown) causing errors when apache was attempting to update its cache files. | 19:41 |
openstackstatus | dmsimard: finished logging | 19:41 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Add fetch-output and ensure-output-dirs tests https://review.openstack.org/628731 | 19:43 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests https://review.openstack.org/628973 | 19:44 |
clarkb | mordred: ^ fyi | 19:44 |
mordred | clarkb: woot | 19:44 |
mordred | smarcet: zuul is sad about that patch | 19:45 |
smarcet | yes saw it | 19:45 |
smarcet | mordred: will fix sorry about that | 19:45 |
mordred | smarcet: no worries! | 19:45 |
*** jamesmcarthur has joined #openstack-infra | 19:47 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo https://review.openstack.org/629035 | 19:52 |
*** whoami-rajat has quit IRC | 19:58 | |
*** kjackal has quit IRC | 20:04 | |
smarcet | mordred:fungi: fixed https://review.openstack.org/#/c/629035/ | 20:07 |
AJaeger | mordred: want to abandon https://review.openstack.org/628668 now? | 20:08 |
*** agopi__ has quit IRC | 20:09 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo https://review.openstack.org/629035 | 20:21 |
*** bobh_ has quit IRC | 20:21 | |
*** imacdonn has joined #openstack-infra | 20:22 | |
clarkb | fungi: should I approve https://review.openstack.org/#/c/628216/4 or were you still planning to do it? | 20:27 |
fungi | i've approved it just now and am watching syslog on the server | 20:29 |
clarkb | great | 20:30 |
fungi | last puppet apply there was at 19:58:40 | 20:30 |
*** e0ne has joined #openstack-infra | 20:40 | |
imacdonn | hi guys ... https://review.openstack.org/#/c/612393/ failed in the gate due to a tempest timeout ... must it be rechecked, or is there any shortcut (requeue?) | 20:41 |
clarkb | imacdonn: in general failures have to be rechecked due to the "clean check" requirement | 20:44 |
clarkb | the major exception to this is when we are trying to get bug fixes in for the gate itself and want to expedite that process to avoid unnecessary gate resets | 20:44 |
clarkb | (this is why I keep pushing for people to help debug and fix gate errors) | 20:44 |
imacdonn | clarkb: yeah, I understand that, but it seems like a timeout has a high probability of not being the code's fault ... I think exceptions may have been given in the past, but I don't recall the circumstances | 20:45 |
imacdonn | just seems like a waste of resources to go through check again | 20:45 |
imacdonn | understood, though ... figured it was worth asking ;) | 20:46 |
clarkb | imacdonn: what we are trying to avoid there is making it easy for flaky code to go through the gate and merge, then be flaky for everyone (we've seen this happen in the past and it is one of the reasons for clean check; the other is ensuring that we have relatively up-to-date results, avoiding unnecessary gating) | 20:46 |
clarkb | imacdonn: and yes the flaky gate failures are often not directly related to the specific change that failed. | 20:47 |
clarkb | which is why I keep pushing people to identify the failures, track them with elastic-recheck and ideally fix them if we are able | 20:47 |
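For context, elastic-recheck tracks each known gate failure as a small YAML file containing a Lucene query against the indexed job logs; a hypothetical entry might look like this (filename and message text are illustrative, not a real tracked bug):

    # queries/1234567.yaml -- named after the launchpad bug number
    query: >
      message:"Failed to establish authenticated ssh connection" AND
      tags:"console"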
imacdonn | clarkb: yeah, I get it .... trying to diagnose timeouts on infra that you have little visibility into can be challenging, though ;) | 20:48 |
clarkb | imacdonn: ya one of the frequent steps we have to take is to add additional logging around the unhappy code | 20:49 |
fungi | corvus: noticing /var/lib/bind/zones/zuulci.org/zone.db.signed on adns1.opendev.org is (much) older than zone.db, and rndc zonestatus doesn't list a "next resign node" or "next resign time" for it (also says "secure: no"). still digging to see why it's not getting new sigs | 20:51 |
fungi | oh! /etc/bind/keys/zuul-ci.org is empty | 20:52 |
fungi | that could be related ;) | 20:52 |
openstackgerrit | Merged openstack-infra/system-config master: Fix glob for lists.katacontainers.io https://review.openstack.org/628216 | 20:52 |
fungi | same on adns1.openstack.org as well | 20:53 |
fungi | i wonder how we got a signed zone for it to begin with | 20:53 |
fungi | er, meant to say /etc/bind/keys/zuulci.org is empty | 20:54 |
*** bobh_ has joined #openstack-infra | 20:54 | |
*** jamesmcarthur has quit IRC | 20:54 | |
fungi | /etc/bind/keys/zuul-ci.org (with the hyphen) definitely has content | 20:55 |
*** smarcet has quit IRC | 20:58 | |
fungi | looks like we're missing a section for the zuulci.org zone in /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o | 20:58 |
*** bobh_ has quit IRC | 21:00 | |
fungi | working on getting some added now | 21:02 |
corvus | fungi: since it was registered through csc, it was probably never really signed | 21:05 |
corvus | (cause i think opendev.org was the first csc domain we dnssec'd) | 21:05 |
fungi | makes sense | 21:06 |
fungi | strangely, it has a zone.db.signed file anyway | 21:06 |
*** smarcet has joined #openstack-infra | 21:07 | |
clarkb | corvus: before I forget, want to follow up on the thread about the k8s walkthrough with a time selection? Probably want to do that today so people can make time tomorrow if that is when we are doing it | 21:07 |
corvus | clarkb: will do now | 21:07 |
dmsimard | fungi: need to take off and I'll be back later, it looks like limestone is clear: http://grafana.openstack.org/d/WFOSH5Siz/nodepool-limestone?orgId=1 | 21:08 |
fungi | dmsimard: yep, i'll get it rebooted thoroughly here in a but | 21:09 |
fungi | er, in a bit | 21:09 |
*** xek has quit IRC | 21:11 | |
fungi | #status log generated and added dnssec keys for zuulci.org to /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o | 21:15 |
openstackstatus | fungi: finished logging | 21:15 |
fungi | hopefully that'll get things rolling | 21:16 |
*** smarcet has quit IRC | 21:19 | |
clarkb | fungi: do we need to give the registrar soem of that info? | 21:20 |
clarkb | seems like we had to do that for opendev | 21:20 |
clarkb | fungi: also lists.kata should be puppeting in the next 10-15 minutes I think | 21:21 |
clarkb | (I've got a shell there now too) | 21:21 |
*** smarcet has joined #openstack-infra | 21:21 | |
fungi | yeah, the last pulse was a no-op but i think it started before the change merged | 21:22 |
fungi | clarkb: we likely need to provide ds records to csc for zuulci.org once everything is confirmed working | 21:22 |
fungi | but i'd hold off doing that until we see the serial update | 21:23 |
*** wolverineav has quit IRC | 21:26 | |
*** kgiusti has left #openstack-infra | 21:27 | |
clarkb | fungi: and that just affects the ability to verify the signed zone? | 21:30 |
fungi | right | 21:30 |
clarkb | fungi: lists.kata lgtm | 21:30 |
fungi | clarkb: yeah, other than all the deprecation messages it looks to have been a no-op? | 21:31 |
clarkb | yup | 21:31 |
fungi | i'll go ahead and approve the next change now | 21:31 |
clarkb | ++ | 21:31 |
fungi | and watch lists.o.o accordingly | 21:31 |
*** jcoufal has quit IRC | 21:31 | |
clarkb | fungi: and so I understand the dns thing better, csc wasn't syncing the zone because it wasn't properly signed? | 21:36 |
fungi | had nothing to do with csc | 21:36 |
clarkb | oh ns1 and ns2 weren't syncing it from adns1? | 21:36 |
fungi | ns1/ns2.opendev.org were serving old copies of the zone because that's what adns1.opendev.org was providing them | 21:36 |
fungi | adns1.opendev.org had a zone.db.signed file (presumably copied from adns1.openstack.org?) corresponding to before the ns record changes in the current zone.db file | 21:37 |
fungi | and zone.db.signed is what was getting served | 21:38 |
clarkb | got it | 21:38 |
fungi | if it had been updating zone.db.signed on each new zone.db change, that would have worked out fine | 21:38 |
*** jamesmcarthur has joined #openstack-infra | 21:39 | |
fungi | but since it had no dnssec keys for the zuulci.org zone, it wasn't able to sign the newer version of the zone so kept serving the old signed one | 21:39 |
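The symptom is visible from outside by comparing SOA serials (real dig invocations; the serials returned are whatever the servers currently publish):

    # serial the hidden master is serving
    dig @adns1.opendev.org zuulci.org SOA +short
    # serial the public nameservers have picked up
    dig @ns1.opendev.org zuulci.org SOA +short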
fungi | the fix (i'm hoping!) was to run the dnssec-keygen commands from https://docs.openstack.org/infra/system-config/dns.html#adding-a-zone and then insert their contents into /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o | 21:41 |
fungi | if not, at least i'll get to see what else is missing next | 21:41 |
fungi | but yeah, if zone.db.signed gets updated here in a little while, then we can run the dnssec-dsfromkey command there and provide the output to csc | 21:42 |
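Per that documentation, the key generation and DS extraction steps are roughly as follows (a sketch; NNNNN stands in for the key tag assigned at generation time, and the key sizes are illustrative):

    cd /etc/bind/keys/zuulci.org
    # zone-signing key
    dnssec-keygen -a RSASHA256 -b 2048 -n ZONE zuulci.org
    # key-signing key
    dnssec-keygen -f KSK -a RSASHA256 -b 4096 -n ZONE zuulci.org
    # later, produce the DS records to hand to the registrar (csc)
    dnssec-dsfromkey Kzuulci.org.+008+NNNNN.key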
clarkb | imacdonn: digging into that failure it seems the main tempest run was fine http://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/job-output.txt.gz#_2019-01-07_18_28_17_260583 then the slowtests run gets really unhappy for about 35 minutes and the job times out | 21:43 |
clarkb | fungi: got it | 21:43 |
fungi | infra-root: i've powered off mirror02.regionone.limestone now that there are no jobs running in that region. i'll get it booted back up and checked out here in a few | 21:43 |
*** e0ne has quit IRC | 21:44 | |
*** smarcet has quit IRC | 21:44 | |
imacdonn | clarkb: I guess they weren't kidding when they marked them as "slow" :) | 21:44 |
clarkb | imacdonn: dstat shows the node goes relatively idle after the first tempest run too | 21:46 |
clarkb | I think that rules out a memory or cpu or disk issue | 21:46 |
clarkb | imacdonn: http://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/controller/logs/screen-n-cpu.txt.gz#_Jan_07_18_33_34_967897 shows us virtual interface creation failed (and it had ~5 minutes to do that) | 21:50 |
openstackgerrit | Merged openstack-infra/system-config master: Fix glob for lists.o.o https://review.openstack.org/628217 | 21:51 |
clarkb | http://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/controller/logs/screen-q-agt.txt.gz#_Jan_07_18_28_36_447492 neutron seems to think the device was created properly | 21:52 |
*** auristor has quit IRC | 21:53 | |
fungi | wow, neat, mirror02.regionone.limestone is refusing to let me ssh in now, and it's been at least 5 minutes since i issued the server start command. guess i'll check the console | 21:54 |
fungi | lovely, it can't fsck /dev/main/mirror02-cache-apache2 so has dropped to single-user | 21:56 |
imacdonn | clarkb: hmm... message queue issue ? | 21:57 |
logan- | fungi: :( lmk if i can be of any assistance on my end | 21:58 |
clarkb | imacdonn: possibly. Looking at rabbit logs there are some unexpected disconnects but none from the nova-compute pids | 21:58 |
clarkb | imacdonn: do those events go through nova conductor instead? | 21:58 |
*** bnemec has quit IRC | 21:58 | |
clarkb | hrm the disconnects all seem to be uwsgi processes so not conductor either | 21:59 |
clarkb | imacdonn: probably need nova and/or neutron to dig into that more. Maybe dansmith or slaweq can help | 22:00 |
clarkb | http://status.openstack.org/elastic-recheck/#1808171 may be related as well | 22:01 |
*** trown is now known as trown|outtypewww | 22:01 | |
*** wolverineav has joined #openstack-infra | 22:01 | |
*** wolverineav has quit IRC | 22:01 | |
*** wolverineav has joined #openstack-infra | 22:01 | |
fungi | logan-: thanks! just getting pulled in lots of directions so taking me a few minutes to dig up credentials | 22:02 |
clarkb | imacdonn: ya that logstash signature matches except for the test name. So we may want to broaden that query | 22:02 |
*** bnemec has joined #openstack-infra | 22:02 | |
*** e0ne has joined #openstack-infra | 22:02 | |
clarkb | ya lots of hits if I remove the test filter | 22:03 |
fungi | logan-: aha, `console url show ...` did the trick. hooray for people running *actual* openstack! | 22:03 |
clarkb | imacdonn: I'll go bug neutron | 22:03 |
*** auristor has joined #openstack-infra | 22:06 | |
imacdonn | clarkb: thanks! I'll watch there | 22:06 |
fungi | yay! i can ssh into mirror02.regionone.limestone now, after manually rerunning fsck with -y via a root shell on the oob console | 22:08 |
*** rh-jelabarre has quit IRC | 22:08 | |
logan- | great | 22:08 |
clarkb | fungi: was it refusing to complete the boot without a fsck? | 22:08 |
fungi | depends on your definition of "refusing," "complete" and "boot" i guess ;) | 22:08 |
fungi | i happily complained that one of the filesystems in /etc/fstab had errors, and then helpfully dropped to a root shell in single-user mode | 22:09 |
fungi | er, it happily complainec | 22:09 |
fungi | something | 22:10 |
fungi | it's getting to that time of day where my typing is even more atrocious than usual | 22:10 |
clarkb | ah | 22:11 |
*** slaweq has quit IRC | 22:12 | |
clarkb | fungi: have we reenabled that region in nodepool yet? | 22:22 |
fungi | not yet | 22:24 |
fungi | i'm in a bunch of conversations, trying to finish checking the mirror out | 22:24 |
fungi | apache and afs caches look sane, hitting from a browser | 22:26 |
*** ianw_pto is now known as ianw | 22:26 | |
*** boden has quit IRC | 22:26 | |
fungi | and the filesystems are mounted and no errors are being reported | 22:26 |
ianw | HNY everyone | 22:26 |
fungi | half-normal yodelling to you too! | 22:27 |
clarkb | ianw: hello | 22:28 |
mordred | hello ianw! | 22:29 |
*** slaweq has joined #openstack-infra | 22:30 | |
*** e0ne has quit IRC | 22:32 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update upload-logs to process docs as well https://review.openstack.org/511853 | 22:32 |
*** slaweq has quit IRC | 22:34 | |
fungi | #status log nl02 has been removed from the emergency maintenance list now that the filesystems on mirror02.regionone.limestone have been repaired and checked out | 22:41 |
openstackstatus | fungi: finished logging | 22:41 |
*** diablo_rojo has joined #openstack-infra | 22:43 | |
*** rcernin has joined #openstack-infra | 22:45 | |
*** tosky has joined #openstack-infra | 22:52 | |
manjeets | clarkb, hi, adding success-comment in pipeline.yaml didn't make it go to the CI section | 22:54 |
clarkb | manjeets: I'm not sure then | 22:54 |
fungi | manjeets: can you link to an example review where your ci system added a comment? | 22:55 |
*** eernst has joined #openstack-infra | 22:55 | |
manjeets | fungi, https://review.openstack.org/#/c/603501/ | 22:56 |
manjeets | comments from Intel SriovTaas CI check | 22:56 |
openstackgerrit | Merged openstack-infra/zuul master: Add timer for starting_builds https://review.openstack.org/623468 | 22:57 |
*** eernst_ has joined #openstack-infra | 22:57 | |
*** eernst has quit IRC | 22:57 | |
*** tmorin has joined #openstack-infra | 22:57 | |
fungi | manjeets: thanks, i think the account display name needs to be adjusted to just "Intel SriovTaas CI" without the "check" on the end of the account name | 22:57 |
manjeets | fungi, i'll try that too thanks !! | 22:58 |
tmorin | frickler: you can now release the CI node you had frozen earlier today to let me debug, I gathered enough information to explore a path to a solution -- many thanks! | 22:58 |
*** eernst_ has quit IRC | 22:58 | |
*** eernst has joined #openstack-infra | 22:59 | |
fungi | manjeets: you can see the regular expression used to match on the display names for comments at https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/files/gerrit/hideci.js#n19 | 22:59 |
fungi | var ciRegex = /^(.* CI|Jenkins|Zuul)$/; | 22:59 |
fungi | so if there's something after the "CI" in the name, that will cause it not to match | 22:59 |
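The anchored $ is easy to demonstrate against the same pattern with grep (illustrative):

    # matches: prints the name and exits 0
    echo "Intel NFV CI" | grep -E '^(.* CI|Jenkins|Zuul)$'
    # no match: the trailing "check" falls outside the $ anchor
    echo "Intel SriovTaas CI check" | grep -E '^(.* CI|Jenkins|Zuul)$'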
fungi | clarkb: puppet is running on lists.o.o now | 23:00 |
fungi | and just finished | 23:00 |
tmorin | ( frickler, or anyone else acting as infra-root, the CI node that can be freed was ubuntu-xenial-inap-mtl01-0001542013 ) | 23:00 |
manjeets | fungi, I wonder how this is working here https://review.openstack.org/#/c/629041/ | 23:00 |
fungi | clarkb: looks like it was (properly) a no-op? | 23:00 |
manjeets | there's Intel NFV CI check | 23:00 |
fungi | manjeets: "check" is the pipeline name. if you hit the "toggle ci" button at the bottom of the page you'll see the display name for that account is just "Intel NFV CI" with no "check" after it | 23:01 |
*** eernst has quit IRC | 23:02 | |
fungi | the "check" part is taken from the job report string, where it says "Build succeeded (check pipeline)." | 23:02 |
manjeets | got it, thanks fungi! for some reason I read that thinking "check" was part of the name | 23:02 |
manjeets | my bad | 23:02 |
manjeets | fungi, cool that worked https://review.openstack.org/#/c/603501/33 | 23:02 |
clarkb | fungi: yup looks like a proper noop. The next services in the list are openstackid. Any idea if we are puppeting those currently? <- smarcet may know too | 23:02 |
manjeets | thanks ! | 23:03 |
imacdonn | clarkb: argh, my recheck failed on that thing that looks like an address conflict (ssh timeout)! can't win! http://logs.openstack.org/93/612393/21/check/cinder-tempest-dsvm-lvm-lio-barbican/ebc3a73/ | 23:03 |
fungi | clarkb: we are not (currently) puppeting openstackid.org production, while smarcet works through updating openstackid to newer php on openstackid-dev | 23:03 |
fungi | manjeets: great! happy you got it worked out | 23:04 |
clarkb | fungi: should we go ahead and flip openstackid-dev to futureparser now? | 23:04 |
clarkb | then we can flip the switch for prod too and it should work if -dev is happy | 23:04 |
fungi | clarkb: we might want to double-check that it won't complicate what smarcet is doing on openstackid-dev now. also i think he wants to couple this with server rebuilds on xenial (they're still trusty) | 23:05 |
clarkb | fungi: ok, I'm happy either way. We managed to get through a whole chunk of services onto futureparser and we can rebase the list order if necessary from this point forward | 23:06 |
clarkb | I expect it's only a small handful of services now | 23:06 |
*** tmorin has quit IRC | 23:06 | |
fungi | yeah, might make sense to bump those further up the list or something | 23:06 |
fungi | also http://grafana.openstack.org/d/WFOSH5Siz/nodepool-limestone says 50 nodes in use again, so ansible/puppet has restored the old max-servers | 23:07 |
fungi | if tmorin comes back, that was one of at least 3 nodes held with the same comment, so i'm not sure whether he's done with all of them or just that one | 23:10 |
*** rascasoft has quit IRC | 23:10 | |
fungi | mnaser: are you done with the magnum-kubernetes-conformance troubleshooting for that last pair of nodes we held a week ago? | 23:10 |
*** smarcet has joined #openstack-infra | 23:13 | |
*** diablo_rojo has quit IRC | 23:14 | |
openstackgerrit | Luigi Toscano proposed openstack-infra/project-config master: Basic job and queue definitions for sahara-plugin-* https://review.openstack.org/629068 | 23:14 |
fungi | corvus: clarkb: ansible added the keys under /etc/bind/keys/zuulci.org and bind seemed to be aware of them, but didn't update /var/lib/bind/zones/zuulci.org/zone.db.signed until i issued a `sudo rndc loadkeys zuulci.org` | 23:19 |
fungi | though it's still serving that same older soa with the 1526407320 serial from may | 23:21 |
*** ianychoi has quit IRC | 23:21 | |
fungi | so all it seems to have done is refresh the signature on the old zone content? | 23:22 |
fungi | i wonder if we should clear out the contents of /var/lib/bind/zones/zuulci.org and let the signatures get recreated fresh | 23:23 |
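Assuming the zone is set up for automatic maintenance (e.g. auto-dnssec maintain with inline signing), forcing a fresh sign would look roughly like:

    # pick up the newly installed keys and force a full re-sign
    sudo rndc loadkeys zuulci.org
    sudo rndc sign zuulci.org
    # "next resign time" should now be populated
    sudo rndc zonestatus zuulci.org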
fungi | is anybody else having trouble pulling up https://etherpad.openstack.org/ right now? | 23:24 |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Migration to PHP 7.x https://review.openstack.org/616226 | 23:24 |
fungi | i can't ssh to it at all | 23:24 |
fungi | via ipv6 or ipv4 | 23:25 |
fungi | i wonder if the host just crashed out from under it | 23:25 |
clarkb | ssh not working via ipv4 from here | 23:25 |
clarkb | it does ping, console might say something interesting? | 23:25 |
fungi | jumping into rackspace dashboard, yeah | 23:26 |
fungi | "Loading Console ..." | 23:28 |
fungi | is all it gives me | 23:28 |
fungi | expecting an e-mail from fanatical support to infra-root@o.o in 3... 2... 1... | 23:28 |
fungi | cacti says load average and iowait went through the roof just before it went dead for us | 23:35 |
fungi | i wanted to check whether this was a good opportunity to rebuild it on xenial while it's offline anyway, so i went to pull up the etherpad where we had the list of remaining servers to upgrade... :/ | 23:39 |
fungi | anyway, etherpad-dev seems to already be on xenial so i suspect etherpad.o.o is as well | 23:39 |
clarkb | yup | 23:40 |
clarkb | afs*, kdc*, groups*, health, status, lists.*, openstackid*, ask, graphite, pbx, refstack, static and wiki-dev are the remaining servers | 23:41 |
clarkb | we also need to rm puppetmaster at some point (it's still trusty but is replaced with bridge which is bionic) | 23:41 |
clarkb | fungi: ^ that is from my cached copy of the etherpad | 23:41 |
fungi | cacti is still reporting values for snmp polls in the past few minutes, so i was about to say maybe the host is up... | 23:42 |
fungi | "This message is to inform you that the host your cloud server 'etherpad01.openstack.org' resides on alerted our monitoring systems at 23:41 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding what is causing the alert. Please do not access or modify 'etherpad01.openstack.org' during this process. Please reference this incident ID if | 23:43 |
fungi | you need to contact support: CSHD-9wZ2KeoQVvD" | 23:43 |
fungi | #status notice The Etherpad service at https://etherpad.openstack.org/ has been offline since 23:22 UTC due to a hypervisor issue in our service provider, but should hopefully return to service shortly. | 23:47 |
openstackstatus | fungi: sending notice | 23:47 |
*** tosky has quit IRC | 23:48 | |
-openstackstatus- NOTICE: The Etherpad service at https://etherpad.openstack.org/ has been offline since 23:22 UTC due to a hypervisor issue in our service provider, but should hopefully return to service shortly. | 23:49 | |
corvus | mordred: regardless of whether we think it's ready for gitea; i bet we could do an HA etherpad/percona in k8s. | 23:50 |
openstackstatus | fungi: finished sending notice | 23:51 |
fungi | that would be neat | 23:51 |
clarkb | corvus: the one gotcha there is only one nodejs process can serve all clients for a single pad | 23:51 |
clarkb | corvus: so you have to have some fairly intelligent load balancing happening | 23:52 |
mordred | corvus: ++ | 23:52 |
fungi | high-availability doesn't necessarily imply active/active | 23:52 |
clarkb | fungi: ya active/standby would be the simplest way to do it probably | 23:52 |
fungi | we could get away with active/standby probably (with some data loss at failover) | 23:52 |
mordred | fungi: yah. just being able to start a new process quickly on a different backend node as things go south would be a nice win | 23:52 |
corvus | yeah, i guess we could have just one etherpad pod which gets auto-rescheduled, or we could probably have a stateful LB. | 23:52 |
mordred | yah. could do both | 23:53 |
corvus | so an active-active percona system with a single re-schedulable etherpad server. piece of cake. | 23:53 |
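A minimal sketch of the "single re-schedulable etherpad" half in Kubernetes (the image name, tag, and port are assumptions; the percona side is elided):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: etherpad
    spec:
      replicas: 1        # only one nodejs process may serve a given pad
      strategy:
        type: Recreate   # never run old and new pods side by side
      selector:
        matchLabels:
          app: etherpad
      template:
        metadata:
          labels:
            app: etherpad
        spec:
          containers:
            - name: etherpad
              image: etherpad/etherpad:1.7  # assumed image/tag
              ports:
                - containerPort: 9001       # etherpad's default listen port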
fungi | looks like the server is back up and responding to ssh again | 23:53 |
mordred | yup | 23:53 |
fungi | 23:53:51 up 1 min, 1 user, load average: 0.18, 0.06, 0.01 | 23:53 |
corvus | (and by piece of cake, i mean "mordred did all that percona work already" :) | 23:54 |
fungi | heh | 23:54 |
fungi | when you say "percona" you're referring to "Percona Server for MySQL"? | 23:55 |
*** dave-mccowan has quit IRC | 23:56 | |
clarkb | I'm guessing galera | 23:56 |
fungi | a la https://github.com/percona/percona-server | 23:56 |
*** jamesmcarthur has quit IRC | 23:56 | |
clarkb | which does active/active/active mysql | 23:56 |
mordred | percona xtradb cluster | 23:59 |
mordred | https://review.openstack.org/#/c/626054/ | 23:59 |