openstackgerrit | Ian Wienand proposed opendev/system-config master: Add mirror-update to run_all.sh https://review.opendev.org/670927 | 00:04 |
---|---|---|
*** goldyfruit has joined #openstack-infra | 00:07 | |
ianw | has anyone already debugged why CI jobs seem to want to run the cloud_launcher and access the clouds? e.g. -> http://logs.openstack.org/06/669006/1/check/system-config-run-mirror/0cbec98/bridge.openstack.org/ara-report/reports/6d295f4d-4aa0-4eb7-a6b8-a7fcd6e4c3da.html | 00:12 |
ianw | cloud-launcher : Processing keypair infra-root-keys for openstackci-ovh BHS1 | 00:12 |
*** weifan has quit IRC | 00:13 | |
*** weifan has joined #openstack-infra | 00:13 | |
ianw | also we seem to have lost the "toggle ci button" ... i think something went in around that? | 00:14 |
ianw | Loading failed for the <script> with source “https://review.opendev.org/static/hideci.js?e=31300f2c4db937f32384feb237fc356b” | 00:15 |
ianw | oh, we have to touch the file or something ... | 00:15 |
corvus | ianw: run_cloud_launcher happens via cron (if the timing of the job is just right). it's not actually part of the job, and even if it fails, zuul doesn't know about it. it just shows up in the inner ara report because that comes from the host. | 00:18 |
*** weifan has quit IRC | 00:18 | |
ianw | corvus: ahhh. ok that explains it ... :) | 00:18 |
corvus | ianw: you should be able to ignore it, though it is annoying. we did something in the job to disable the run_all cron because it was racing the tested version of playbooks and that's no good. we may be able to do the same for the cloud launcher | 00:19 |
ianw | #status log touched /home/gerrit2/review_site/etc/GerritSiteHeader.html after merge of Id0cd8429ee5ce914aebbbc4a24bef9ebf675e21c | 00:19 |
openstackstatus | ianw: finished logging | 00:19 |
ianw | so on the other thing, "Toggle Extra CI" is back | 00:20 |
ianw | ... but doesn't actually toggle anything for me :/ | 00:20 |
fungi | it should only toggle job results from not zuul now | 00:24 |
fungi | so third-party ci | 00:25 |
ianw | also the page is horizontal scrolling for me now too | 00:25 |
fungi | huh | 00:25 |
ianw | fungi: hrm, i'm not seeing zuul results in the list | 00:25 |
ianw | of comments | 00:25 |
fungi | if i go to, say, https://review.opendev.org/665723 it works for me | 00:26 |
fungi | oh, wait, that's just the merged message | 00:29 |
clarkb | it totally worked in chrome for me when I tested it :/ | 00:29 |
fungi | it looks like it's hiding the zuul vote comments entirely. did i misread the conditional in that script? | 00:30 |
ianw | yeah, i'm seeing something like https://imgur.com/a/3drQ7mW | 00:30 |
fungi | i think the conditional... i think i see the problem | 00:31 |
fungi | patch on the way | 00:31 |
*** aaronsheffield has quit IRC | 00:33 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Correct hide logic for Zuul CI comments in Gerrit https://review.opendev.org/670928 | 00:33 |
fungi | ianw: clarkb: ^ | 00:33 |
fungi | bull in a china shop, as i said ;) | 00:33 |
fungi | i had inadvertently applied it to the conditional for whether or not to display the comment, not the one for whether or not to hide it | 00:34 |
fungi | so it was unconditionally hidden | 00:34 |
fungi | clarkb: is it possible you made the same mistake i did and saw a zuul "change successfully merged" comment and thought all was good? | 00:35 |
*** weifan has joined #openstack-infra | 00:35 | |
*** ijw has quit IRC | 00:36 | |
clarkb | maybe? I thought I saw more than one but maybe I updated the wrong conditional when I did it manually or something? | 00:37 |
*** weifan has quit IRC | 00:37 | |
ianw | i'm shift-reloading https://review.opendev.org/#/c/665723/ and do see all the comments | 00:37 |
ianw | along with "extra ci" button, so that code is sort of active | 00:38 |
clarkb | maybe we need the check in both conditionals? | 00:38 |
fungi | no, that's me trying out that change on gerrit | 00:38 |
fungi | 670928 seems to basically just do what the old code did, so that's not the solution either | 00:38 |
fungi | old working code, not the recent patch | 00:39 |
* fungi grumbles and looks closer | 00:39 | |
*** jeremy_houser has quit IRC | 00:40 | |
ianw | fungi: oh, right, so you applied 670928 to the live site? because i was looking at the .js and it had your change already, so that would explain that mystery :) | 00:41 |
fungi | yep, fastest way i knew to see whether this would fix it... and it still doesn't (this basically just reverts it and does a no-op with the extra inverse match) | 00:42 |
clarkb | seems like it works the first time | 00:43 |
clarkb | but then subsequent toggles break | 00:44 |
fungi | oh, yep, you're right | 00:44 |
fungi | that function is being called from ci_page_loaded | 00:44 |
fungi | indirectly | 00:44 |
fungi | oh, i think clarkb may be right about needing both | 00:45 |
fungi | yep! | 00:46 |
fungi | clarkb is smrt | 00:46 |
fungi | revising patch | 00:47 |
donnyd | https://www.irccloud.com/pastebin/iUQpqcCI/ | 00:47 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Complete hide logic for Zuul CI comments in Gerrit https://review.opendev.org/670928 | 00:48 |
fungi | clarkb: ianw: ^ | 00:48 |
fungi | that seems to work | 00:48 |
ianw | fungi: is that live so can confirm there? | 00:48 |
fungi | yep | 00:48 |
fungi | until puppet undoes it anyway | 00:48 |
fungi | you'll need a force refresh thanks to all the caching in the world | 00:49 |
fungi | and this explains why the earlier change seemed to work when clarkb tested. it did work, but only on initial load. then when toggling it disappeared | 00:49 |
sgw | cmurphy: I think I licked the osc issues, see: http://logs.openstack.org/63/670363/27/check/starlingx-obs-build/35f6f2b/job-output.txt.gz | 00:50 |
sgw | since I am not doing checks yet about if the spec files or _service files changed or if there are source code changes to regenerate the _service tarballs | 00:50 |
ianw | yay, logs published @ http://mirror.ord.rax.opendev.org/logs/rsync-mirrors/ ... although need to convince apache that .log files are text it seems | 00:52 |
clarkb | what is the difference between those two functions? | 00:58 |
clarkb | fungi ^ called on different browser events maybe? | 00:58 |
fungi | clarkb: i thought one was called on page load and one on button press | 00:59 |
fungi | but that was mostly a guess, i'm really quite clueless about this stuff | 00:59 |
fungi | just going by where the functions were being called from | 00:59 |
fungi | events later in the script | 01:00 |
*** igordc has quit IRC | 01:01 | |
*** yamamoto has joined #openstack-infra | 01:03 | |
*** yamamoto has quit IRC | 01:06 | |
*** yamamoto has joined #openstack-infra | 01:09 | |
*** gyee has quit IRC | 01:14 | |
*** yamamoto has quit IRC | 01:14 | |
*** imacdonn has quit IRC | 01:15 | |
*** imacdonn has joined #openstack-infra | 01:16 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Publish .log files as text/plain https://review.opendev.org/670934 | 01:34 |
*** yamamoto has joined #openstack-infra | 01:48 | |
*** yamamoto has quit IRC | 01:50 | |
*** yamamoto has joined #openstack-infra | 01:50 | |
*** armax has quit IRC | 01:53 | |
*** apetrich has quit IRC | 01:57 | |
openstackgerrit | Merged opendev/system-config master: Complete hide logic for Zuul CI comments in Gerrit https://review.opendev.org/670928 | 01:58 |
*** yamamoto has quit IRC | 02:01 | |
*** yamamoto has joined #openstack-infra | 02:10 | |
*** yamamoto has quit IRC | 02:11 | |
*** yamamoto has joined #openstack-infra | 02:16 | |
*** yamamoto has quit IRC | 02:20 | |
*** rh-jelabarre has quit IRC | 02:22 | |
*** tkajinam has quit IRC | 02:23 | |
*** tkajinam has joined #openstack-infra | 02:24 | |
*** bhavikdbavishi has joined #openstack-infra | 02:34 | |
*** bhavikdbavishi1 has joined #openstack-infra | 02:37 | |
*** bhavikdbavishi has quit IRC | 02:38 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 02:38 | |
*** whoami-rajat has joined #openstack-infra | 02:42 | |
*** hongbin has joined #openstack-infra | 03:00 | |
*** logan- has quit IRC | 03:11 | |
*** logan- has joined #openstack-infra | 03:14 | |
*** factor has joined #openstack-infra | 03:20 | |
*** yamamoto has joined #openstack-infra | 03:20 | |
*** michael-beaver has quit IRC | 03:21 | |
*** psachin has joined #openstack-infra | 03:26 | |
*** ykarel|away has joined #openstack-infra | 03:30 | |
*** armax has joined #openstack-infra | 03:37 | |
*** hongbin has quit IRC | 03:42 | |
*** udesale has joined #openstack-infra | 04:09 | |
*** armax has quit IRC | 04:10 | |
*** ykarel|away has quit IRC | 04:13 | |
*** ykarel|away has joined #openstack-infra | 04:34 | |
*** ykarel|away is now known as ykarel | 04:35 | |
*** ramishra has joined #openstack-infra | 04:49 | |
*** jamesmcarthur has quit IRC | 04:50 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Disable cloud launcher cron job during CI https://review.opendev.org/670946 | 05:02 |
*** kjackal has joined #openstack-infra | 05:03 | |
ianw | corvus: ^ thanks for the suggestion on that | 05:03 |
*** yamamoto has quit IRC | 05:05 | |
*** igordc has joined #openstack-infra | 05:07 | |
*** pcaruana has joined #openstack-infra | 05:08 | |
*** weifan has joined #openstack-infra | 05:08 | |
*** ricolin_ has joined #openstack-infra | 05:10 | |
*** YaminiU has joined #openstack-infra | 05:16 | |
YaminiU | Hello team, good morning. Our OpenStack CI has been failing for the last few jobs with the following error: | 05:17 |
YaminiU | The error appears to have been in '/tmp/tmpisx2rata/986191b8731c42ef85545adf29347fdd/trusted/project_0/git.zuul-ci.org/zuul-base-jobs/playbooks/base/pre.yaml': line 3, column 7, but may | 05:17 |
YaminiU | 2019-07-15 23:56:36.141581 be elsewhere in the file depending on the exact syntax problem. | 05:17 |
YaminiU | 2019-07-15 23:56:36.141605 The offending line appears to be: | 05:17 |
YaminiU | 2019-07-15 23:56:36.141628 roles: | 05:17 |
YaminiU | 2019-07-15 23:56:36.141639 - add-build-sshkey | 05:17 |
YaminiU | 2019-07-15 23:56:36.141652 ^ here | 05:17 |
YaminiU | 2019-07-15 23:56:36.142280 PRE-RUN END RESULT_NORMAL: [trusted : git.zuul-ci.org/zuul-base-jobs/playbooks/base/pre.yaml@master] | 05:17 |
YaminiU | 2019-07-15 23:56:36.142988 POST-RUN START: [untrusted : github.com/CiscoSystems/project-config-third-party-cinder/playbooks/dsvm-tempest-cisco-zonemanager-vm-job-post.yaml@master] | 05:17 |
YaminiU | 2019-07-15 23:56:37.998527 PLAY [all] | 05:17 |
YaminiU | has anything changed in the paste | 05:17 |
YaminiU | http://paste.openstack.org/show/754419/ | 05:17 |
YaminiU | that is the error logs paste link | 05:18 |
johnsom | Temporary failure resolving 'mirror.regionone.fortnebula.opendev.org' | 05:20 |
johnsom | http://logs.openstack.org/96/668996/4/check/octavia-v2-dsvm-scenario-ubuntu-bionic/530abf0/job-output.txt.gz#_2019-07-16_04_43_46_680299 | 05:20 |
johnsom | instance: ubuntu-bionic-fortnebula-regionone-0008952589 | 05:20 |
johnsom | It looks like fortnebula is having a DNS issue | 05:21 |
ianw | johnsom: hrm, or the host is having a network issue, that may be more likely :/ | 05:21 |
ianw | unbound may be unhappy | 05:22 |
johnsom | Ok, just figured I would give you all a heads up | 05:22 |
*** weifan has quit IRC | 05:23 | |
johnsom | Yeah, looks like unbound was trying ipv6 without luck | 05:27 |
YaminiU | hi team | 05:27 |
YaminiU | can anyone help | 05:27 |
YaminiU | any known changes which went in | 05:27 |
*** jamesmcarthur has joined #openstack-infra | 05:29 | |
YaminiU | http://paste.openstack.org/show/754420/ | 05:31 |
YaminiU | this is the full paste of the error | 05:31 |
*** yamamoto has joined #openstack-infra | 05:34 | |
*** jamesmcarthur has quit IRC | 05:37 | |
*** ykarel_ has joined #openstack-infra | 05:39 | |
*** ykarel_ is now known as ykarel|meeting | 05:39 | |
*** ykarel has quit IRC | 05:39 | |
*** udesale has quit IRC | 05:39 | |
*** udesale has joined #openstack-infra | 05:40 | |
*** udesale has quit IRC | 05:42 | |
ianw | YaminiU: what's the change that is causing that? | 05:44 |
YaminiU | i did not do any change | 05:44 |
YaminiU | the last 5 jobs are failing | 05:44 |
YaminiU | before that it was passing | 05:44 |
YaminiU | http://paste.openstack.org/show/754420/ | 05:44 |
YaminiU | this is the full paste of the failure | 05:44 |
ianw | i mean what review is this being reported on? | 05:45 |
YaminiU | https://review.opendev.org/#/c/627941/ | 05:45 |
YaminiU | Cisco | 05:45 |
ianw | YaminiU: i would say, seeing the old url's in there, the cisco 3rd party ci might be hitting something like described in http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-July/000971.html | 05:47 |
YaminiU | oh ok | 05:49 |
YaminiU | let me try that | 05:49 |
YaminiU | thanks for the quick response | 05:49 |
YaminiU | [connection zuul-git] | 05:50 |
YaminiU | driver=git | 05:50 |
YaminiU | baseurl=https://git.zuul-ci.org/ | 05:50 |
YaminiU | i need to change this | 05:50 |
YaminiU | right | 05:50 |
YaminiU | to opendev url | 05:50 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add some pointers on the OpenDev PPA https://review.opendev.org/670952 | 05:54 |
AJaeger | YaminiU: yes, you need to change the parameter, give me a second... | 05:55 |
YaminiU | ok | 05:55 |
AJaeger | YaminiU: oh, ianw answered already, see http://lists.zuul-ci.org/pipermail/zuul-announce/2019-July/000043.html as well | 05:56 |
AJaeger | YaminiU: see https://opendev.org/zuul/zuul/src/branch/master/doc/source/admin/examples/etc_zuul/zuul.conf#L23-L26 on what kind of connection to add | 05:56 |
YaminiU | ok | 05:56 |
YaminiU | thanks | 05:56 |
*** udesale has joined #openstack-infra | 05:57 | |
AJaeger | YaminiU: you really should subscribe to zuul-announce if you run a Zuul v3 | 05:57 |
YaminiU | can you please give me the link | 05:57 |
YaminiU | am new to this | 05:57 |
YaminiU | i will subscribe for the same | 05:57 |
AJaeger | YaminiU: See http://lists.zuul-ci.org/ | 05:59 |
YaminiU | thanks | 05:59 |
*** raukadah is now known as chandankumar | 06:03 | |
*** kjackal has quit IRC | 06:24 | |
*** kjackal has joined #openstack-infra | 06:29 | |
*** ruffian_sheep has joined #openstack-infra | 06:32 | |
*** dpawlik has joined #openstack-infra | 06:40 | |
*** pgaxatte has joined #openstack-infra | 06:42 | |
*** yamamoto has quit IRC | 06:51 | |
*** yamamoto has joined #openstack-infra | 06:52 | |
*** yamamoto has quit IRC | 06:52 | |
*** iurygregory has joined #openstack-infra | 06:54 | |
*** yamamoto has joined #openstack-infra | 06:58 | |
*** yamamoto has quit IRC | 06:58 | |
*** rcernin has quit IRC | 07:00 | |
YaminiU | Ajaeger even after changing the url am seeing the same error with the new URL | 07:05 |
YaminiU | http://paste.openstack.org/show/754422/ | 07:05 |
YaminiU | ianw & Ajaeger even after changing the url am seeing the error with the new URL | 07:06 |
YaminiU | http://paste.openstack.org/show/754422/ | 07:06 |
*** rpittau|afk is now known as rpittau | 07:08 | |
*** xek has joined #openstack-infra | 07:09 | |
*** dtantsur|afk is now known as dtantsur | 07:12 | |
*** igordc has quit IRC | 07:16 | |
*** tosky has joined #openstack-infra | 07:21 | |
*** lucasagomes has joined #openstack-infra | 07:21 | |
AJaeger | opendev.org/zuul-base-jobs surprises me, I would have expected opendev.org/zuul/zuul-base-jobs | 07:33 |
AJaeger | YaminiU: can't help further... | 07:33 |
*** jamesmcarthur has joined #openstack-infra | 07:33 | |
YaminiU | even i would have expected the same | 07:33 |
*** ricolin_ is now known as ricolin | 07:33 | |
YaminiU | i had given my base url as opendev.org/zuul | 07:34 |
YaminiU | https://opendev.org/zuul | 07:34 |
*** ginopc has joined #openstack-infra | 07:35 | |
*** pkopec has joined #openstack-infra | 07:36 | |
*** jamesmcarthur has quit IRC | 07:38 | |
*** rascasoft has quit IRC | 07:39 | |
*** ykarel|meeting is now known as ykarel | 07:39 | |
*** rascasoft has joined #openstack-infra | 07:40 | |
*** ykarel_ has joined #openstack-infra | 07:43 | |
*** priteau has joined #openstack-infra | 07:43 | |
*** slaweq has joined #openstack-infra | 07:43 | |
*** ykarel_ is now known as ykarel|lunch | 07:44 | |
*** ykarel has quit IRC | 07:44 | |
*** udesale has quit IRC | 07:46 | |
*** udesale has joined #openstack-infra | 07:46 | |
*** igordc has joined #openstack-infra | 07:47 | |
YaminiU | there is a reference in the same discussion thread which says the base url should be https://opendev.org | 07:50 |
YaminiU | and not https://opendev.org/zuul | 07:51 |
YaminiU | is it the case | 07:51 |
*** yamamoto has joined #openstack-infra | 08:05 | |
*** igordc has quit IRC | 08:06 | |
*** YaminiU has quit IRC | 08:07 | |
*** kopecmartin|off is now known as kopecmartin | 08:07 | |
*** dchen has quit IRC | 08:11 | |
*** ralonsoh has joined #openstack-infra | 08:13 | |
*** yamamoto has quit IRC | 08:15 | |
*** tkajinam has quit IRC | 08:25 | |
*** yamamoto has joined #openstack-infra | 08:30 | |
*** ykarel|lunch is now known as ykarel | 08:43 | |
*** derekh has joined #openstack-infra | 08:44 | |
*** ruffian_sheep has quit IRC | 08:46 | |
*** ruffian_sheep has joined #openstack-infra | 08:54 | |
*** panda has quit IRC | 08:54 | |
*** jamesmcarthur has joined #openstack-infra | 08:56 | |
*** panda has joined #openstack-infra | 08:57 | |
*** jamesmcarthur has quit IRC | 09:00 | |
*** nfakhir has quit IRC | 09:17 | |
*** ricolin has quit IRC | 09:20 | |
*** jamesmcarthur has joined #openstack-infra | 09:28 | |
*** jamesmcarthur has quit IRC | 09:32 | |
*** yamamoto has quit IRC | 10:00 | |
*** nfakhir has joined #openstack-infra | 10:04 | |
*** yamamoto has joined #openstack-infra | 10:05 | |
*** yamamoto has quit IRC | 10:08 | |
*** priteau has quit IRC | 10:10 | |
*** ykarel is now known as ykarel|afk | 10:12 | |
*** gfidente has joined #openstack-infra | 10:12 | |
*** apetrich has joined #openstack-infra | 10:16 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Silence InsecureRequestWarning from urllib3 https://review.opendev.org/671000 | 10:16 |
*** tosky__ has joined #openstack-infra | 10:17 | |
*** tosky has quit IRC | 10:17 | |
*** tosky__ is now known as tosky | 10:18 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Silence InsecureRequestWarning and password warning https://review.opendev.org/671000 | 10:19 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add support for smart reconfigurations https://review.opendev.org/652114 | 10:19 |
*** udesale has quit IRC | 10:29 | |
*** derekh has quit IRC | 10:29 | |
*** derekh has joined #openstack-infra | 10:38 | |
*** yamamoto has joined #openstack-infra | 10:38 | |
*** YaminiU has joined #openstack-infra | 10:38 | |
*** yamamoto has quit IRC | 10:41 | |
YaminiU | i have given the base url as mentioned in the thread: https://opendev.org/zuul | 10:43 |
YaminiU | Job console starting... | 10:43 |
YaminiU | 2019-07-16 10:16:13.032990 Running Ansible setup... | 10:43 |
YaminiU | 2019-07-16 10:16:19.128313 PRE-RUN START: [trusted : opendev.org/zuul-base-jobs/playbooks/base/pre.yaml@master] | 10:43 |
YaminiU | 2019-07-16 10:16:20.497776 ERROR! the role 'add-build-sshkey' was not found in /tmp/tmpqq0z4517/3ccef2ba127840e9a9f7c6552ff2d642/trusted/project_0/opendev.org/zuul-base-jobs/playbooks/base/roles:/tmp/tmpqq0z4517/3ccef2ba127840e9a9f7c6552ff2d642/work/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/tmp/tmpqq0z4517/3ccef2ba127840e9a9f7c6552ff2d642/trusted/project_0/opendev.org/zuul-base-jobs/playbooks/base | 10:43 |
YaminiU | 2019-07-16 10:16:20.497882 The error appears to have been in '/tmp/tmpqq0z4517/3ccef2ba127840e9a9f7c6552ff2d642/trusted/project_0/opendev.org/zuul-base-jobs/playbooks/base/pre.yaml': line 3, column 7, but may | 10:43 |
YaminiU | 2019-07-16 10:16:20.497901 be elsewhere in the file depending on the exact syntax problem. | 10:43 |
YaminiU | 2019-07-16 10:16:20.497930 The offending line appears to be: | 10:43 |
YaminiU | 2019-07-16 10:16:20.497958 roles: | 10:43 |
YaminiU | 2019-07-16 10:16:20.497976 - add-build-sshkey | 10:43 |
YaminiU | 2019-07-16 10:16:20.497992 ^ here | 10:43 |
YaminiU | first point: when it clones, it clones into opendev.org/zuul-base-jobs; there is no zuul in there | 10:43 |
YaminiU | second point: it tries to get roles from the zuul-base-jobs folder, but actually the base jobs folder does not have any roles; it is the zuul-jobs folder which has all the roles | 10:43 |
YaminiU | http://paste.openstack.org/show/754422/ | 10:43 |
YaminiU | paste link of error | 10:43 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add --check-config option to zuul scheduler https://review.opendev.org/542160 | 10:43 |
*** electrofelix has joined #openstack-infra | 10:45 | |
*** yamamoto has joined #openstack-infra | 10:47 | |
*** bhavikdbavishi has quit IRC | 10:51 | |
*** yamamoto has quit IRC | 10:54 | |
*** yamamoto has joined #openstack-infra | 10:56 | |
*** yamamoto has quit IRC | 10:56 | |
*** yamamoto has joined #openstack-infra | 10:56 | |
*** pgaxatte has quit IRC | 11:01 | |
*** kjackal has quit IRC | 11:08 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Spec for allowing circular dependencies https://review.opendev.org/643309 | 11:08 |
*** kjackal has joined #openstack-infra | 11:09 | |
*** ykarel|afk is now known as ykarel | 11:12 | |
*** jamesmcarthur has joined #openstack-infra | 11:29 | |
*** tesseract has joined #openstack-infra | 11:30 | |
*** rh-jelabarre has joined #openstack-infra | 11:31 | |
openstackgerrit | Ghanshyam Mann proposed opendev/elastic-recheck master: Add query for test cold migration revert failure bug 1836595 https://review.opendev.org/671013 | 11:32 |
openstack | bug 1836595 in neutron "test_server_connectivity_cold_migration_revert failing" [Undecided,New] https://launchpad.net/bugs/1836595 | 11:32 |
openstackgerrit | Merged zuul/zuul master: Run cleanup playbooks in job thread https://review.opendev.org/670888 | 11:32 |
*** jamesmcarthur has quit IRC | 11:33 | |
*** tdasilva has joined #openstack-infra | 11:44 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add support for smart reconfigurations https://review.opendev.org/652114 | 11:49 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add --check-config option to zuul scheduler https://review.opendev.org/542160 | 11:50 |
openstackgerrit | Merged zuul/zuul master: web: add OpenAPI documentation https://review.opendev.org/535541 | 11:52 |
*** yamamoto has quit IRC | 11:54 | |
*** lpetrut has joined #openstack-infra | 11:57 | |
*** jamesmcarthur has joined #openstack-infra | 12:03 | |
AJaeger | YaminiU: use https://opendev.org/zuul/zuul/src/branch/master/doc/source/admin/examples/etc_zuul/zuul.conf#L23-L26 as is as example - don't change the baseurl. opendev.org/zuul does not work | 12:04 |
*** Lucas_Gray has joined #openstack-infra | 12:06 | |
*** jamesmcarthur has quit IRC | 12:08 | |
*** bhavikdbavishi has joined #openstack-infra | 12:10 | |
*** tdasilva has quit IRC | 12:12 | |
*** yamamoto has joined #openstack-infra | 12:12 | |
*** yamamoto has quit IRC | 12:12 | |
*** tdasilva has joined #openstack-infra | 12:12 | |
*** jamesmcarthur has joined #openstack-infra | 12:15 | |
*** goldyfruit has quit IRC | 12:16 | |
*** gtarnaras has joined #openstack-infra | 12:17 | |
*** snapiri has quit IRC | 12:18 | |
*** snapiri has joined #openstack-infra | 12:18 | |
openstackgerrit | Merged zuul/zuul master: web: add tenant and project scoped, JWT-protected actions https://review.opendev.org/576907 | 12:26 |
*** lpetrut has quit IRC | 12:28 | |
*** ruffian_sheep has quit IRC | 12:30 | |
*** lpetrut has joined #openstack-infra | 12:32 | |
*** ricolin has joined #openstack-infra | 12:35 | |
*** ijw has joined #openstack-infra | 12:38 | |
*** ykarel is now known as ykarel|afk | 12:39 | |
*** pgaxatte has joined #openstack-infra | 12:42 | |
*** ykarel_ has joined #openstack-infra | 12:43 | |
*** ijw has quit IRC | 12:43 | |
*** ykarel_ has quit IRC | 12:44 | |
openstackgerrit | Merged zuul/zuul master: Allow operator to generate auth tokens through the CLI https://review.opendev.org/636197 | 12:45 |
*** ykarel|afk has quit IRC | 12:45 | |
*** aaronsheffield has joined #openstack-infra | 12:48 | |
*** YaminiU has quit IRC | 12:50 | |
*** udesale has joined #openstack-infra | 12:52 | |
*** tesseract has quit IRC | 12:53 | |
*** yamamoto has joined #openstack-infra | 12:54 | |
*** tesseract has joined #openstack-infra | 12:55 | |
*** lpetrut has quit IRC | 12:57 | |
*** yamamoto has quit IRC | 12:58 | |
*** ginopc has quit IRC | 12:59 | |
*** ekultails has joined #openstack-infra | 13:01 | |
*** Lucas_Gray has quit IRC | 13:05 | |
*** michael-beaver has joined #openstack-infra | 13:06 | |
*** Lucas_Gray has joined #openstack-infra | 13:11 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Install system dependencies for tox-molecule https://review.opendev.org/671029 | 13:13 |
*** ykarel has joined #openstack-infra | 13:16 | |
*** ricolin has quit IRC | 13:18 | |
*** jamesmcarthur has quit IRC | 13:19 | |
*** eharney has quit IRC | 13:22 | |
coreycb | hi infra, would someone be able to give some input on this review wrt retaining the py35 zuul unit test job for tempest? https://review.opendev.org/#/c/670639/ | 13:23 |
*** goldyfruit has joined #openstack-infra | 13:28 | |
*** gtarnaras has quit IRC | 13:29 | |
*** kjackal has quit IRC | 13:30 | |
*** liuyulong has joined #openstack-infra | 13:33 | |
*** kjackal has joined #openstack-infra | 13:33 | |
*** trident has quit IRC | 13:38 | |
*** trident has joined #openstack-infra | 13:39 | |
*** rh-jelabarre has quit IRC | 13:48 | |
*** rh-jelabarre has joined #openstack-infra | 13:50 | |
sshnaidm|rover | If I have job defined with specific branches, not including master: https://review.opendev.org/#/c/670168/2/zuul.d/standalone-jobs.yaml - can I still run it on master when specifying "branches: master" for job vars in pipeline config? | 13:51 |
sshnaidm|rover | because it doesn't seem to work: https://review.opendev.org/#/c/670176/1/zuul.d/layout.yaml | 13:51 |
*** psachin has quit IRC | 13:52 | |
clarkb | coreycb: there is no issue with that. Tempest is branchless and expects to function across the breadth of currently supported branches and their platforms | 14:08 |
coreycb | clarkb: ok thanks for the input. yeah i wasn't sure if we are trying to get away from xenial or not. | 14:08 |
clarkb | that means testing python35 until those branches are untestable (which is a bit undefined I think but likely lines up with platform eol) | 14:09 |
coreycb | clarkb: ok sounds good, we'll keep the py35 tests then | 14:09 |
jrosser | i've seen a few of these in the last couple of days seeing a fair few jobs fail with this Client Error: Not Found for url: https://opendev.org/openstack/requirements/raw/f67ac75ebb1ac96e7f6a511d9c5e4c3de21c38d4/upper-constraints.txt | 14:15 |
*** yamamoto has joined #openstack-infra | 14:15 | |
jrosser | arg paste, but ykwim | 14:15 |
clarkb | possible we have another corrupt fs on one of the backends | 14:16 |
clarkb | we'll have to check the 8 against that url and see if any fail | 14:17 |
*** yamamoto has quit IRC | 14:22 | |
*** eharney has joined #openstack-infra | 14:23 | |
clarkb | gitea08 404s on that url | 14:28 |
clarkb | I'm not on real computer yet but maybe someone who is can check for fs problems on that host | 14:28 |
openstackgerrit | Artom Lifshitz proposed opendev/elastic-recheck master: Add query for bug 1836754 https://review.opendev.org/671051 | 14:30 |
openstack | bug 1836754 in OpenStack Compute (nova) "Conflict when deleting allocations for an instance that hasn't finished building" [Undecided,New] https://launchpad.net/bugs/1836754 | 14:30 |
*** icarusfactor has joined #openstack-infra | 14:32 | |
clarkb | sshnaidm|rover: I think the job may not exist in the context of master since that branch was excluded which means there is nothing to override | 14:33 |
sshnaidm|rover | clarkb, so maybe better to include all, and then override it in each repo? | 14:33 |
clarkb | or define it multiple times with the group matchers for each set of branches | 14:35 |
fungi | clarkb: i'm checking it now | 14:35 |
*** factor has quit IRC | 14:35 | |
corvus | fungi, clarkb: lots of oom errors | 14:35 |
fungi | yup | 14:35 |
*** dpawlik has quit IRC | 14:36 | |
fungi | there was a memory spike around 06:25 according to cacti | 14:37 |
corvus | it chooses to kill git a lot, so it could have just aborted a replication push | 14:37 |
fungi | that roughly corresponds with the 06:18-06:22 ooms in the dmesg | 14:38 |
corvus | it killed 6 git processes and 1 gitea process | 14:38 |
*** pgaxatte has quit IRC | 14:38 | |
fungi | i'm going to check the other gitea servers for similar issues | 14:38 |
*** pgaxatte has joined #openstack-infra | 14:39 | |
fungi | but on 08 it looks like it killed lots of stuff | 14:39 |
fungi | mysqld, containerd, git, sshd | 14:40 |
*** armax has joined #openstack-infra | 14:42 | |
fungi | ahh, nevermind, those were just the triggering mallocs | 14:42 |
corvus | fungi: really? i only see it killing git... what's.. | 14:42 |
corvus | ok | 14:42 |
fungi | it's all git processes killed | 14:42 |
fungi | in each case | 14:42 |
corvus | and yeah, i was wrong about the gitea proc too... it said "Kill process 21950 (gitea) score 94 or sacrifice child" but apparently went with 'sacrifice child' on that one because it killed git | 14:42 |
fungi | at least in this event and the similar (smaller) one on wednesday | 14:42 |
fungi | back on 2019-06-27 there was a bout of ooms and a pandoc process got killed | 14:43 |
fungi | same back on 2019-06-18 | 14:43 |
fungi | out of the 8 gitea servers, only 02 has no oom events in dmesg | 14:45 |
fungi | some have more, some less, various dates and times, almost always git processes are sacrificed | 14:45 |
fungi | i'm going to go out on a limb and assume 02 has merely been lucky | 14:46 |
clarkb | doesnt cacti show tons of available memory though? | 14:47 |
fungi | most of the time | 14:47 |
clarkb | could we be hitting lower cgroup limits? | 14:47 |
fungi | these look like sudden spikes | 14:47 |
fungi | http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=66794&rra_id=1&view_type=&graph_start=1563201378&graph_end=1563287778 | 14:47 |
fungi | er, i meant http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66794&rra_id=all | 14:48 |
*** ginopc has joined #openstack-infra | 14:48 | |
fungi | [Tue Jul 16 06:22:09 2019] Killed process 16582 (git) total-vm:1536656kB, anon-rss:729868kB, file-rss:0kB, shmem-rss:0kB | 14:49 |
fungi | these servers have 8gb ram and no swap | 14:49 |
fungi | hrm, yeah that doesn't look close to 8gb on its own | 14:50 |
clarkb | They dont have the ephemeral drive that our launch scripts normally set up for swap but also most dont have the spare disk to add a swap file or device | 14:50 |
clarkb | on 06 we could add a swap file without worry but the others need rebuilding with more disk | 14:50 |
clarkb | double checking docker doesnt limit memory use by default | 14:51 |
clarkb | unlikely to be cgroups then I guess | 14:51 |
fungi | well, regardless, the spike in cacti suggests the server did run out of available ram for ~1 poll | 14:52 |
*** bhavikdbavishi has quit IRC | 14:52 | |
clarkb | the shortterm fix here is to force replication for gitea08 then? | 14:59 |
mordred | clarkb: I think that seems reasonable | 14:59 |
fungi | hrm... 06:25 is when cron.daily is triggered... coincidence? probably, the ooms started around 06:18 if the dmesg timestamps are accurate. *:17 is when cron.hourly kicks off but there's nothing in it | 14:59 |
clarkb | daily would do logrotate? | 15:00 |
clarkb | 4:42 is the db backup | 15:00 |
clarkb | I guess it could be an external actor's daily cron too | 15:00 |
*** eernst has joined #openstack-infra | 15:01 | |
fungi | ohhh | 15:01 |
fungi | yeah, that's a distinct possibility | 15:01 |
clarkb | we can probably check the gitea logs for requests around that time period? | 15:01 |
fungi | there is a corresponding spike in network traffic too | 15:01 |
fungi | and cpu usage | 15:01 |
*** pkopec has quit IRC | 15:01 | |
fungi | and load average | 15:02 |
fungi | no corresponding spike on other gitea servers though, suggesting it was likely a single client address? | 15:02 |
mordred | so maybe just someone did a thing that caused the memory/cpu/traffic spike and because of load balancing it happened to go to gitea08 | 15:03 |
*** pkopec has joined #openstack-infra | 15:03 | |
*** jamesmcarthur has joined #openstack-infra | 15:03 | |
fungi | that's what's looking likely so far | 15:04 |
sshnaidm|rover | clarkb, trying to redefine jobs branches, but still don't see it's queued for this patch: https://review.opendev.org/#/c/671055/2/zuul.d/standalone-jobs.yaml - is anything wrong there? | 15:07 |
clarkb | sshnaidm|rover: the job has to be in a pipeline queue too | 15:07 |
sshnaidm|rover | clarkb, you mean this? https://review.opendev.org/#/c/671055/2/zuul.d/layout.yaml | 15:09 |
*** kjackal has quit IRC | 15:09 | |
clarkb | hrm not sure then | 15:10 |
clarkb | fungi: mordred corvus should I go ahead and trigger gitea08 replication? | 15:16 |
*** eernst has quit IRC | 15:17 | |
fungi | clarkb: yeah, sorry, conference call has sapped my continued troubleshooting, hoping to get back to it shortly | 15:20 |
clarkb | ya juggling that too | 15:20 |
corvus | clarkb: ++ | 15:20 |
corvus | clarkb, fungi, mordred: i found what's broken with parallel gitea repo creation -- it updates the "user" db table and sets "num_repos". so if i do 10 creates in parallel, it updates the user table 10 times and sets num_repos to 1 each time. | 15:21 |
corvus | when the system starts, it does a check, and so it fixes that. that's why restarting caused it to go back to normal. | 15:22 |
corvus | here's the increment: https://github.com/go-gitea/gitea/blob/master/models/org_team.go#L131-L136 | 15:22 |
clarkb | corvus: that seems like a legit gitea bug | 15:22 |
corvus | here's the call site: https://github.com/go-gitea/gitea/blob/master/models/repo.go#L1321 | 15:23 |
corvus | and here's one more level up the stack where it creates the orm session: https://github.com/go-gitea/gitea/blob/master/models/repo.go#L1371-L1377 | 15:23 |
corvus | so each creation is happening in its own session | 15:23 |
corvus | there's a Begin() in there, so maybe in its own transaction? | 15:24 |
mordred | yeah - that sounds like a db layer bug - e.Incr is likely doing it go-side rather than db-side | 15:24 |
mordred | (which is a common programming error by folks) | 15:24 |
mordred | update table set foo=foo+1 is an atomic operation that's safe to do from multiple calling threads at once | 15:25 |
* mordred reads more go code | 15:25 | |
corvus | repo.go: if _, err = sess.Exec("UPDATE `user` SET num_repos=num_repos+1 WHERE id=?", newOwner.ID); err != nil { | 15:25 |
corvus | mordred: ^ that exists elsewhere in gitea | 15:25 |
corvus | mordred: maybe we should change that incr to that ^ ? | 15:25 |
*** chandankumar is now known as raukadah | 15:26 | |
mordred | maybe so - I'm reading the xorm code to try to see what it thinks it's doing ... but yeah, that seems like a safe change to make if it's already in the codebase elsewhere | 15:26 |
corvus | i'll start by checking i can repro on master :) | 15:27 |
mordred | ++ | 15:28 |
mordred | corvus: oh - xorm seems to also have lunny as a committer - so if we do find a bug, we at least know a person | 15:28 |
mordred | corvus: I'm confused - the Incr call *seems* like it at least intends to do num_repos=num_repos+1 | 15:31 |
*** e0ne has joined #openstack-infra | 15:32 | |
corvus | mordred: huh. could it be doing it in several isolated transactions, and as long as they come out the same, it's not a conflict? i'm rusty here. | 15:34 |
mordred | corvus: I can't follow all the magic - I think it's worth changing to the sess.Exec and see if that fixes it | 15:34 |
corvus | ok. i'm still building images, so it'll be a few mins | 15:35 |
mordred | corvus: it could do - if they're doing explicit begin/commit transactions. at least with mysql the answer to this pattern is to send the update call outside of an explicit transaction and let it be handled as an atomic operation | 15:35 |
mordred | that would also depend on isolation level | 15:36 |
*** Lucas_Gray has quit IRC | 15:36 | |
*** gyee has joined #openstack-infra | 15:39 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running https://review.opendev.org/670395 | 15:39 |
*** Lucas_Gray has joined #openstack-infra | 15:41 | |
openstackgerrit | Stephen Finucane proposed zuul/zuul master: web: Add warning about incompleteness of OpenAPI spec https://review.opendev.org/671086 | 15:41 |
*** diablo_rojo has joined #openstack-infra | 15:41 | |
mordred | corvus: ok - no, it doesn't depend on isolation level - and update set x=x+1 should always do the right thing | 15:42 |
clarkb | mordred: that is probably done without locks and instead uses a type that can always be updated atomically. yay databases solving problems for us | 15:43 |
clarkb | replication completed and https://gitea08.opendev.org:3000/openstack/requirements/raw/commit/f67ac75ebb1ac96e7f6a511d9c5e4c3de21c38d4/upper-constraints.txt exists now | 15:43 |
clarkb | jrosser: ^ hopefully that addresses the problem in the short term | 15:44 |
jrosser | clarkb: great thankyou, I’ll let you know if I see any more | 15:44 |
corvus | mordred: ok, reproduced with local build of gitea master; will try modifying now | 15:45 |
mordred | corvus: woot | 15:46 |
*** eharney has quit IRC | 15:46 | |
*** eharney has joined #openstack-infra | 15:46 | |
clarkb | I'll trigger replication for a backend at a time to ensure they are all in sync as of roughly today | 15:46 |
corvus | mordred: i don't suppose you ran across a "get session from engine" method? :) | 15:47 |
*** igordc has joined #openstack-infra | 15:47 | |
clarkb | on gitea06 we can create a swapfile (it has more disk than the others) | 15:48 |
clarkb | and if that makes the OOMs go away there I guess we can roll that out broadly as we replace servers? | 15:48 |
corvus | mordred: nm -- the "engine" variable is actually the session | 15:49 |
fungi | clarkb: that seems like a reasonable next step. we may also want to consider some sort of per-client rate limiting if haproxy is capable of that | 15:49 |
corvus | clarkb: ++ | 15:49 |
mordred | corvus: I only found factory factories that factory the session generation factory | 15:49 |
mordred | clarkb: ++ | 15:49 |
corvus | mordred: that's no good, i can only use a factory factory factory | 15:49 |
clarkb | also as a reminder I'm likely to not be around much of today as I'm taking advantage of people traveling to portland for oscon | 15:50 |
clarkb | Will be around for the meeting though | 15:50 |
mordred | corvus: you could wrap the factory factory in a generator factory and then async callback that promise to a deferred factory generator | 15:50 |
fungi | okay, so looking at cacti graphs the cpu/load spike also appears on gitea06 but does not seem to have much in the way of a corresponding network spike | 15:51 |
mordred | clarkb: I will miss the meeting today ... I have not moved the github repos in openstack-infra yet - although I do have a non-working script started locally | 15:51 |
corvus | mordred: oh shoot, i think i got something backwards | 15:52 |
mordred | corvus: ono | 15:52 |
corvus | mordred: i didn't notice that addRepository was operating on the team table. that *does* have the correct num_repos. so it looks like Incr is working.... but where is the user getting updated...? | 15:54 |
mordred | corvus: oh. well - at least that makes it make more sense in some ways - I really couldn't figure out what was wrong with Incr | 15:55 |
clarkb | re making swap I think I'll push an update to make_swap.sh that we can quickly read over and if that looks good manually run it on 06. Then if we merge that change the new nodes will all automagically get swap | 15:56 |
mordred | corvus: are we looking for where the "how many repos does this user have" setting gets updated? | 15:58 |
mordred | corvus: https://github.com/go-gitea/gitea/blob/master/models/repo.go#L1309 | 16:00 |
mordred | corvus: then https://github.com/go-gitea/gitea/blob/master/models/repo.go#L1312 | 16:00 |
mordred | corvus: it's that update | 16:00 |
*** pgaxatte has quit IRC | 16:00 | |
mordred | corvus: we need to split that into two calls - one to do the update to set LastRepoVisibility (for which I think it's perfectly fine for it to race) | 16:01 |
mordred | corvus: and then an Incr call | 16:01 |
mordred | (or a direct update) | 16:01 |
mordred | and u.NumRepos++ should move after the updateUser call so that it doesn't get sucked in | 16:02 |
mordred | corvus: you want me to try to make a PR? | 16:02 |
corvus | mordred: thank you! i did not see NumRepos i was looking for num_repos. gimme a sec to catch up | 16:02 |
corvus | mordred: yes, that would be great -- why don't you make a branch on github and let me test it before you actually open the pr | 16:03 |
corvus | mordred: and i agree with your analysis | 16:04 |
mordred | corvus: http://paste.openstack.org/show/754445/ I think will do it | 16:06 |
mordred | corvus: but yes - I make branch now | 16:06 |
*** lucasagomes has quit IRC | 16:06 | |
*** bobh has joined #openstack-infra | 16:07 | |
*** Lucas_Gray has quit IRC | 16:07 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Use swapfile if no extra device is present https://review.opendev.org/671102 | 16:08 |
clarkb | infra-root ^ something like that for automagically creating swap on node creation. I'll run that manually if that looks about right | 16:08 |
*** ricolin has joined #openstack-infra | 16:09 | |
mordred | corvus: https://github.com/emonty/gitea/pull/new/fix-user-total-repo | 16:09 |
mordred | corvus: I'm guessing we need the Update(u) part of that - perhaps because that's how it knows what table to update? | 16:09 |
corvus | mordred: is that a private repo? | 16:09 |
corvus | oh, no i think that's a "create a PR" url you sent :) | 16:10 |
corvus | https://github.com/emonty/gitea/tree/fix-user-total-repo | 16:10 |
mordred | oh - hahaha | 16:11 |
corvus | mordred: isn't Update(u) still going to overwrite the num_repos? | 16:11 |
mordred | corvus: oh - hrm. maybe what we want there is Update(new User) like in the other call | 16:12 |
*** ginopc has quit IRC | 16:12 | |
corvus | mordred: wait, strike that | 16:12 |
corvus | mordred: i meant the updateUser() call | 16:13 |
corvus | mordred: does updateUser only update what's changed, or everything? cause if it's everything, we could end up setting num_repos=0 in each thread before we execute the incr | 16:13 |
mordred | corvus: I would think it updates what's changed ... but let me go read | 16:14 |
corvus | e.ID(u.ID).AllCols().Update(u) | 16:14 |
mordred | corvus: I pushed up a patch that does Update(new(User)) just to be sure on that front | 16:14 |
mordred | corvus: piddle | 16:14 |
corvus | mordred: well, i don't know what xorm does with those methods | 16:14 |
mordred | me either | 16:14 |
corvus | mordred: yes i do | 16:14 |
corvus | UPDATE `user` SET `lower_name` = ?, `name` = ?, `full_name` = ?, `email` = ?, `keep_email_private` = ?, `passwd` = ?, `passwd_hash_algo` = ?, `must_change_password` = ?, `login_type` = ?, `login_source` = ?, `login_name` = ?, `type` = ?, `location` = ?, `website` = ?, `rands` = ?, `salt` = ?, `language` = ?, `description` = ?, `last_login_unix` = ?, `last_repo_visibility` = ?, `max_repo_creation` | 16:14 |
corvus | = ?, `is_active` = ?, `is_admin` = ?, `allow_git_hook` = ?, `allow_import_local` = ?, `allow_create_organization` = ?, `prohibit_login` = ?, `avatar` = ?, `avatar_email` = ?, `use_custom_avatar` = ?, `num_followers` = ?, `num_following` = ?, `num_stars` = ?, `num_repos` = ?, `num_teams` = ?, `num_members` = ?, `visibility` = ?, `diff_view_style` = ?, `theme` = ?, `updated_unix` = ? WHERE `id`=? | 16:14 |
corvus | mordred: ^ show full processlist | 16:14 |
corvus | so yeah, it's the bad thing | 16:15 |
mordred | awesome. so we need to exclude num_repos | 16:15 |
corvus | mordred: is it just the 2 things? lastvis and numrepos? | 16:15 |
mordred | actually - yeah - so we can just change that call | 16:15 |
mordred | update coming | 16:16 |
corvus | (and yeah, it looks to me like it's just the 2) | 16:16 |
mordred | corvus: do you think we use the mysql column name or the go variable name in the Cols argument? | 16:16 |
mordred | I'm gonna go with mysql to start | 16:16 |
*** mattw4 has joined #openstack-infra | 16:17 | |
corvus | mordred: Incr uses mysql col name | 16:17 |
corvus | so that sounds like a good bet | 16:17 |
mordred | corvus: ok. force-pushed to the branch | 16:18 |
mordred | should be the full story now :) | 16:18 |
corvus | mordred: cool, i'll build that and try it out | 16:18 |
*** mattw4 has quit IRC | 16:19 | |
*** mattw4 has joined #openstack-infra | 16:19 | |
mordred | fwiw - the other solution here would be to turn the original select into a select for update - which would put a row lock on the row in the db causing the other threads to block waiting which would reduce the concurrency but allow the select/update pattern | 16:19 |
mordred | that would be much harder to plumb in though, since the User comes from the request context | 16:22 |
*** ykarel has quit IRC | 16:25 | |
*** hamzy has quit IRC | 16:25 | |
*** jamesmcarthur has quit IRC | 16:29 | |
*** jamesmcarthur has joined #openstack-infra | 16:31 | |
corvus | mordred: \o/ UPDATE `user` SET `last_repo_visibility` = ?, `updated_unix` = ? WHERE `id`=? | 16:33 |
*** tesseract has quit IRC | 16:33 | |
corvus | mordred: count looks correct | 16:33 |
corvus | mordred: i think you're gtg on opening the pr | 16:34 |
*** electrofelix has quit IRC | 16:35 | |
mordred | corvus: woot! | 16:37 |
*** rpittau is now known as rpittau|afk | 16:38 | |
corvus | mordred: i'm doing a full run of project creation locally, and i think i'm seeing the database be the bottleneck -- specifically this update (i mean, i think it was before, which is why i was able to actually see the update in the processlist). i'm wondering if there's some bit of tuning we can do? we're basically just running the mariadb image with all the defaults.. | 16:38 |
corvus | mordred: http://paste.openstack.org/show/754446/ | 16:39 |
*** ijw has joined #openstack-infra | 16:39 | |
corvus | mordred: or -- am i seeing mariadb just waiting for the session commit? | 16:39 |
*** ijw has joined #openstack-infra | 16:39 | |
mordred | yeah - you might be just waiting for the session commit if they're not doing autocommit | 16:42 |
corvus | mordred: https://github.com/go-gitea/gitea/blob/master/models/repo.go#L1401 | 16:42 |
corvus | mordred: looks like the filesystem stuff happens after that update and before the commit | 16:43 |
corvus | that's probably the safest thing. :( | 16:43 |
mordred | yeah - but it'll definitely be a smidge of a bottleneck | 16:44 |
mordred | still - we should at least see correct results when we're done | 16:44 |
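A minimal sketch of the lost-update pattern discussed above, assuming a stripped-down `user` table with only the two columns in question. It uses Python's `sqlite3` rather than gitea's Go/xorm code or MariaDB, and the function names are hypothetical. The "parallel" creators are simulated deterministically: each one loads the row before any of them writes, which is the interleaving that produced num_repos=1.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the `user` table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user ("
        "id INTEGER PRIMARY KEY, "
        "last_repo_visibility INTEGER, "
        "num_repos INTEGER)"
    )
    conn.execute(
        "INSERT INTO user (id, last_repo_visibility, num_repos) VALUES (1, 0, 0)"
    )
    return conn


def num_repos(conn):
    return conn.execute("SELECT num_repos FROM user WHERE id = 1").fetchone()[0]


def racy_create(conn, creators):
    """Each creator increments a stale in-memory copy, then writes every column."""
    # All creators read the row before anyone writes (simulated overlap).
    snapshots = [
        conn.execute(
            "SELECT last_repo_visibility, num_repos FROM user WHERE id = 1"
        ).fetchone()
        for _ in range(creators)
    ]
    for _visibility, stale_count in snapshots:
        # Equivalent of NumRepos++ followed by an all-columns UPDATE.
        conn.execute(
            "UPDATE user SET last_repo_visibility = ?, num_repos = ? WHERE id = 1",
            (1, stale_count + 1),
        )
    return num_repos(conn)


def atomic_create(conn, creators):
    """Write only the changed column, and let the database do the increment."""
    for _ in range(creators):
        conn.execute("UPDATE user SET last_repo_visibility = ? WHERE id = 1", (1,))
        conn.execute("UPDATE user SET num_repos = num_repos + 1 WHERE id = 1")
    return num_repos(conn)
```

With ten simulated creators, `racy_create` leaves the counter at 1 because every writer stores its own stale value plus one, while `atomic_create` ends at 10. This only illustrates the shape of the bug; it says nothing about how xorm or MariaDB schedule the real statements.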
*** armax has quit IRC | 16:48 | |
donnyd | clarkb: seems like I am still getting a timeout here or there.. but I am not sure its infra related | 16:48 |
donnyd | http://logs.openstack.org/48/670848/1/check/tempest-full-rocky/60be484/job-output.txt | 16:48 |
donnyd | seems to be the same job that hits it | 16:48 |
corvus | mordred: it looks like the upshot of that is that parallelized repo creation takes only 86% of the time of serialized; so it'll save us a couple minutes, but it's not huge. however, parallelized repo updates can save us a lot of time. so i think we can do serialized repo creation + parallelized repo updates and cut our time in half, and then once we upgrade to gitea with your patch, shave a few more mins off. | 16:48 |
corvus | mordred: i'll work on setting that up | 16:49 |
donnyd | not sure if that is cpu, memory or storage bound... I got the storage performance problem all shored up | 16:50 |
*** gfidente has quit IRC | 16:51 | |
*** gfidente has joined #openstack-infra | 16:51 | |
*** ramishra has quit IRC | 16:51 | |
openstackgerrit | Merged zuul/zuul master: web: Add warning about incompleteness of OpenAPI spec https://review.opendev.org/671086 | 16:52 |
donnyd | I can say for sure its not storage bound this time around. | 16:53 |
donnyd | read: IOPS=21.9k, BW=85.7MiB/s (89.9MB/s)(3070MiB/35809msec) | 16:53 |
donnyd | write: IOPS=7334, BW=28.7MiB/s (30.0MB/s)(1026MiB/35809msec) | 16:53 |
donnyd | on a loaded hypervisor | 16:53 |
AJaeger | sshnaidm|rover: use debug: true on your check queue to figure out why jobs are run or not | 16:55 |
donnyd | and with a little larger block size I still have this left over write: IOPS=460, BW=461MiB/s (483MB/s)(4096MiB/8894msec) | 16:55 |
*** hamzy has joined #openstack-infra | 16:55 | |
*** derekh has quit IRC | 17:01 | |
fungi | donnyd: http://logs.openstack.org/48/670848/1/check/tempest-full-rocky/60be484/controller/logs/dstat-csv_log.txt.gz might be useful to see what performance the job was experiencing | 17:02 |
mordred | corvus: ++ | 17:03 |
fungi | donnyd: but also, you might consider filtering for build_name=tempest-full-rocky across all providers for timeouts | 17:04 |
donnyd | not sure how to make heads or tails of that fungi | 17:04 |
fungi | could be it's not just runs in fortnebula which are timing out, and that the job itself is just generally running dangerously close to its (currently configured) 2-hour timeout | 17:05 |
donnyd | seems to just be me | 17:05 |
donnyd | at least in the last 12 hours | 17:06 |
*** aluria has quit IRC | 17:06 | |
donnyd | and the last 7 days | 17:06 |
fungi | dstat is like systat/sar, and the rows in that csv are points in time during the job, columns are various stats like cpu utilization, memory, disk read and write operations and bandwidth, et cetera | 17:07 |
*** udesale has quit IRC | 17:09 | |
fungi | might indicate that the job spent a long time with the guest at 100% cpu utilization or high system load count | 17:09 |
fungi | can also compare it to similar dstat files from other runs of the same job in different providers if that's of interest | 17:10 |
*** yamamoto has joined #openstack-infra | 17:11 | |
*** kopecmartin is now known as kopecmartin|off | 17:11 | |
*** dtantsur is now known as dtantsur|afk | 17:15 | |
*** goldyfruit has quit IRC | 17:16 | |
*** igordc has quit IRC | 17:17 | |
*** e0ne has quit IRC | 17:20 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Publish api-ref/api-guide to docs.o.o https://review.opendev.org/671117 | 17:22 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Remove publish-openstack-manuals-developer-lang https://review.opendev.org/671118 | 17:22 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Use a thread pool to update gitea repos faster https://review.opendev.org/670920 | 17:32 |
corvus | mordred: ^ that took 9m13s locally | 17:32 |
corvus | mordred: i wonder if we ran the different orgs in parallel, if that would avoid the db contention (since we would be updating num_repos of different users)? | 17:33 |
corvus | we don't have a lot of orgs (atm), but even if we run x/ and openstack/ at the same time, maybe it would help | 17:33 |
corvus | i'll see if i can give that a try | 17:33 |
*** armax has joined #openstack-infra | 17:34 | |
*** weifan has joined #openstack-infra | 17:34 | |
fungi | ahh, yeah, so serial creation within an org, but parallel between different orgs | 17:34 |
mordred | corvus: yeah - I could see that maybe helping - although my hunch says that since the db commit is waiting on the repo creation on disk, it's more that it _looks_ more like db contention than it actually is | 17:35 |
fungi | also if you're limiting it to a fixed number of threads, sort by number of repos per org so you frontload the largest ones and do the smaller ones at the end as threads free up | 17:35 |
*** ijw has quit IRC | 17:36 | |
fungi | that ought to maximize packing efficiency to shave off a bit more time (and increasingly as we grow more orgs in the future) | 17:36 |
corvus | mordred: yeah, i guess it depends on how efficient that disk work is -- can we get 2 of those going at the same time, or is it already at 100% utilization. fungi ++ good point | 17:36 |
*** ijw has joined #openstack-infra | 17:36 | |
fungi | that only sprang to mind because we convinced gerrit upstream to implement a similar optimization in the reindex queuing | 17:38 |
* mordred afks for a bit - biab | 17:38 | |
fungi | (though in that case it was change refs per repo i think) | 17:38 |
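The "largest orgs first" idea above can be sketched as a greedy longest-processing-time plan. This is an illustration only, assuming repo count is a fair proxy for creation time and that each org is handled serially by one worker; `plan_org_batches` and `makespan` are hypothetical names, not part of the system-config change. Handing each org to the least-loaded worker models a fixed-size thread pool that picks up the next queued org as soon as a thread frees up.

```python
import heapq


def plan_org_batches(repos_per_org, workers):
    """Assign whole orgs to workers, largest org first (greedy LPT heuristic)."""
    # Largest orgs first; the name breaks ties so the plan is deterministic.
    ordered = sorted(repos_per_org.items(), key=lambda kv: (-kv[1], kv[0]))
    # Min-heap of (repos assigned so far, worker index).
    heap = [(0, index) for index in range(workers)]
    heapq.heapify(heap)
    batches = [[] for _ in range(workers)]
    for org, count in ordered:
        # The least-loaded worker is the one that would free up first.
        load, index = heapq.heappop(heap)
        batches[index].append(org)
        heapq.heappush(heap, (load + count, index))
    return batches


def makespan(repos_per_org, batches):
    """Repos handled by the busiest worker, a rough proxy for wall-clock time."""
    return max((sum(repos_per_org[org] for org in batch) for batch in batches), default=0)
```

The heuristic is not guaranteed to find the best possible split, but it avoids the worst case of starting the biggest org last while the other threads sit idle.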
*** ijw has quit IRC | 17:38 | |
*** ijw has joined #openstack-infra | 17:39 | |
*** ricolin has quit IRC | 17:40 | |
*** jamesmcarthur has quit IRC | 17:43 | |
*** igordc has joined #openstack-infra | 17:44 | |
*** _Cyclone_ has quit IRC | 17:50 | |
*** _Cyclone_ has joined #openstack-infra | 17:52 | |
*** electrofelix has joined #openstack-infra | 18:00 | |
*** tosky has quit IRC | 18:01 | |
*** eharney has quit IRC | 18:02 | |
*** electrofelix has quit IRC | 18:03 | |
*** weifan has quit IRC | 18:04 | |
*** weifan has joined #openstack-infra | 18:04 | |
*** eharney has joined #openstack-infra | 18:06 | |
*** jamesmcarthur has joined #openstack-infra | 18:07 | |
*** weifan has quit IRC | 18:09 | |
*** weifan has joined #openstack-infra | 18:09 | |
*** dpawlik has joined #openstack-infra | 18:10 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Zuul CLI: allow access via REST https://review.opendev.org/636315 | 18:13 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Add Authorization Rules configuration https://review.opendev.org/639855 | 18:13 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Web: plug the authorization engine https://review.opendev.org/640884 | 18:14 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Remove publish-openstack-manuals-developer-lang https://review.opendev.org/671118 | 18:14 |
*** goldyfruit has joined #openstack-infra | 18:22 | |
*** jamesmcarthur has quit IRC | 18:24 | |
*** weifan has quit IRC | 18:38 | |
*** weifan has joined #openstack-infra | 18:38 | |
*** jamesmcarthur has joined #openstack-infra | 18:42 | |
donnyd | does anyone know if there is a way to be specific in a test job. IE I want X flavor on Y provider? | 18:43 |
*** weifan has quit IRC | 18:43 | |
clarkb | there isnt | 18:44 |
clarkb | we can do provider specific flavors but generally avoid that to handle cloud outages | 18:44 |
*** artom has joined #openstack-infra | 18:47 | |
artom | clarkb, hey, continuing the conversation with donnyd, I'm the one who's driving for multi-NUMA-node flavors, in order to eventually test Nova NUMA-y in the gate (as opposed to 3rd party CI) | 18:48 |
artom | NUMA integration is a well known gap in Nova | 18:48 |
artom | Hopefully the use case for provider-specific flavors makes more sense :) | 18:49 |
Shrews | wow, that's the 2nd request for that feature today | 18:49 |
clarkb | artom: so the problem is openstack has a long standing policy of only gating on things if more than one cloud can provide the test resources. This is because we have a long history of clouds disappearing | 18:49 |
artom | Shrews, oh yeah, who was the first? 'cuz this was talked about at Denver, so it's not new :) | 18:50 |
clarkb | artom: If we can have more than one cloud provide the required functionality then great but that won't be a provider specific label that will be the can-run-numa label | 18:50 |
fungi | artom: specifically second case today where someone requested nodepool making decisions based on an intersection of node labels | 18:50 |
artom | clarkb, ah, yeah, that makes sense | 18:50 |
artom | clarkb, I know vexxhost have recently-ish turned on nested virt (needed for this stuff) | 18:51 |
clarkb | we can also reevaluate the policy, however as mentioned we have a long history of that policy being extremely worthwhile so I'm not sure I would vote to change it | 18:51 |
*** jamesmcarthur has quit IRC | 18:51 | |
artom | Hopefully multi-NUMA-node flavors are coming | 18:51 |
artom | clarkb, yeah, I see where you're coming from. We start gating on things, and then the cloud disappears and we're scrambling to "ungate" | 18:52 |
*** jamesmcarthur has joined #openstack-infra | 18:52 | |
fungi | not blocking changes on the availability of a node type available from only one provider saves the project from having to disable jobs so they can merge code any time that provider is offline | 18:52 |
artom | Though in this case, it would start as an experimental job triggered manually, so that might alleviate those concerns a bit | 18:52 |
*** jamesmcarthur_ has joined #openstack-infra | 18:53 | |
fungi | it's possible we could name the relevant node labels with a scary enough prefix that projects won't inadvertently add them to gate jobs? | 18:53 |
Shrews | fungi: use-me-and-your-project-is-deleted-ubuntu ? | 18:54 |
artom | ;_: | 18:54 |
*** jamesmcarthur_ has quit IRC | 18:54 | |
*** mattw4 has quit IRC | 18:55 | |
clarkb | ya if we want to start by proving it is feasible (as something we can take to other clouds potentially) I think we can set up a label for that | 18:55 |
clarkb | then avoid gating on that (and remember in openstack clean check means check is effectively a gate too so can't be check either) | 18:56 |
clarkb | experimental jobs would be fine | 18:56 |
fungi | or non-voting check jobs | 18:56 |
fungi | or jobs in the "silent" pipeline (if we still have that) | 18:56 |
*** jamesmcarthur has quit IRC | 18:57 | |
clarkb | there exists an example in our nodepool configs with the vexxhost gpu labels fwiw | 18:57 |
*** jamesmcarthur has joined #openstack-infra | 18:58 | |
clarkb | https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.openstack.org.yaml#L48-L53 and https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.openstack.org.yaml#L234-L251 if you want to try setting up something similar with donnyd | 18:58 |
*** eernst has joined #openstack-infra | 18:59 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:59 | |
clarkb | also infra meeting time in a minute or two in #openstack-meeting | 18:59 |
*** eernst has quit IRC | 19:00 | |
*** jamesmcarthur has quit IRC | 19:03 | |
*** jamesmcarthur_ has quit IRC | 19:04 | |
*** jamesmcarthur has joined #openstack-infra | 19:05 | |
*** weifan has joined #openstack-infra | 19:05 | |
*** jamesmcarthur_ has joined #openstack-infra | 19:06 | |
*** tdasilva has quit IRC | 19:06 | |
*** eharney has quit IRC | 19:08 | |
*** jamesmcarthur has quit IRC | 19:10 | |
*** weifan has quit IRC | 19:10 | |
*** gfidente is now known as gfidente|afk | 19:10 | |
*** diablo_rojo has quit IRC | 19:13 | |
*** iurygregory has quit IRC | 19:16 | |
*** jamesmcarthur_ has quit IRC | 19:25 | |
*** jamesmcarthur has joined #openstack-infra | 19:28 | |
*** jamesmcarthur_ has joined #openstack-infra | 19:29 | |
openstackgerrit | Merged opendev/system-config master: Translate gitea project creation to python https://review.opendev.org/670060 | 19:31 |
*** jamesmcarthur_ has quit IRC | 19:32 | |
mnaser | was there any issues in fortnebula-regionone by any chance? | 19:33 |
mnaser | 2019-07-16 14:05:14.312602 | 19:33 |
mnaser | http://logs.openstack.org/01/670601/2/gate/openstack-ansible-deploy-aio_metal-debian-stable/1c68e66/job-output.txt.gz -- Failed to connect to opendev.org at port 443: [Errno 0] Error | 19:33 |
*** jamesmcarthur_ has joined #openstack-infra | 19:33 | |
*** jamesmcarthur has quit IRC | 19:33 | |
*** jamesmcarthur has joined #openstack-infra | 19:34 | |
clarkb | mnaser: I'm not aware of any issues recently (and the most recent issues were job timeouts due to io contention) fn has really good network bw so maybe something related to being an ipv6 only cloud? | 19:35 |
mnaser | yeah i was thinking that might be a possibility.. but we'd notice it more often if that was the case.. | 19:36 |
clarkb | fn is our only ipv6 only cloud currently | 19:36 |
clarkb | so if you don't get scheduled there often may not pop up? but ya I think we may need more debugging info | 19:37 |
*** jamesmcarthur_ has quit IRC | 19:37 | |
fungi | yeah, until limestone is back in the mix anyway | 19:39 |
mnaser | i mean | 19:41 |
mnaser | wouldn't it fail in PRE if it cant traceroute to opendev? | 19:42 |
mnaser | http://logs.openstack.org/01/670601/2/gate/openstack-ansible-deploy-aio_metal-debian-stable/1c68e66/zuul-info/zuul-info.debian-stretch.txt | 19:42 |
clarkb | not if the job breaks networking | 19:43 |
*** weifan has joined #openstack-infra | 19:44 | |
clarkb | we can also ask donnyd to look at it from the cloud side | 19:44 |
donnyd | sure, what do you want to know | 19:44 |
clarkb | donnyd: there was network connectivity error from test VM to opendev.org at http://logs.openstack.org/01/670601/2/gate/openstack-ansible-deploy-aio_metal-debian-stable/1c68e66/job-output.txt.gz#_2019-07-16_14_18_16_891952 that timestamp is utc | 19:45 |
clarkb | 8924a87c8b8cdccd2b2123c7736c34321baf3e23bada68cdbde7887e was the hypervisor host id for that job | 19:45 |
donnyd | does opendev have v6 records? | 19:46 |
clarkb | donnyd: yes | 19:46 |
*** ijw has quit IRC | 19:47 | |
donnyd | mnaser: I can take a look | 19:47 |
corvus | clarkb: i can't find where we install vhd-util on builders | 19:49 |
*** weifan has quit IRC | 19:49 | |
donnyd | I am still setting up full logging to capture things like this, but it would appear to me that it was pulling things from opendev.org to a certain point in the job, so not sure if its networking infra side related or not | 19:50 |
donnyd | clarkb: is there a way to sideload a job. Something like, i think job x isn't running correctly on cloud y | 19:51 |
beisner | hey all, just wondering if there is a meeting that we need to discuss a project-config change, or if that just happens organically? https://review.opendev.org/#/c/668681/ | 19:51 |
clarkb | beisner: usually asking here is sufficient, particularly if AJaeger has already +2'd it | 19:52 |
clarkb | donnyd: not easily no. We can manually boot an instance and run things on it | 19:53 |
clarkb | corvus: looking | 19:53 |
*** jamesmcarthur has quit IRC | 19:53 | |
*** jamesmcarthur has joined #openstack-infra | 19:53 | |
donnyd | looking at the logs ( node_provider:"fortnebula-regionone" AND message:"Failed to connect" ) I don't see any other jobs with that issue | 19:54 |
clarkb | corvus: opendev/puppet-diskimage_builder/manifests/init.pp | 19:54 |
*** jamesmcarthur_ has joined #openstack-infra | 19:54 | |
clarkb | corvus: we set the support_vhd flag to true | 19:54 |
*** petevg has joined #openstack-infra | 19:55 | |
corvus | oh i didn't check that one | 19:55 |
clarkb | corvus: beisner's https://review.opendev.org/#/c/668681/ change would be a good one to double test the gitea project creation changes when they are all in | 19:55 |
corvus | we'll have to remember that when we containerize | 19:55 |
clarkb | I'd approve it but figure you are paying attention to that so might be better if you review and approve it instead | 19:56 |
beisner | cool, thx peeps | 19:56 |
*** ijw has joined #openstack-infra | 19:57 | |
*** jamesmcarthur has quit IRC | 19:58 | |
fungi | i don't suppose anybody knows how to access a named ref via the gitea webui? | 20:00 |
clarkb | fungi: like a tag? | 20:00 |
*** eernst_ has joined #openstack-infra | 20:01 | |
*** mattw4 has joined #openstack-infra | 20:04 | |
openstackgerrit | Merged opendev/system-config master: Add some logging to repo creation https://review.opendev.org/670317 | 20:04 |
*** eharney has joined #openstack-infra | 20:06 | |
corvus | i suspect branches and tags may be the only refs you can browse in the gui | 20:07 |
corvus | earthquake! | 20:11 |
*** weifan has joined #openstack-infra | 20:15 | |
corvus | there it is: https://earthquake.usgs.gov/earthquakes/eventpage/nc73225421/executive | 20:16 |
fungi | clarkb: like refs/changes/41/671141/1 | 20:17 |
fungi | i want to persuade gitea to return raw content of a file with that ref (our election tooling used to do that with cgit, which allowed you to pass it in the h= parameter) | 20:18 |
fungi | er, a file at that ref | 20:18 |
fungi | fiddled with trying to pretend it was a branch, or a tag, or a commit | 20:21 |
*** kjackal has joined #openstack-infra | 20:23 | |
*** hamzy has quit IRC | 20:30 | |
*** bobh has quit IRC | 20:33 | |
*** ralonsoh has quit IRC | 20:35 | |
*** diablo_rojo has joined #openstack-infra | 20:37 | |
*** rfolco is now known as rfolco_l8r | 20:39 | |
*** jamesmcarthur_ has quit IRC | 20:45 | |
*** sgw has quit IRC | 20:45 | |
openstackgerrit | Merged opendev/system-config master: Run actual full project creation in gitea test https://review.opendev.org/670313 | 20:51 |
openstackgerrit | Merged opendev/system-config master: Improve idempotency of gitea-git-repos https://review.opendev.org/670919 | 20:51 |
*** eernst_ has quit IRC | 21:01 | |
*** pcaruana has quit IRC | 21:01 | |
*** dpawlik has quit IRC | 21:15 | |
*** kjackal has quit IRC | 21:16 | |
mordred | corvus: woot! project creation landed! | 21:22 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Provide better module return info from gitea create repos https://review.opendev.org/671159 | 21:27 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Parallelize repo creation by org https://review.opendev.org/671160 | 21:27 |
corvus | mordred, clarkb, fungi: ^ that's a little more complex, but i believe that is the absolute best we can do right now with the constraints around repo creation in gitea | 21:27 |
*** kjackal has joined #openstack-infra | 21:27 | |
*** joeguo has joined #openstack-infra | 21:27 | |
mordred | corvus: looking - also - I got 1 approval already on the gitea PR | 21:28 |
corvus | basically, i'm pretty sure that's as fast as we can create repos, and the settings are interleaved within that such that they take no extra time | 21:28 |
fungi | right on (to both of you) | 21:28 |
corvus | mordred: depending on how much you want to flex database muscles, we could root-cause the thing i noted in there about the settings table. or we could just say "databases are weird, <shrug>" :) | 21:29 |
corvus | (the query that gets hung is "delete from repo_unit where ...", and i'm assuming that just does something extra with locking if there are 2 transactions doing that on an empty table) | 21:29 |
*** jtomasek has quit IRC | 21:29 | |
*** gfidente|afk has quit IRC | 21:29 | |
corvus | oh, i forgot to give a time. that's 7.5m on my desktop | 21:30 |
mordred | uhg ... databases are weird shrug - although I almost want to dig in | 21:30 |
corvus | so might be a bit closer to 10m under zuul | 21:30 |
corvus | mordred: if you wanted to dig in, i'm guessing the next step would be a "show me the locks plz" command which i forgot? | 21:31 |
fungi | sometimes weird things are what make you want to dig in | 21:34 |
*** eernst has joined #openstack-infra | 21:35 | |
*** eernst has quit IRC | 21:35 | |
*** sgw has joined #openstack-infra | 21:38 | |
*** armax has quit IRC | 21:43 | |
*** kjackal has quit IRC | 21:46 | |
*** eernst has joined #openstack-infra | 21:51 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: Add gerrit to gitea job https://review.opendev.org/671162 | 21:54 |
*** pkopec has quit IRC | 21:54 | |
corvus | i'm pretty sure the step after that is to add a post pipeline job | 21:55 |
*** eernst has quit IRC | 21:55 | |
mordred | corvus: we're dangerously close to having a really awesome thing here | 21:55 |
*** xek has quit IRC | 22:02 | |
*** diablo_rojo has quit IRC | 22:13 | |
corvus | clarkb, fungi, mordred: can you look at my comments on https://review.opendev.org/651390 ? | 22:15 |
corvus | i think everything there is straightforward except the second comment on .zuul.yaml line 648. | 22:15 |
corvus | i'm not sure how we should proceed there (it turns out that the nameserver scenario may not be as simple as it first seems) | 22:16 |
*** weifan has quit IRC | 22:17 | |
*** weifan has joined #openstack-infra | 22:17 | |
*** weifan has quit IRC | 22:18 | |
*** weifan has joined #openstack-infra | 22:18 | |
*** weifan has quit IRC | 22:19 | |
*** weifan has joined #openstack-infra | 22:19 | |
corvus | one option would be to say that zone repos need to be under entirely opendev root control (eg drop zuul-maint from zuul-ci.org) | 22:19 |
*** weifan has quit IRC | 22:19 | |
*** weifan has joined #openstack-infra | 22:20 | |
corvus | another would be to ask others to please not pwn the system; however, i think the issue there is less with trust (i trust the people) than just having too much access (zuul-maint shouldn't have to worry about being a vector for compromising bridge) | 22:20 |
*** weifan has quit IRC | 22:20 | |
*** weifan has joined #openstack-infra | 22:20 | |
*** weifan has quit IRC | 22:21 | |
*** weifan has joined #openstack-infra | 22:21 | |
*** weifan has quit IRC | 22:22 | |
corvus | another would be to rethink how that job is run (eg, use a secret rather than an ssh key and use project-config to assign it to the zone repo's pipeline) | 22:22 |
*** weifan has joined #openstack-infra | 22:22 | |
*** weifan has quit IRC | 22:22 | |
*** weifan has joined #openstack-infra | 22:23 | |
corvus | another would be to come up with some way to tell zuul to let the zonefile repo borrow the ssh key of system-config just for that one job | 22:23 |
*** weifan has quit IRC | 22:23 | |
*** weifan has joined #openstack-infra | 22:23 | |
*** weifan has quit IRC | 22:24 | |
*** weifan has joined #openstack-infra | 22:24 | |
mordred | corvus: putting zone files under opendev resonates more with me, and I think the zuul project choosing to host itself with opendev carries with it an existing assumption that the opendev admins aren't going to misuse their position of trust and put a hostname in zuul-ci.org domain that shouldn't be there | 22:25 |
*** weifan has quit IRC | 22:25 | |
mordred | corvus: past that, most of the data in that file is actually opendev operational data | 22:25 |
corvus | that last one sounds hard, but it may not be -- i think we could say if a job had final:true and allowed-projects, then all of the allowed-projects get to borrow the ssh key of the defining repo. | 22:25 |
mordred | so I don't know that, as a zuul-maint, I'd have a context for approving or not approving a patch to the repo vs. as an opendev-core | 22:25 |
*** betherly has joined #openstack-infra | 22:26 | |
mordred | some of those words may be less relevant than others - those were just my first thoughts | 22:26 |
corvus | mordred: well, "blog.zuul-ci.org CNAME wordpress.com" might be the sort of thing opendev doesn't care about, but that seems minor; i generally agree. | 22:26 |
mordred | you said wordpress | 22:27 |
corvus | i said blog | 22:27 |
corvus | option 1 does have the advantage of being something we can do immediately | 22:27 |
mordred | I think it also describes the actual current state more correctly | 22:28 |
mordred | BUT - I could be swayed into supporting different positions - this isn't strongly or deeply held conviction or anything | 22:29 |
corvus | sure, i'd just like to have the option of a system where semi-overlapping groups can cooperate; we have that now (we trust zuul-maint and opendev to make changes to the zone) | 22:30 |
corvus | it sort of seems a shame to drop that because of a technical issue that isn't actually related to dns | 22:30 |
*** betherly has quit IRC | 22:31 | |
corvus | i'd love for the openstack project to have the option to use the system; and they have lots of hostnames that aren't opendev | 22:31 |
*** whoami-rajat has quit IRC | 22:31 | |
corvus | i think options #1 and #4 are the most actionable; #2 is right out, #3 is meh -- i'd like to be able to use the ssh key system, so i'd rather make it better than give up on it. | 22:34 |
corvus | so maybe we go with #1, and i backlog implementing #4 so we can use it if we want later. | 22:34 |
corvus | mordred: ^? | 22:34 |
mordred | corvus: ++ | 22:36 |
*** rcernin has joined #openstack-infra | 22:36 | |
corvus | #4 would actually let us drop the project-config key too; it's kinda growing on me | 22:38 |
fungi | commented, but i also think that projects hosting their domains with opendev ought to be able to take advantage of our familiarity with bind and dns to okay their zonefile changes (even if we have jobs which at least catch outright breakage). it does turn us into a bottleneck for those though, and i agree it would be nice if teams were able to self-approve emergency dns changes for resources we're | 22:39 |
fungi | not hosting in cases where none of us are around to review | 22:39 |
*** ijw has quit IRC | 22:39 | |
*** ijw has joined #openstack-infra | 22:43 | |
corvus | i left a followup too. | 22:43 |
corvus | clarkb: ^ if you agree with all that, i think we've got next steps on that one. | 22:43 |
* mordred dinners | 22:45 | |
corvus | regarding adding gerrit to the gitea job... i'm wondering if maybe it's not necessary... | 22:45 |
corvus | right now, we are running the gerrit/gitea playbook without the gerrit host, and that's fine because the gerrit plays just aren't running since there are no matching hosts | 22:46 |
corvus | so we could make a second job just for gerrit, since there isn't really any interaction there (unless we wanted to go so far as test replication from our fake gerrit to our fake gitea) | 22:47 |
corvus | we would trust that as long as the two jobs worked separately, a single job which ran the playbook with all the hosts would work | 22:47 |
corvus | i think either approach would be worth trying (one combined test job or two split test jobs); i'm not sure which would be better | 22:48 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Add gerrit to gitea job https://review.opendev.org/671162 | 22:48 |
*** tkajinam has joined #openstack-infra | 22:53 | |
*** slaweq has quit IRC | 22:55 | |
*** betherly has joined #openstack-infra | 22:56 | |
*** armax has joined #openstack-infra | 22:59 | |
*** ekultails has quit IRC | 23:00 | |
clarkb | corvus: fungi couldn't we use a secret in a job then let repos run that job? | 23:01 |
*** betherly has quit IRC | 23:01 | |
clarkb | then they don't have access to change the secret or the job but do have access to trigger things to update | 23:01 |
*** weifan has joined #openstack-infra | 23:02 | |
fungi | clarkb: yeah, that was one of the options corvus mentioned above | 23:05 |
fungi | "another would be to rethink how that job is run (eg, use a secret rather than an ssh key and use project-config to assign it to the zone repo's pipeline)" | 23:06 |
corvus | and "#3 is meh -- i'd like to be able to use the ssh key system, so i'd rather make it better than give up on it." is my current feeling on that | 23:07 |
*** weifan has quit IRC | 23:07 | |
*** aaronsheffield has quit IRC | 23:07 | |
*** goldyfruit has quit IRC | 23:08 | |
clarkb | in that situation opendev controls both the key and job and basically lets another repo trigger it? | 23:09 |
fungi | yeah | 23:10 |
*** weifan has joined #openstack-infra | 23:14 | |
clarkb | ya that would work | 23:16 |
*** diablo_rojo has joined #openstack-infra | 23:23 | |
*** hamzy has joined #openstack-infra | 23:27 | |
*** betherly has joined #openstack-infra | 23:29 | |
*** betherly has quit IRC | 23:34 | |
*** tobiash has quit IRC | 23:51 | |
*** tobiash has joined #openstack-infra | 23:52 | |
*** goldyfruit has joined #openstack-infra | 23:53 | |
*** dchen has joined #openstack-infra | 23:56 |