*** DSpider has quit IRC | 00:20 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add a grafana/grafyaml image https://review.opendev.org/737397 | 00:29 |
---|---|---|
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add a grafana/grafyaml image https://review.opendev.org/737397 | 00:30 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add a grafana/grafyaml image https://review.opendev.org/737397 | 00:33 |
*** xiaolin has joined #opendev | 00:43 | |
*** sgw1 has quit IRC | 01:35 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 01:42 |
*** sgw1 has joined #opendev | 01:50 | |
*** mrunge_ has joined #opendev | 02:20 | |
*** mrunge has quit IRC | 02:21 | |
*** sgw1 has quit IRC | 02:47 | |
*** shtepanie has quit IRC | 02:53 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 03:02 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 03:05 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 03:11 |
*** sgw1 has joined #opendev | 03:28 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 03:34 |
*** meiyanzheng has joined #opendev | 03:44 | |
*** diablo_rojo has quit IRC | 03:58 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 04:03 |
*** sgw1 has quit IRC | 04:06 | |
*** ykarel|away is now known as ykarel | 04:10 | |
*** sgw1 has joined #opendev | 04:18 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 04:23 |
*** rchurch has quit IRC | 04:23 | |
*** rchurch has joined #opendev | 04:24 | |
auristor | ianw: with regards to the unexplained afs01.dfw traffic, what the graphs do not separate is traffic by service. Therefore it is not known if the outbound traffic is fileserver to cache manager or volserver to volserver | 04:42 |
auristor | but if you compare the afs02.dfw graph for the same time period http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=6405&rra_id=5&view_type=&graph_start=1592857482&graph_end=1592858713&graph_height=120&graph_width=500&title_font_size=12 | 04:43 |
auristor | there is only 5m of traffic at the end of the time period. that is most likely the volserver traffic from afs01.dfw to afs02.dfw. The rest is most likely fileserver to cache manager traffic. Either the rsync client fetching status and data or cache managers that read from /afs fetching new data after receiving volume callbacks. | 04:46 |
ianw | auristor: probably what we should do is combine all the exit codes of the rsyncs for each distro, and only do a vos release if it reports changes were made | 05:00 |
ianw | googling that, it appears the only way to tell if rsync did something of interest is log scraping of various types, so possible, but not as neat as just exit codes | 05:02 |
ianw | still, an item for the todo list | 05:03 |
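A minimal sketch of the log-scraping idea ianw mentions, assuming rsync is run with `--itemize-changes` and without `-v`, so that stdout carries only itemized lines. The helper names (`itemized_changes`, `sync_made_changes`) are hypothetical, and gating `vos release` on the result is an illustration of the proposal, not what the mirror scripts currently do.

```python
import subprocess


def itemized_changes(rsync_stdout):
    """Return the itemized lines that describe a real change.

    Each line starts with a short change code such as '>f+++++++++',
    or with '*deleting' for removals. A code that has nothing but dots
    or spaces after its first two characters means the item was left
    untouched (rsync only prints those when -i is given twice).
    """
    changes = []
    for line in rsync_stdout.splitlines():
        if not line.strip():
            continue
        code = line.split(None, 1)[0]
        if code.startswith("*deleting"):
            changes.append(line)
        elif len(code) >= 2 and code[0] in "<>ch." and set(code[2:]) - set(". "):
            changes.append(line)
    return changes


def sync_made_changes(src, dest):
    """Run one rsync and report whether it changed anything at dest."""
    result = subprocess.run(
        ["rsync", "-a", "--delete", "--itemize-changes", src, dest],
        check=True, capture_output=True, text=True,
    )
    return bool(itemized_changes(result.stdout))
```

A per-distro wrapper could then OR together the results of its rsync runs and skip the `vos release` when every one of them returns False.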
*** ysandeep|PTO is now known as ysandeep | 05:18 | |
openstackgerrit | Felix Edel proposed zuul/zuul-jobs master: [DNM] Test upload return values https://review.opendev.org/737441 | 05:51 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 06:19 |
openstackgerrit | Felix Edel proposed zuul/zuul-jobs master: [DNM] Test upload return values https://review.opendev.org/737441 | 06:19 |
*** rpittau|afk is now known as rpittau | 06:34 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 06:58 |
openstackgerrit | Felix Edel proposed zuul/zuul-jobs master: Return upload_results in upload-logs-swift role https://review.opendev.org/733564 | 07:02 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 07:07 |
AJaeger | ianw, fungi, clarkb, could you review https://review.opendev.org/#/c/735283 together with this etherpad, please? https://etherpad.opendev.org/p/-CBx0IaMT37oFBHdt8iV | 07:15 |
AJaeger | Please update the etherpad with your thoughts and let's see how to make this more consistent | 07:16 |
AJaeger | infra-root, codesearch still finds x/whitebox-tempest-plugin but that was renamed to openstack/whitebox-tempest-plugin | 07:23 |
*** calcmandan has quit IRC | 07:26 | |
*** hashar has joined #opendev | 07:26 | |
*** calcmandan has joined #opendev | 07:29 | |
*** tosky has joined #opendev | 07:35 | |
*** aannuusshhkkaa has quit IRC | 07:51 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
openstackgerrit | Javier Peña proposed opendev/system-config master: Support CentOS for AFS mirror https://review.opendev.org/736996 | 08:06 |
*** hashar has quit IRC | 08:16 | |
*** hashar_ has joined #opendev | 08:16 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 08:20 |
*** ykarel is now known as ykarel|lunch | 08:23 | |
*** hashar_ is now known as hashar | 08:25 | |
*** ysandeep is now known as ysandeep|lunch | 08:37 | |
openstackgerrit | Vishal Manchanda proposed openstack/project-config master: Upadting horizon nodejs job name https://review.opendev.org/737457 | 08:37 |
*** ysandeep|lunch is now known as ysandeep | 08:50 | |
openstackgerrit | Vishal Manchanda proposed openstack/project-config master: Upadting horizon nodejs job name https://review.opendev.org/737457 | 08:52 |
openstackgerrit | Riccardo Pittau proposed openstack/diskimage-builder master: Convert multi line if statement to case https://review.opendev.org/734479 | 08:54 |
*** priteau has joined #opendev | 08:57 | |
*** SotK has quit IRC | 09:08 | |
*** owalsh has quit IRC | 09:08 | |
*** tobiash has quit IRC | 09:08 | |
*** owalsh_ has joined #opendev | 09:08 | |
*** SotK has joined #opendev | 09:08 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 09:11 | |
*** tobiash has joined #opendev | 09:12 | |
*** hashar is now known as hasharAway | 09:31 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 09:32 |
*** ryohayakawa has quit IRC | 10:06 | |
*** meiyanzheng has quit IRC | 10:07 | |
*** rpittau is now known as rpittau|bbl | 10:12 | |
*** hasharAway is now known as hasharLunch | 10:16 | |
*** tkajinam has quit IRC | 10:17 | |
*** jhesketh has quit IRC | 10:36 | |
*** jhesketh has joined #opendev | 10:37 | |
*** mrunge_ is now known as mrunge | 10:48 | |
*** ykarel|lunch is now known as ykarel | 10:58 | |
frickler | mnaser: you have very strange rDNS in place for mirror01.ca-ymq-1.vexxhost.opendev.org: 149.45.204.199.in-addr.arpa domain name pointer abla-4.albalisaude2.com.br. | 11:11 |
*** DSpider has joined #opendev | 11:45 | |
*** hasharLunch is now known as hashar | 11:51 | |
*** lpetrut has joined #opendev | 11:51 | |
*** ysandeep is now known as ysandeep|afk | 12:00 | |
*** rpittau|bbl is now known as rpittau | 12:14 | |
smcginnis | What would be the best ensure-* role to use to make sure setuptools is present for calling "python setup.py --name"? | 12:19 |
AJaeger | smcginnis: ensure-pip includes setuptools | 12:21 |
smcginnis | AJaeger: Any idea why it would fail with that role? We have this pre playbook that runs: | 12:23 |
smcginnis | pip | 12:23 |
smcginnis | https://pypi.org/project/cinder/ | 12:23 |
smcginnis | But got this failure: https://060ba5bdde00663d768c-19780c33aa00a3c0d825d79cd8c225b0.ssl.cf2.rackcdn.com/8ea79a3dba07789f8ab08066faa90cbfcc8a7837/release/propose-update-constraints/51a2e0c/job-output.txt | 12:23 |
smcginnis | Search for "2020-06-19 14:40:03.562345" | 12:24 |
smcginnis | Hmm, 2020-06-19 14:39:09.485384 | TASK [ensure-pip : Ensure setuptools] | 12:25 |
smcginnis | Oh, skipping: Conditional result was False | 12:26 |
smcginnis | That's only on Suse. | 12:26 |
*** ysandeep|afk is now known as ysandeep | 12:28 | |
AJaeger | smcginnis: that's from 19th of June - shouldn't this be fixed by now? | 12:29 |
AJaeger | smcginnis: https://zuul.opendev.org/t/openstack/builds?job_name=propose-update-constraints# all looks green | 12:29 |
AJaeger | smcginnis: So, let's ignore this specific failure - ok? | 12:30 |
smcginnis | AJaeger: Oh, I think you are right. It was probably shortly after then that it was fixed. I'm just looking at too old of logs. | 12:30 |
AJaeger | hope so ;) | 12:30 |
smcginnis | Yeah, I was wondering why I had seen some successful runs. That would make sense. | 12:30 |
smcginnis | We have a nightly job that runs that has failed, but I think that uses a different playbook that probably also needs an update. | 12:31 |
smcginnis | Will have to track that down though. | 12:31 |
smcginnis | Thanks for looking AJaeger. | 12:31 |
AJaeger | ok, let's fix the nightly one ;) | 12:32 |
AJaeger | smcginnis: failure on cinder? | 12:32 |
AJaeger | smcginnis: just checked, didn't find one... | 12:35 |
smcginnis | AJaeger: It's on openstack/requirements. Looks like propose-updates and release-wheel-cache: | 12:35 |
smcginnis | https://zuul.opendev.org/t/openstack/builds?pipeline=periodic&branch=master&project=openstack%2Frequirements | 12:35 |
smcginnis | release-wheel-cache is: The task includes an option with an undefined variable. The error was: 'afs_volume' is undefined | 12:36 |
smcginnis | But propose-updates looks like it's missing virtualenv. | 12:36 |
AJaeger | indeed | 12:36 |
smcginnis | Which is a little odd, since I see it being installed prior to the failure. | 12:37 |
smcginnis | Maybe just needs the bit to make it global? | 12:37 |
AJaeger | it worked before - infra-root, did we update something in the last 24h to break release-wheel-cache? | 12:38 |
AJaeger | See https://zuul.opendev.org/t/openstack/build/229fc8ce6db3464cbb9f50d0e3ed43a1 | 12:38 |
AJaeger | smcginnis: I don't see anything obvious - that's why I asked for help ^ | 12:39 |
smcginnis | That one does look like it could be a side effect of another change. | 12:40 |
AJaeger | smcginnis: virtualenv is called from the script - and thus needs to be available globally | 12:40 |
AJaeger | is the shell script invoked without the venv in PATH? | 12:41 |
smcginnis | That would be my guess. | 12:42 |
AJaeger | we have the ensure_global flag only for tox, not for virtualenv ;( | 12:43 |
AJaeger | mordred: any idea what to do here? ^ | 12:43 |
smcginnis | I wonder if it would work to just update this part to call "tox -e venv"? https://opendev.org/openstack/project-config/src/branch/master/playbooks/proposal/propose_update.sh#L35-L38 | 12:44 |
AJaeger | mmh, might - sorry, have to step out for a bit now. Hope others can help further | 12:47 |
smcginnis | Thanks | 12:47 |
frickler | AJaeger: smcginnis: this is a role invocation without the var added, I'll push a fix https://opendev.org/openstack/project-config/src/branch/master/playbooks/wheel/release.yaml#L13 | 12:48 |
smcginnis | Thanks frickler. | 12:48 |
openstackgerrit | Jens Harbott (frickler) proposed openstack/project-config master: Fix wheel release playbook https://review.opendev.org/737525 | 12:50 |
frickler | infra-root: ^^ the latest commit was https://opendev.org/openstack/project-config/commit/92b378cc9e2ac95cf24520d1dc73986060c7ecfb which didn't touch this, maybe this is also ansible 2.9 fallout? | 12:51 |
smcginnis | Looks like that was an error in there for a couple months looking at the git blame. | 12:52 |
smcginnis | Maybe we just didn't notice that one until now. | 12:52 |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Fix venv use in requirements propose_update.sh https://review.opendev.org/737526 | 12:54 |
smcginnis | Would be good to know if that should work ^ | 12:54 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: ensure-pip debian: update package lists https://review.opendev.org/737529 | 12:57 |
*** Dmitrii-Sh has quit IRC | 13:04 | |
*** Dmitrii-Sh has joined #opendev | 13:05 | |
openstackgerrit | Thierry Carrez proposed zuul/zuul-jobs master: upload-git-mirror: check after mirror operation https://review.opendev.org/737533 | 13:05 |
mordred | smcginnis: that might work - we might also need to add an ensure_global flag to ensure-virtualenv too | 13:17 |
fungi | problem with that is that ensure-venv may install the venv module and set a variable saying to run `python3 -m venv` | 13:19 |
mnaser | have we by any chance made any recent changes to base jobs? | 13:19 |
mnaser | it doesn't seem that way but i just had a no-log POST_FAILURE | 13:19 |
fungi | mordred: so just a simple symlink won't work for that case, might need a wrapper script? | 13:19 |
mnaser | over here: https://review.opendev.org/#/c/733024/5 | 13:20 |
fungi | brb | 13:20 |
mordred | fungi: ugh | 13:20 |
mnaser | oh there may be something else transpiring here | 13:20 |
mnaser | i'm seeing a bunch of other POST_FAILUREs | 13:20 |
mnaser | https://zuul.opendev.org/t/openstack/builds | 13:20 |
mnaser | an object storage provider having problems maybe? i can't help troubleshoot more because we'll need some zuul executors logs to help uncover this i think | 13:21 |
mordred | fungi: no, I don't think that's right ... the role is ensure-virtualenv | 13:21 |
mordred | mnaser: looking | 13:21 |
mnaser | mordred: feel free to paste a log and i can go digging after too, seems like you're dealing with another fire :) | 13:22 |
frickler | mnaser: mordred: seems some other jobs are affected, too, likely an issue with one s3 provider | 13:22 |
frickler | like https://zuul.opendev.org/t/openstack/build/1be32e52753c401caaa853179dd19826 | 13:22 |
mordred | mnaser: http://paste.openstack.org/show/795098/ | 13:25 |
mordred | keystone v2 not available anymore :) | 13:25 |
mnaser | ah bon | 13:26 |
mnaser | i think i can fix that | 13:26 |
mordred | mnaser: I think we just need to supply default as domain name: http://travaux.ovh.net/?do=details&id=42179 | 13:28 |
mnaser | mordred: identity_api_version: 3 too or that's optional? | 13:28 |
mordred | mnaser: optional | 13:28 |
mordred | we should detect based on presence of domain parameters | 13:29 |
mordred | hah. this is already fixed in openstacksdk ... I guess we have an old version there? | 13:30 |
mordred | how could we have an old version - these are running in the containers | 13:30 |
* mordred goes to look | 13:30 | |
openstackgerrit | Mohammed Naser proposed opendev/base-jobs master: ovh: start using keystone v3 https://review.opendev.org/737540 | 13:31 |
mnaser | mordred, infra-core: ^ see above to avoid a flood of post-failures incoming | 13:31 |
mordred | oh - no it's not | 13:31 |
mordred | mnaser: can you do gra too? | 13:32 |
mnaser | mordred: https://github.com/openstack/openstacksdk/blob/master/openstack/config/vendors/ovh.json wat ? | 13:32 |
mordred | oh - nevermind | 13:32 |
mnaser | :) | 13:32 |
mnaser | mordred: i think this is because this is the openstacksdk version inside zuul's ansible | 13:32 |
mnaser | mordred: and we should probably double check nodepool's config also has the right info too and is using v3 too, dont think i can look at that | 13:33 |
mnaser | uhh, the profile has said identity_api_version 3 since dec 2019 | 13:34 |
mordred | yeah. the issue here is missing domain attributes - identity_api_version doesn't actually control this | 13:34 |
mnaser | ah | 13:34 |
mordred | it's controlled by auth_type | 13:34 |
mordred | but auth_type of password does inference based on parameters | 13:34 |
*** sgw1 has quit IRC | 13:34 | |
mordred | BUT | 13:34 |
mordred | the ovh profile has user_domain_name: Default and project_domain_name: Default | 13:35 |
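A rough illustration of the inference mordred describes: with `auth_type` of password, the presence of domain parameters is what signals Keystone v3. This is only a sketch of the idea, not the actual keystoneauth or openstacksdk logic; `guess_identity_version` and `V3_ONLY_KEYS` are hypothetical names, and the auth URL below is a placeholder.

```python
# Auth parameters that only make sense against Keystone v3.
V3_ONLY_KEYS = frozenset({
    "user_domain_name", "user_domain_id",
    "project_domain_name", "project_domain_id",
    "domain_name", "domain_id",
})


def guess_identity_version(auth):
    """Guess the identity API version implied by a dict of auth parameters."""
    return "3" if V3_ONLY_KEYS & set(auth) else "2.0"


# Without domain attributes the request looks like a v2 login ...
legacy_auth = {
    "auth_url": "https://auth.example.test/",  # placeholder URL
    "username": "user",
    "password": "secret",
    "project_name": "project",
}
# ... while supplying the Default domains marks it as v3.
fixed_auth = dict(
    legacy_auth,
    user_domain_name="Default",
    project_domain_name="Default",
)
```

Under this reading, setting `identity_api_version` alone changes nothing; it is the two domain keys that move the request from the v2 shape to the v3 shape.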
mordred | so I'm guessing that yeah, this is an issue with the ansible venv installs | 13:35 |
mordred | all the more reason to get these bad-boys on to docker | 13:35 |
*** sgw1 has joined #opendev | 13:35 | |
mordred | mnaser: thanks for the patch! | 13:36 |
mnaser | oh yeah we have to check the venv there | 13:36 |
mordred | corvus: I think we should maybe finish rolling out zuul-executor docker | 13:36 |
mnaser | mordred: do you feel like pip freeze there just to check if our theory is right? im just curious. | 13:37 |
mordred | mnaser: sure. one sec | 13:37 |
mordred | >>> openstack.version.__version__ | 13:43 |
mordred | '0.41.0' | 13:43 |
mordred | mnaser: ^^ yup | 13:44 |
mordred | also - I think we have a bug in zuul-manage-ansible | 13:44 |
fungi | mnaser: catching up, but today is probably the long-announced date that ovh was dropping v2 | 13:49 |
mordred | yup | 13:51 |
mordred | fungi: and - sdk has support for it - so all of the work was actually done to make it seamless... | 13:51 |
mordred | fungi: EXCEPT | 13:51 |
mordred | our virtualenvs have a stale version of sdk | 13:51 |
fungi | we don't continuously upgrade our ansible venvs on the executors, yep | 13:51 |
*** sshnaidm|ruck is now known as sshnaidm|afk | 13:51 | |
fungi | stale versions of everything last i checked | 13:52 |
* mordred would like us to finish moving zuul-executors to docker | 13:52 | |
fungi | when zuul adds a new ansible version, manage-ansible creates the new venv for it, but we never upgrade ansible or anything else in it (or didn't last time i looked, though it's been a couple months so maybe that's gotten better) | 13:52 |
mnaser | i dont know if zuul/nodepool can do this but it would be an interesting exercise of being able to push zuul-executors out to the clouds themselves | 13:53 |
mnaser | i wonder how much of a performance impact that would have given the reduced RTT for running ansible | 13:53 |
mnaser | one could theoretically profile something like this | 13:53 |
mnaser | hmm. | 13:54 |
mordred | mnaser: we have support for cloud-region tied zuul-executors | 13:54 |
mnaser | mordred, AJaeger, fungi: https://review.opendev.org/#/c/737540/ has failed in POST_FAILURE | 13:54 |
mordred | sigh | 13:55 |
mnaser | i guess its because base-jobs is a config-project and its not speculatively testing the secret, right? | 13:55 |
mordred | yah | 13:55 |
mordred | infra-root: it might be worth force-merging mnaser's patch | 13:55 |
mnaser | yeah so someone probably should force-merge that.. hoping i didn't do something bad :) | 13:55 |
mordred | since it's a config-project fix for a gate failure that is itself hitting the gate failure | 13:56 |
mordred | infra-root: I'm ready to force-merge unless someone thinks we shouldn't | 13:56 |
corvus | mordred: +1 | 13:57 |
fungi | please do, i'm just getting settled back in | 13:57 |
fungi | but i agree it's a fine stopgap | 13:57 |
openstackgerrit | Merged opendev/base-jobs master: ovh: start using keystone v3 https://review.opendev.org/737540 | 13:57 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: ensure-pip debian: update package lists https://review.opendev.org/737529 | 13:58 |
*** DSpider has quit IRC | 13:58 | |
mordred | infra-root: if folks could review https://review.opendev.org/#/c/733967/ - I can put the executors into the emergency file and do a one-at-a-time rollout | 14:00 |
*** roman_g has joined #opendev | 14:01 | |
corvus | mordred: lgtm | 14:01 |
mordred | corvus: think maybe I should go ahead and put them in emergency? | 14:04 |
* mnaser wonders if we should wait to double check ovh is back ok again | 14:04 | |
mnaser | (btw maybe we should throw a notice too) | 14:05 |
mordred | mnaser: bah. let people be confused | 14:07 |
mordred | it adds excitement | 14:07 |
mnaser | fun ride ahead :) | 14:10 |
corvus | mordred: yeah, emergency now sounds reasonable | 14:11 |
mnaser | oh yeah there's a lot of post_failure's | 14:12 |
mnaser | im trying to find a successful ovh job to make sure what we did works | 14:12 |
mnaser | we merged at 9:57 am et, so ~16 minutes ago, trying to find a job that completed since | 14:14 |
mnaser | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6f0/737497/2/check/puppet-openstack-syntax-5-ubuntu-bionic/6f0554c/ | 14:14 |
mnaser | that was uploaded 6 minutes ago | 14:15 |
mordred | mnaser: woot! | 14:15 |
mnaser | so i think we're good | 14:15 |
mnaser | kindof a bummer about all the orange in the status, but we can't really know to rerun those jobs, they'll just have to fail | 14:15 |
openstackgerrit | Monty Taylor proposed opendev/base-jobs master: Revert "ovh: start using keystone v3" https://review.opendev.org/737550 | 14:16 |
mnaser | i think we _could_ perhaps technically add a retry on the upload job using another cloud to make it more resilient | 14:16 |
mordred | mnaser: that's potentially not a terrible idea | 14:16 |
mordred | mnaser: I pushed up a revert but -2'd it - once we've rolled out the executor update we can land it | 14:17 |
mnaser | the trickiest bit i guess is just the 'figure out where you failed to upload and try something *except* that' | 14:17 |
mnaser | i wonder if also we could make the upload-logs failing not fail the job if the job passed | 14:19 |
*** DSpider has joined #opendev | 14:20 | |
mnaser | could be really useful for promote/post/release failures... and just in general, if it passed and logs failed, we're probably ok | 14:20 |
mordred | hrm. that's an interesting thought | 14:20 |
mnaser | like maybe we can report a zuul warning saying we failed to push logs | 14:20 |
mnaser | in the comment | 14:20 |
mnaser | but no need for us to fail the whole thing | 14:20 |
mordred | yeah. because most of the time you only care about logs when a job failed anyway. | 14:21 |
mnaser | yeah | 14:21 |
mordred | corvus: ^^ what do you think about that? | 14:21 |
mordred | mnaser: or - perhaps have that behavior be dependent on type of pipeline or something. like - make it be a failure in check - but in release/post - make it not | 14:22 |
mordred | especially if you consider a release job that does some other thing (like, actually does the release) - that may have dependent jobs that should be triggered - but the log failure would cause the dependent jobs to not trigger even though a released artifact would have been published/pushed | 14:23 |
openstackgerrit | Merged openstack/project-config master: Fix wheel release playbook https://review.opendev.org/737525 | 14:24 |
corvus | i worry a little bit about not noticing a failure, so making it pipeline-contingent may be a good compromise there. but honestly, do we really want jobs to finish, or follow-on jobs to run without logs? | 14:24 |
corvus | i mean, a build without a log is *really really bad* | 14:24 |
corvus | especially in release | 14:25 |
mordred | corvus: maybe mnaser's first thought - re-try log upload on a different target - would be a better thing to work on first? | 14:25 |
corvus | yeah, i think that may be better | 14:25 |
* mnaser thinks this would be nice to integrate to the zuul role itself | 14:26 | |
mordred | I'm torn between which is worse - release job without logs - but that already did its release actions | 14:26 |
mnaser | mordred: well, the note here is it's not a release job without logs | 14:26 |
mnaser | its a release job without public facing logs, the executor still technically has a copy | 14:26 |
mnaser | its "not ideal and easy to access" but it's there | 14:26 |
corvus | no, they're deleted | 14:27 |
mordred | yeah | 14:27 |
corvus | i mean, there's some ansible logs | 14:27 |
mordred | but they won't have the logs of the release | 14:27 |
mnaser | ok right, so we log the error but not the progress of the whole thing | 14:27 |
corvus | so yeah, we might be able to figure out if something uploaded, but no guarantees | 14:27 |
mnaser | well unless we can grab the status of the 'run' phase and then you have a guarantee that it passed or didnt | 14:28 |
corvus | (it all depends on how much of that info makes it to the executor debug log) | 14:28 |
mordred | corvus: how horrible of an idea would it be to add a $something to let us tell the executor to not clean a build dir | 14:28 |
corvus | mordred: horrible :) | 14:28 |
mnaser | if run passed and upload-logs failed, you know the run phase _should_ be complete | 14:28 |
corvus | mordred: we'd never clean them up | 14:28 |
mordred | corvus: yeah. was mostly thinking about how to not delete logs that didn't actually get uploaded | 14:29 |
corvus | i think we should first exhaust solutions that make the system work more reliably :) | 14:29 |
mordred | ++ | 14:29 |
corvus | so retry-other-provider has my vote for focus | 14:29 |
fungi | also keep in mind that retrying the upload to another endpoint wouldn't necessarily have helped here, as all the ovh endpoints stopped being writeable for us ~simultaneously, so the job could easily have tried another which also failed | 14:30 |
mnaser | ah, upload-logs-swift takes one cloud only, not the whole list | 14:30 |
mnaser | if only we wrote it to take a list, it would have been easier to make it better for everyone without affecting too much | 14:30 |
corvus | well, we can update the opendev usage, and if we get that solid, we can look at upgrading the role | 14:34 |
mnaser | corvus: well, i was thinking of updating the role to accept a list or string (which it converts to a list) which shouldn't change user-facing behaviour first | 14:35 |
mnaser | and then adding the 'fallback' stuff as stage 2, cause really no one should have been feeding that thing a list in the first place | 14:35 |
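The "list or string" stage mnaser proposes could look roughly like the following. This is a hedged sketch in plain Python rather than role code, and `as_cloud_list` is a hypothetical helper name.

```python
def as_cloud_list(clouds):
    """Accept a single cloud name or a list of them; always return a list."""
    if isinstance(clouds, str):
        # Existing callers pass one name; wrap it so behaviour is unchanged.
        return [clouds]
    return list(clouds)
```

Existing users who pass a single name keep working unchanged, which is what makes this a safe first step before any fallback logic is added.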
corvus | mnaser: hrm, actually, i wonder if it would be better outside of the role? if you put it in the swift role, then a user can only retry to other swifts. but someone might want to retry swift/gce/aws | 14:38 |
*** mlavalle has joined #opendev | 14:39 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 14:39 | |
mnaser | corvus: i like the idea of making it in a way so that all of our (zuul) users can get the increased benefit of 'reliability'. in my experience base-job improvements will generally not be picked up by most users later | 14:39 |
*** ysandeep is now known as ysandeep|afk | 14:39 | |
mnaser | maybe eventually we could have a upload-object-store that takes a specific type of stringset which can do all sort of object storages | 14:39 |
corvus | mnaser: i don't disagree with you | 14:40 |
corvus | and yes, i'm suggesting that a meta-role may be the most appropriate here | 14:40 |
corvus | mnaser: note that if we change the swift role, we will also need to change the gce role, because we're maintaining parity between them. | 14:41 |
mnaser | corvus: yeah that seems fair, i'll probably have some questions (i guess the weird thing here is.. do you upload to another 'gce region'?) | 14:42 |
corvus | mnaser: i would suggest starting with the meta-role idea; it seems to me that would be the easiest as well as most robust and future-proof approach | 14:44 |
clarkb | hrm I thought we switched to v3 across the board for ovh months ago. But I guess that was only in the control plane side of things | 14:45 |
clarkb | as far as making this more robust pabelanger suggested similar in the past but the way ansible works makes it difficult | 14:45 |
clarkb | because we want to do successive random choices | 14:45 |
clarkb | excluding previous options | 14:45 |
*** ysandeep|afk is now known as ysandeep | 14:46 | |
corvus | yeah, it may need to be a custom module? | 14:46 |
clarkb | yes I think so | 14:46 |
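The "successive random choices excluding previous options" logic is easy to express in Python, which is presumably why a custom module comes up. A minimal sketch, assuming the caller supplies an `upload` callable that raises on failure; `upload_with_fallback` and its parameters are hypothetical names, not part of any existing role.

```python
import random


def upload_with_fallback(targets, upload, rng=random):
    """Call upload(target) on randomly chosen targets until one succeeds.

    A target that fails is dropped from the pool, so it is never retried.
    Returns the target that worked; raises RuntimeError (chained to the
    last failure) once every target has been tried.
    """
    remaining = list(targets)
    last_error = None
    while remaining:
        target = rng.choice(remaining)
        remaining.remove(target)
        try:
            upload(target)
            return target
        except Exception as exc:  # sketch only: real code would be narrower
            last_error = exc
    raise RuntimeError("all log upload targets failed") from last_error
```

As fungi notes above, this only helps if the pool spans more than one provider: with every ovh endpoint failing at once, a retry that lands on another ovh region fails too.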
*** priteau has quit IRC | 14:52 | |
*** owalsh_ is now known as owalsh | 15:07 | |
*** _mlavalle_1 has joined #opendev | 15:23 | |
*** mlavalle has quit IRC | 15:25 | |
*** ykarel is now known as ykarel|afk | 15:33 | |
*** ysandeep is now known as ysandeep|away | 15:36 | |
*** lpetrut has quit IRC | 15:44 | |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Retire dragonflow project https://review.opendev.org/737566 | 15:49 |
*** ykarel|afk is now known as ykarel | 15:55 | |
*** rpittau is now known as rpittau|afk | 16:00 | |
*** diablo_rojo has joined #opendev | 16:02 | |
openstackgerrit | Merged openstack/project-config master: Fix venv use in requirements propose_update.sh https://review.opendev.org/737526 | 16:04 |
*** sgw1 has quit IRC | 16:13 | |
*** sgw1 has joined #opendev | 16:19 | |
AJaeger | infra-root, we retired puppet-congress but there's still an open review - can you abandon that? https://review.opendev.org/#/q/project:openstack/puppet-congress+is:open Or leave as is? | 16:28 |
clarkb | AJaeger: I have an abandon button | 16:28 |
clarkb | AJaeger: do you have a link to the retirement change? I can put that in the abandon message | 16:29 |
AJaeger | clarkb: http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015555.html | 16:29 |
AJaeger | clarkb: thanks | 16:29 |
AJaeger | But that means gerrit did not make the repo readonly? | 16:30 |
*** ykarel is now known as ykarel|away | 16:30 | |
AJaeger | oh, it's readonly, see it... | 16:31 |
clarkb | ya its read only now. maybe there was a race between switching to RO and the change being pushed | 16:31 |
AJaeger | that change was overlooked when retiring ;( it was a year old | 16:31 |
AJaeger | thanks, clarkb | 16:31 |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Remove dragonflow from infra https://review.opendev.org/737578 | 16:32 |
clarkb | oh that's a 2019 not 2020 | 16:32 |
AJaeger | yep | 16:32 |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Remove dragonflow from infra https://review.opendev.org/737578 | 16:37 |
openstackgerrit | Merged openstack/project-config master: Stop translation stable branches on projects without Dashboard https://review.opendev.org/723217 | 16:49 |
tosky | I lost track of the status: can we recheck the jobs which failed with POST_FAILURE? | 16:55 |
clarkb | tosky: yes | 16:56 |
*** sgw1 has quit IRC | 16:56 | |
clarkb | tosky: any job which started in the last 3 hours should have the fix | 16:56 |
clarkb | (roughly) | 16:56 |
tosky | clarkb: thanks! | 16:56 |
fungi | tosky: in short, one of the swift providers to whom we upload build logs dropped keystone v2 api support today, and we were using too old openstacksdk in our ansible venvs on the zuul executors to do proper version discovery | 16:57 |
*** sgw1 has joined #opendev | 17:05 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:14 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:15 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:17 |
*** sshnaidm|ruck is now known as sshnaidm|afk | 17:24 | |
*** factor has quit IRC | 17:25 | |
*** hashar has quit IRC | 17:25 | |
*** hashar has joined #opendev | 17:25 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 17:33 |
*** sgw1 has quit IRC | 17:34 | |
*** sgw1 has joined #opendev | 17:50 | |
*** xiaolin has quit IRC | 17:59 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 18:17 |
*** roman_g has quit IRC | 18:21 | |
*** hashar is now known as hasharAway | 18:23 | |
AJaeger | ianw, fungi, clarkb, could you review https://review.opendev.org/#/c/735283 together with this etherpad, please? https://etherpad.opendev.org/p/-CBx0IaMT37oFBHdt8iV - that's for the python-jobs. Please update the etherpad | 18:28 |
openstackgerrit | Rafael Folco proposed openstack/diskimage-builder master: DNM: Debug py3 on dib 7 https://review.opendev.org/736421 | 18:40 |
openstackgerrit | Merged zuul/zuul-jobs master: Simplify twine invocation for PyPI uploads https://review.opendev.org/735932 | 19:01 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 19:02 |
*** aannuusshhkkaa has joined #opendev | 19:08 | |
*** sgw1 has quit IRC | 19:21 | |
*** sgw1 has joined #opendev | 19:22 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add a grafana/grafyaml image https://review.opendev.org/737397 | 19:26 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 19:26 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 19:31 |
clarkb | fungi: re stale apache worker we could port the worker limits config from mirrors to files? | 19:49 |
clarkb | er I guess it's static.opendev.org now, not files | 19:49 |
corvus | i wondered about the certcheck email; i guess it's safe to ignore? | 19:49 |
fungi | as mentioned in the meeting just now, running this returns inconsistent results: `echo | openssl s_client -connect zuul-ci.org:https -servername zuul-ci.org 2> /dev/null | openssl x509 -text | grep -i after` | 19:50 |
clarkb | corvus: if the stale worker eventually dies and goes away it's fine, the cert has been updated. If it persists we can restart apache forcefully | 19:50 |
clarkb | corvus: on the mirrors we landed mpm config to give apache workers a request count limit | 19:50 |
clarkb | and the static server apache would hit those limits pretty quickly I think | 19:50 |
fungi | yeah, basically we have a month for that worker to recycle before it will start causing intermittent cert failures for people | 19:50 |
clarkb | the mirrors would hit this semi frequently and since we landed that config for the mirrors it hasn't happened again (but it also hasn't been super long so may not have refreshed certs yet) | 19:51 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 19:51 |
fungi | most requests i'm getting a 2020-09-20 expiration, but occasionally i'll get back an older cert expiring 2020-07-22 | 19:52 |
corvus | yeah, i think we should limit the static worker life | 19:53 |
fungi | so yes, manually issuing a service restart would clear this in the short term, but longer term we likely need something like clarkb describes | 19:53 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 19:53 |
fungi | clarkb: looks like this was your change to set it for the mirrors: https://review.opendev.org/727873 | 19:58 |
fungi | i'll propose something similar for static | 19:59 |
clarkb | thanks | 19:59 |
ianw | AJaeger: thanks for pointing out release-wheel-cache ; that actually wasn't running until the last attempt. this was because xenial arm64 wheel builds were borked until we merged https://review.opendev.org/735055 | 20:03 |
corvus | i have to go grocery shopping; biab. | 20:03 |
ianw | oh, something i meant to bring up in the meeting was the changes in system-config to add centos support to base | 20:03 |
fungi | corvus: enjoy your adventure, that's my tomorrow morning | 20:03 |
ianw | i wasn't sure if people wanted to keep system-config focused on opendev actual production, or how much wiggle room there was | 20:03 |
fungi | i think we've previously asserted that system-config isn't being maintained/supported for reconsumption, it's a public entry point into the maintenance of our running systems and services | 20:05 |
fungi | so if we're going to run some centos-based services then i could see adding support for it | 20:05 |
ianw | that is the thing, i don't think opendev has particular plans for that | 20:07 |
fungi | none that i'm aware of, anyway | 20:07 |
fungi | what prompted the change? | 20:07 |
ianw | i think it is wanting to setup infra compatible mirrors | 20:08 |
fungi | what's an infra compatible mirror? | 20:08 |
ianw | well i mean mirrors that look like the mirrors opendev infra sets up, but not in opendev infra | 20:08 |
ianw | in terms of paths/proxies, etc | 20:09 |
fungi | if there's interest in collaborating on standardization of apache configuration and tooling around ci mirror systems, then we should do that outside system-config (it could of course be seeded by code from system-config) and then hopefully eventually consume that somehow | 20:09 |
ianw | perhaps we should move some of the mirror config bits outside of system-config | 20:10 |
ianw | heh, jinx | 20:10 |
fungi | but yeah, not in the system-config repo | 20:10 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Limit connections for static site Apache workers https://review.opendev.org/737619 | 20:15 |
fungi | clarkb: corvus: ^ as discussed | 20:15 |
clarkb | fungi: one small but important thing on that | 20:24 |
mnaser | are there any places where python-builder is used to build *multiple* projects into an image? | 20:27 |
mnaser | use case: building image for horizon with multiple plugins | 20:28 |
ianw | mnaser: hrm, dib does pull in a bunch of siblings ... | 20:28 |
*** rchurch has quit IRC | 20:28 | |
ianw | mnaser: specifically this is what i'm talking about -> https://opendev.org/zuul/nodepool/src/branch/master/.zuul.yaml#L216 | 20:30 |
*** rchurch has joined #opendev | 20:30 | |
mnaser | ianw: interesting.. so i guess in my use case, id build a horizon image with all the plugins as siblings | 20:30 |
*** sgw1 has quit IRC | 20:31 | |
ianw | mnaser: yeah, i think so. basically every sibling is copied and then gets installed with https://opendev.org/opendev/system-config/src/branch/master/docker/python-builder/scripts/assemble#L81 | 20:32 |
mnaser | right now our image jobs do a poor job of not actually using the zuul checkout.. sadly.. i'm going to clean that up | 20:33 |
ianw | right, yeah that was the exact use case ... i wanted to make sure Depends-On: for openstacksdk, etc. worked when testing the nodepool containers | 20:34 |
mnaser | ianw: my only pain point is my dockerfiles are in a different repo than the code itself | 20:35 |
mnaser | so making docker build locally pretty darn painful | 20:35 |
mnaser | because in the year 2020, dockerfiles inside openstack projects is too much to ask for | 20:35 |
fungi | clarkb: oh, yep, great catch. we may want to add an actual restart handler to cover that case | 20:37 |
ianw | yeah, well i mean docker is broken on fedora for god knows how long with cgroups or whatever it is, i don't want to think about it | 20:37 |
ianw | in 2021 people will reinvent dpkg and talk about how amazing it is we can share libraries :) | 20:38 |
fungi | in 2021 people will reinvent makefiles yet again. like for the 20th time | 20:39 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Limit connections for static site Apache workers https://review.opendev.org/737619 | 20:41 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 20:43 |
*** sgw1 has joined #opendev | 20:47 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 21:04 |
*** shtepanie has joined #opendev | 21:11 | |
clarkb | fungi: ianw https://review.opendev.org/#/c/736389/ is an easy system-config cleanup if you have a moment | 21:20 |
clarkb | related to dns cleanups | 21:20 |
clarkb | ianw: thanks! | 21:22 |
clarkb | mordred: does your change to switch zuul executors to docker images depend on https://review.opendev.org/#/c/735739/ ? | 21:23 |
clarkb | I've re enqueued that change into the gate so we should have it soon but wanted to call out it isn't merged yet | 21:23 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 21:24 |
*** DSpider has quit IRC | 21:28 | |
openstackgerrit | Mohammed Naser proposed openstack/project-config master: Temporarily unretire incorrectly retired projects https://review.opendev.org/737636 | 21:29 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 21:30 |
openstackgerrit | Mohammed Naser proposed openstack/project-config master: Temporarily unretire incorrectly retired projects https://review.opendev.org/737636 | 21:31 |
mordred | clarkb: yes it does - and thanks! | 21:31 |
mordred | clarkb: the executors are in emergency, but definitely let's wait until that lands to start rolling them out | 21:32 |
*** DSpider has joined #opendev | 21:38 | |
*** factor has joined #opendev | 21:42 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Test multiarch release builds https://review.opendev.org/737315 | 21:45 |
openstackgerrit | Merged opendev/system-config master: Limit connections for static site Apache workers https://review.opendev.org/737619 | 21:47 |
openstackgerrit | Merged opendev/system-config master: Remove elasticsearch01 https://review.opendev.org/736389 | 21:47 |
*** hasharAway has quit IRC | 22:11 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add a grafana/grafyaml image https://review.opendev.org/737397 | 22:21 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 22:21 |
openstackgerrit | Mohammed Naser proposed openstack/project-config master: gerrit: change retired.config acls https://review.opendev.org/737649 | 22:23 |
openstackgerrit | Mohammed Naser proposed openstack/project-config master: Move retired.config to external namespace https://review.opendev.org/737652 | 22:35 |
*** tosky has quit IRC | 22:42 | |
*** clarkb has quit IRC | 22:44 | |
openstackgerrit | Mohammed Naser proposed openstack/project-config master: gerrit: change retired.config acls https://review.opendev.org/737649 | 22:48 |
openstackgerrit | Mohammed Naser proposed openstack/project-config master: openstack: move all openstack projects to dedicated acl https://review.opendev.org/737654 | 22:48 |
*** tkajinam has joined #opendev | 22:53 | |
*** _mlavalle_1 has quit IRC | 22:56 | |
*** clarkb has joined #opendev | 22:58 | |
mordred | mnaser: wow. you're cleaning up! | 23:11 |
mnaser | mordred: doing my best! | 23:12 |
mordred | mnaser: those two changes would be _way_ smaller if they were squashed. :) | 23:19 |
mnaser | mordred: i need to pad mah statz -- but mainly because i wanted to make 2 simple easy to merge and one that might likely require manual intervention on its own | 23:26 |
mordred | mnaser: dude. now you're gonna make me go pad _my_ stats | 23:29 |
* mordred hasn't done a good old fashioned patch-bomb in ages | 23:29 | |
openstackgerrit | Merged openstack/project-config master: Move retired.config to external namespace https://review.opendev.org/737652 | 23:32 |
openstackgerrit | Merged openstack/project-config master: openstack: move all openstack projects to dedicated acl https://review.opendev.org/737654 | 23:32 |
clarkb | my irc connection dropped and I missed the second one | 23:33 |
clarkb | also firefox is not happy trying to open those diffs | 23:33 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] grafana deployment https://review.opendev.org/737406 | 23:36 |
fungi | gertty opened them quite happily, fwiw | 23:37 |
ianw | clarkb: https://199.204.45.223 is a held node with a containerised grafana (self signed) | 23:40 |
clarkb | ianw: I can confirm it is both a self signed cert and a grafana with existing nodepool problems | 23:41 |
clarkb | somewhat reassuring that those issues don't mysteriously disappear with a redeploy | 23:41 |
ianw | clarkb: yeah, login is admin/adminpassword | 23:41 |
ianw | it's definitely something to do with $region ... change that to * in the UI and it finds stuff | 23:41 |
ianw | "This panel is deprecated. Please migrate to the new Stat panel. " | 23:43 |
ianw | working variable definition -> http://paste.openstack.org/show/795123/ | 23:49 |
ianw | not working definition -> http://paste.openstack.org/show/795124/ | 23:51 |
clarkb | ianw: is it the definition: that makes the difference? | 23:51 |
ianw | possibly, i updated via the UI and that was what it wrote back | 23:52 |
ianw | i guess this is what writes it? https://opendev.org/opendev/grafyaml/src/branch/master/grafana_dashboards/schema/template/query.py#L32 | 23:54 |
clarkb | ianw: I think that's a schema for the yaml input | 23:54 |
clarkb | though maybe it's a 1:1 input to output | 23:55 |
clarkb | ya I think that may be the case | 23:56 |
clarkb | ianw: it uses the schema to parse the datasources and dashboards then it passes that input straight into grafana I think | 23:59 |
clarkb | ianw: so we'd need to update the schema to take definition or add a step in there that copies query to definition | 23:59 |
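
The second option clarkb mentions at 23:59, a step that copies `query` to `definition`, might look roughly like the sketch below. It is illustrative only: the function name `copy_query_to_definition` is hypothetical, and the dict layout (a `templating` mapping holding a `list` of variables, each with a `type`) is an assumption about the dashboard structure handed to grafana, not grafyaml's actual code.

```python
import copy


def copy_query_to_definition(dashboard):
    """Return a copy of `dashboard` in which every query-type template
    variable that has a `query` also carries a matching `definition`.

    Hypothetical helper; the `templating` -> `list` layout is assumed.
    """
    result = copy.deepcopy(dashboard)  # leave the caller's dict untouched
    for variable in result.get("templating", {}).get("list", []):
        if variable.get("type") == "query" and "query" in variable:
            # Keep an explicit `definition` if the input already set one.
            variable.setdefault("definition", variable["query"])
    return result
```

Using `setdefault` means a dashboard that already supplies its own `definition` is passed through unchanged, so the step would stay harmless if the schema is later extended to accept that key directly.
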
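
For the stale Apache worker discussed around 19:50, fungi's `openssl s_client` one-liner has to be run repeatedly to notice that different workers serve different certs. A rough Python equivalent that samples the expiry several times and tallies the answers is sketched below. The function names are hypothetical, and it assumes the served cert validates against the local trust store so that `getpeercert()` returns a populated dict.

```python
import socket
import ssl
from collections import Counter


def fetch_not_after(host, port=443, timeout=10):
    """Open one TLS connection and return the served cert's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def tally_expirations(not_after_values):
    """Count how often each distinct expiry was seen."""
    return Counter(not_after_values)


def looks_inconsistent(tally):
    """More than one distinct expiry suggests a worker is serving an old cert."""
    return len(tally) > 1


def sample_expirations(host, attempts=20):
    """Connect `attempts` times and tally the expiry dates returned."""
    return tally_expirations(fetch_not_after(host) for _ in range(attempts))
```

Something like `looks_inconsistent(sample_expirations("zuul-ci.org"))` would then flag the mixed results fungi describes, though with only one stale worker a small number of attempts can easily miss it.
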