*** mattw4 has quit IRC | 00:00 | |
ianw | trystack.org points to 54.39.56.124 ... which then ends up at openstack.org. so i'm guessing infra has no involvement in it any more? | 00:02 |
---|---|---|
*** yamamoto has joined #openstack-infra | 00:05 | |
*** matt_kosut has joined #openstack-infra | 00:07 | |
*** yamamoto has quit IRC | 00:10 | |
fungi | ianw: yeah, we should be able to just drop the old trystack redirects | 00:11 |
*** matt_kosut has quit IRC | 00:12 | |
*** admcleod has joined #openstack-infra | 00:13 | |
*** tosky has quit IRC | 00:21 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] static: final redirects https://review.opendev.org/710160 | 00:28 |
*** jamesmcarthur has joined #openstack-infra | 00:35 | |
*** ociuhandu has joined #openstack-infra | 00:42 | |
*** gyee has quit IRC | 00:44 | |
*** mattw4 has joined #openstack-infra | 00:46 | |
*** ociuhandu has quit IRC | 00:48 | |
*** jamesmcarthur has quit IRC | 00:51 | |
*** jamesmcarthur has joined #openstack-infra | 00:52 | |
*** jamesmcarthur has quit IRC | 00:57 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] static: final redirects https://review.opendev.org/710160 | 01:05 |
*** Lucas_Gray has quit IRC | 01:17 | |
*** ramishra has joined #openstack-infra | 01:18 | |
*** jamesmcarthur has joined #openstack-infra | 01:22 | |
*** jamesmcarthur has quit IRC | 01:23 | |
*** jamesmcarthur has joined #openstack-infra | 01:23 | |
*** ramishra has quit IRC | 01:26 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] static: final redirects https://review.opendev.org/710160 | 01:29 |
*** gagehugo has joined #openstack-infra | 01:36 | |
*** ramishra has joined #openstack-infra | 01:48 | |
*** jamesmcarthur has quit IRC | 01:55 | |
*** yamamoto has joined #openstack-infra | 01:56 | |
*** jamesmcarthur has joined #openstack-infra | 02:00 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] static: final redirects https://review.opendev.org/710160 | 02:20 |
openstackgerrit | Merged opendev/system-config master: static: fix git raw file redirect https://review.opendev.org/710151 | 02:24 |
rm_work | lolol gerrit: https://review.opendev.org/#/c/549297/ "Updated 1 year, 12 months ago" | 02:29 |
rm_work | such maths | 02:29 |
*** jamesmcarthur has quit IRC | 02:31 | |
*** ijw has quit IRC | 02:37 | |
*** ociuhandu has joined #openstack-infra | 02:43 | |
*** jamesmcarthur has joined #openstack-infra | 02:43 | |
*** ociuhandu has quit IRC | 02:48 | |
*** jamesmcarthur has quit IRC | 02:49 | |
*** yamamoto has quit IRC | 02:49 | |
*** smarcet has joined #openstack-infra | 02:49 | |
*** smarcet has left #openstack-infra | 02:52 | |
*** jamesmcarthur has joined #openstack-infra | 02:53 | |
*** rlandy|bbl is now known as rlandy | 02:56 | |
*** diablo_rojo has quit IRC | 02:57 | |
*** xinranwang has joined #openstack-infra | 03:00 | |
*** jamesmcarthur has quit IRC | 03:07 | |
*** nicolasbock has quit IRC | 03:09 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] static: final redirects https://review.opendev.org/710160 | 03:15 |
*** jamesmcarthur has joined #openstack-infra | 03:31 | |
*** rlandy has quit IRC | 03:44 | |
*** rh-jelabarre has quit IRC | 03:46 | |
*** yamamoto has joined #openstack-infra | 03:55 | |
*** jamesmcarthur has quit IRC | 03:58 | |
*** iokiwi has quit IRC | 03:59 | |
*** iokiwi has joined #openstack-infra | 03:59 | |
*** yamamoto has quit IRC | 04:00 | |
*** matt_kosut has joined #openstack-infra | 04:08 | |
*** matt_kosut has quit IRC | 04:12 | |
*** yamamoto has joined #openstack-infra | 04:13 | |
ianw | i'm confident the git redirect was just a single typo and https://review.opendev.org/710151 has applied, so i have switched git.openstack.org back to a CNAME to git. | 04:14 |
ianw | static.opendev.org i mean | 04:14 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] static: final redirects https://review.opendev.org/710160 | 04:34 |
*** udesale has joined #openstack-infra | 04:35 | |
*** ociuhandu has joined #openstack-infra | 04:45 | |
*** imacdonn has quit IRC | 04:47 | |
*** imacdonn has joined #openstack-infra | 04:48 | |
*** ociuhandu has quit IRC | 04:49 | |
*** gagehugo has quit IRC | 05:03 | |
*** gagehugo has joined #openstack-infra | 05:03 | |
*** larainema has joined #openstack-infra | 05:06 | |
*** ykarel|away is now known as ykarel | 05:08 | |
*** mattw4 has quit IRC | 05:24 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: static: implement legacy redirect sites https://review.opendev.org/710160 | 05:26 |
ianw | AJaeger: ^ i think that's going to be it for static.openstack.org. files.openstack.org should be idle now too. thanks for all your help! | 05:27 |
*** rcernin has quit IRC | 05:33 | |
*** rcernin has joined #openstack-infra | 05:33 | |
*** evrardjp has quit IRC | 05:34 | |
*** evrardjp has joined #openstack-infra | 05:35 | |
*** kozhukalov has joined #openstack-infra | 06:07 | |
*** lmiccini has joined #openstack-infra | 06:08 | |
*** matt_kosut has joined #openstack-infra | 06:09 | |
*** matt_kosut has quit IRC | 06:14 | |
*** rcernin has quit IRC | 06:24 | |
*** lbragstad has quit IRC | 06:26 | |
*** Lucas_Gray has joined #openstack-infra | 06:34 | |
*** ociuhandu has joined #openstack-infra | 06:45 | |
*** ccamacho has quit IRC | 06:49 | |
*** ociuhandu has quit IRC | 06:50 | |
AJaeger | ianw: thanks for driving that! | 06:51 |
*** pgaxatte has joined #openstack-infra | 06:57 | |
AJaeger | ianw: commented, many redirects are wrong (not your fault - still we should fix) | 07:00 |
*** pgaxatte has quit IRC | 07:01 | |
*** pgaxatte has joined #openstack-infra | 07:02 | |
*** ociuhandu has joined #openstack-infra | 07:02 | |
*** dangtrinhnt has joined #openstack-infra | 07:06 | |
*** ociuhandu has quit IRC | 07:07 | |
*** xinranwang has quit IRC | 07:19 | |
AJaeger | ianw: let me update 710160 | 07:25 |
*** ralonsoh has joined #openstack-infra | 07:28 | |
*** ykarel is now known as ykarel|lunch | 07:35 | |
openstackgerrit | Andreas Jaeger proposed opendev/system-config master: Update redirects for legacy sides https://review.opendev.org/710195 | 07:40 |
AJaeger | ianw: here's my proposal ^ | 07:40 |
AJaeger | config-core, please review https://review.opendev.org/710072 and https://review.opendev.org/710112 | 07:42 |
*** tesseract has joined #openstack-infra | 07:53 | |
*** slaweq has joined #openstack-infra | 07:53 | |
*** dangtrinhnt has quit IRC | 07:56 | |
*** dangtrinhnt has joined #openstack-infra | 07:57 | |
*** jcapitao_off has joined #openstack-infra | 07:58 | |
*** jcapitao_off is now known as jcapitao | 08:00 | |
*** dangtrinhnt has quit IRC | 08:01 | |
*** tkajinam has quit IRC | 08:02 | |
*** ociuhandu has joined #openstack-infra | 08:06 | |
*** matt_kosut has joined #openstack-infra | 08:10 | |
*** ociuhandu has quit IRC | 08:12 | |
*** raukadah is now known as chandankumar | 08:13 | |
*** matt_kosut has quit IRC | 08:14 | |
*** tosky has joined #openstack-infra | 08:17 | |
*** ccamacho has joined #openstack-infra | 08:18 | |
*** dangtrinhnt has joined #openstack-infra | 08:20 | |
*** dchen has quit IRC | 08:21 | |
*** matt_kosut has joined #openstack-infra | 08:22 | |
*** ociuhandu has joined #openstack-infra | 08:28 | |
*** udesale has quit IRC | 08:31 | |
*** udesale has joined #openstack-infra | 08:32 | |
*** ociuhandu has quit IRC | 08:38 | |
*** amoralej|off is now known as amoralej | 08:43 | |
*** jpena|off is now known as jpena | 08:51 | |
openstackgerrit | Merged openstack/project-config master: Add publish job for stackviz https://review.opendev.org/710072 | 09:00 |
openstackgerrit | Andreas Jaeger proposed opendev/system-config master: Update redirects for legacy sides https://review.opendev.org/710195 | 09:06 |
*** smarcet has joined #openstack-infra | 09:07 | |
*** lucasagomes has joined #openstack-infra | 09:08 | |
*** smarcet has left #openstack-infra | 09:09 | |
*** rpittau|afk is now known as rpittau | 09:09 | |
*** pkopec has joined #openstack-infra | 09:12 | |
*** Lucas_Gray has quit IRC | 09:12 | |
*** ociuhandu has joined #openstack-infra | 09:14 | |
*** Lucas_Gray has joined #openstack-infra | 09:15 | |
*** elod has quit IRC | 09:19 | |
*** apetrich has joined #openstack-infra | 09:21 | |
*** Lucas_Gray has quit IRC | 09:22 | |
*** Lucas_Gray has joined #openstack-infra | 09:24 | |
frickler | AJaeger: how about I trigger the stackviz periodic job now, so we can look at the results and don't have to wait until tomorrow? | 09:28 |
*** ykarel|lunch is now known as ykarel | 09:30 | |
*** pgaxatte has quit IRC | 09:33 | |
*** pgaxatte has joined #openstack-infra | 09:35 | |
*** yamamoto has quit IRC | 09:35 | |
*** rkukura has quit IRC | 09:37 | |
*** gfidente|afk is now known as gfidente | 09:39 | |
*** ociuhandu has quit IRC | 09:41 | |
*** dangtrinhnt has quit IRC | 09:43 | |
*** xek__ has joined #openstack-infra | 09:43 | |
*** dangtrinhnt has joined #openstack-infra | 09:43 | |
*** auristor has quit IRC | 09:45 | |
*** dangtrinhnt has quit IRC | 09:51 | |
frickler | AJaeger: stackviz failed to build, I don't know enough about npm to debug further http://zuul.openstack.org/build/e78ebf7871ae49149fb3cd2dc8b15e7c | 09:53 |
frickler | also "The task includes an option with an undefined variable. The error was: 'short_name' is undefined" in rename-latest.yaml | 09:54 |
*** dmellado has quit IRC | 09:59 | |
AJaeger | frickler: thanks! Will fix! | 10:01 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Fix stackviz periodic job https://review.opendev.org/710212 | 10:03 |
AJaeger | frickler: These worked in stackviz, so my rework here broke it - and that made it easy to fix. ^ | 10:04 |
*** ociuhandu has joined #openstack-infra | 10:12 | |
*** auristor has joined #openstack-infra | 10:13 | |
*** sshnaidm|afk has joined #openstack-infra | 10:17 | |
*** carli has joined #openstack-infra | 10:18 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Remove puppet-forge jobs https://review.opendev.org/710112 | 10:22 |
*** elod has joined #openstack-infra | 10:29 | |
*** sshnaidm|afk is now known as sshnaidm | 10:29 | |
frickler | AJaeger: that looks easy enough, I should likely have spotted the first issue in review, too. will trigger another run once the fix merges | 10:30 |
carli | hello, I'm wondering who I can talk to, because I think I may have done a goof with a script of mine and I wonder if I'm responsible for the bad state in which logstash.openstack.org/ is (it's not showing data anymore), and I would like to know who to warn about it and if I can help unbreak it. I'm not even sure this is the right channel but couldn't find anything that seemed more appropriate | 10:35 |
*** ociuhandu has quit IRC | 10:39 | |
frickler | carli: you have found the right location and I can confirm that the dashboard seems broken, but I don't have time to dig further currently, need to wait for some other infra-root | 10:39 |
openstackgerrit | Merged openstack/project-config master: Fix stackviz periodic job https://review.opendev.org/710212 | 10:41 |
carli | frickler:ok, good that i have found the proper place. I think it's my fault because I was using it to get some info and have I think accidentally made requests too often | 10:44 |
*** carli is now known as carli|afk | 10:51 | |
*** roman_g has joined #openstack-infra | 11:00 | |
*** Lucas_Gray has quit IRC | 11:05 | |
*** Lucas_Gray has joined #openstack-infra | 11:09 | |
*** rpittau is now known as rpittau|bbl | 11:12 | |
*** jcapitao is now known as jcapitao_lunch | 11:14 | |
*** kozhukalov has quit IRC | 11:17 | |
*** ociuhandu has joined #openstack-infra | 11:18 | |
*** ociuhandu has quit IRC | 11:18 | |
*** ociuhandu has joined #openstack-infra | 11:19 | |
*** ociuhandu has quit IRC | 11:20 | |
*** smarcet has joined #openstack-infra | 11:21 | |
*** kozhukalov has joined #openstack-infra | 11:21 | |
*** ociuhandu has joined #openstack-infra | 11:21 | |
*** Lucas_Gray has quit IRC | 11:24 | |
*** ociuhandu has quit IRC | 11:26 | |
*** matt_kosut has quit IRC | 11:26 | |
*** ociuhandu has joined #openstack-infra | 11:26 | |
*** Lucas_Gray has joined #openstack-infra | 11:27 | |
*** ociuhandu has quit IRC | 11:29 | |
*** ociuhandu has joined #openstack-infra | 11:30 | |
frickler | AJaeger: your fix still didn't work, this playbook overrides the npm_command https://opendev.org/openstack/project-config/src/branch/master/playbooks/javascript/content.yaml#L4 | 11:30 |
*** kozhukalov has quit IRC | 11:35 | |
*** ociuhandu has quit IRC | 11:35 | |
*** kozhukalov has joined #openstack-infra | 11:35 | |
*** ociuhandu has joined #openstack-infra | 11:36 | |
*** ociuhandu has quit IRC | 11:36 | |
*** ociuhandu has joined #openstack-infra | 11:37 | |
*** Lucas_Gray has quit IRC | 11:37 | |
*** Lucas_Gray has joined #openstack-infra | 11:38 | |
*** ociuhandu has quit IRC | 11:39 | |
*** ociuhandu has joined #openstack-infra | 11:39 | |
*** ociuhandu has quit IRC | 11:40 | |
*** ociuhandu has joined #openstack-infra | 11:42 | |
*** yamamoto has joined #openstack-infra | 11:42 | |
*** kozhukalov has quit IRC | 11:43 | |
*** yamamoto has quit IRC | 11:46 | |
*** ociuhandu has quit IRC | 11:48 | |
*** ociuhandu has joined #openstack-infra | 11:48 | |
AJaeger | frickler: argh - thx | 11:52 |
*** matt_kosut has joined #openstack-infra | 11:55 | |
*** ociuhandu has quit IRC | 11:56 | |
*** ociuhandu has joined #openstack-infra | 11:56 | |
*** ociuhandu has quit IRC | 11:57 | |
*** ociuhandu has joined #openstack-infra | 11:57 | |
*** ociuhandu has quit IRC | 11:58 | |
*** matt_kosut has quit IRC | 11:58 | |
*** matt_kosut has joined #openstack-infra | 11:58 | |
*** iurygregory has joined #openstack-infra | 11:59 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: fix stackviz publishing https://review.opendev.org/710237 | 11:59 |
AJaeger | config-core, hope that's fixes the job, please review ^ | 12:00 |
*** carli|afk is now known as carli | 12:02 | |
*** nicolasbock has joined #openstack-infra | 12:03 | |
*** amoralej is now known as amoralej|lunch | 12:08 | |
*** udesale_ has joined #openstack-infra | 12:19 | |
*** gagehugo has quit IRC | 12:19 | |
*** udesale_ has quit IRC | 12:21 | |
*** udesale_ has joined #openstack-infra | 12:21 | |
*** udesale has quit IRC | 12:22 | |
*** yamamoto has joined #openstack-infra | 12:25 | |
*** iurygregory has quit IRC | 12:32 | |
*** jpena is now known as jpena|lunch | 12:35 | |
*** jamesmcarthur has joined #openstack-infra | 12:36 | |
*** iurygregory has joined #openstack-infra | 12:36 | |
*** ociuhandu has joined #openstack-infra | 12:38 | |
*** kozhukalov has joined #openstack-infra | 12:39 | |
*** ociuhandu has quit IRC | 12:40 | |
*** eharney has quit IRC | 12:45 | |
*** Lucas_Gray has quit IRC | 12:46 | |
*** Lucas_Gray has joined #openstack-infra | 12:47 | |
*** iokiwi has quit IRC | 12:47 | |
*** iokiwi has joined #openstack-infra | 12:48 | |
*** ociuhandu has joined #openstack-infra | 12:48 | |
*** rkukura has joined #openstack-infra | 12:49 | |
*** Goneri has joined #openstack-infra | 12:50 | |
*** rlandy has joined #openstack-infra | 12:56 | |
*** jamesmcarthur has quit IRC | 13:00 | |
*** jamesmcarthur has joined #openstack-infra | 13:00 | |
Tengu | hello there! We need to create a bunch of repositories on opendev.org/openstack - I'm wondering how to do it.... Care to point me to relevant doc, or contact? Thanks ! | 13:03 |
*** nicolasbock has quit IRC | 13:03 | |
*** nicolasbock has joined #openstack-infra | 13:04 | |
frickler | Tengu: https://docs.openstack.org/infra/manual/creators.html | 13:04 |
*** dmellado has joined #openstack-infra | 13:05 | |
*** sshnaidm_ has joined #openstack-infra | 13:05 | |
*** udesale_ has quit IRC | 13:05 | |
frickler | Tengu: feel free to ask us here if something in the docs is unclear or you need further help | 13:05 |
*** udesale_ has joined #openstack-infra | 13:05 | |
Tengu | frickler: thanks! hopefully I'll be able to push things until tomorrow. | 13:05 |
*** jamesmcarthur has quit IRC | 13:06 | |
Tengu | frickler: as a quick intro: we're wanting to split the existing "tripleo-validations" in a subset of different things, with a new name (validations-*). Not sure in what category this enters :/ | 13:08 |
*** sshnaidm has quit IRC | 13:08 | |
*** jamesmcarthur has joined #openstack-infra | 13:10 | |
Tengu | frickler: oh. so I guess I need to point that in a discussion on #tripleo to get some ACK and actions - tripleo "head" will do the needed creation if I understand correcly. | 13:11 |
mordred | Tengu: you can make the patches needed - but yes, the PTL will need to ack that it's ok before we merge them | 13:13 |
*** Lucas_Gray has quit IRC | 13:14 | |
Tengu | mordred: ... stupid question: I didn't see what repository hold that part of the config :/ | 13:14 |
Tengu | aaahh wait - openstack/project-config apparently. | 13:14 |
Tengu | will check that, and prepare things - I'll be off next week, so won't move things too fast right now. Thanks for the inputs! | 13:16 |
mordred | Tengu: yah. the steps starting from Adding the Project to the CI System in that doc are the ones you'll need to follow | 13:16 |
mordred | Tengu: cool! let us know if you need help | 13:16 |
*** Lucas_Gray has joined #openstack-infra | 13:17 | |
*** ociuhandu has quit IRC | 13:17 | |
*** jcapitao_lunch is now known as jcapitao | 13:17 | |
*** ociuhandu has joined #openstack-infra | 13:18 | |
*** amoralej|lunch is now known as amoralej | 13:18 | |
*** nicolasbock has quit IRC | 13:21 | |
*** pkopec has quit IRC | 13:23 | |
*** ociuhandu has quit IRC | 13:23 | |
*** Lucas_Gray has quit IRC | 13:24 | |
*** rpittau|bbl is now known as rpittau | 13:24 | |
*** Lucas_Gray has joined #openstack-infra | 13:25 | |
*** nicolasbock has joined #openstack-infra | 13:26 | |
*** ociuhandu has joined #openstack-infra | 13:26 | |
*** rfolco has joined #openstack-infra | 13:28 | |
*** rfolco has quit IRC | 13:29 | |
*** jamesmcarthur has quit IRC | 13:32 | |
*** jamesmcarthur has joined #openstack-infra | 13:32 | |
*** jpena|lunch is now known as jpena | 13:34 | |
*** yamamoto has quit IRC | 13:35 | |
*** lpetrut has joined #openstack-infra | 13:36 | |
*** rh-jelabarre has joined #openstack-infra | 13:41 | |
*** gshippey has joined #openstack-infra | 13:43 | |
*** iurygregory has quit IRC | 13:46 | |
*** jamesmcarthur has quit IRC | 13:47 | |
*** jamesmcarthur_ has joined #openstack-infra | 13:47 | |
*** matt_kosut has quit IRC | 13:47 | |
*** matt_kosut has joined #openstack-infra | 13:48 | |
*** smarcet has quit IRC | 13:52 | |
*** matt_kosut has quit IRC | 13:52 | |
*** yamamoto has joined #openstack-infra | 13:54 | |
*** eharney has joined #openstack-infra | 13:56 | |
fungi | carli: looks like logstash has recovered now | 13:58 |
fungi | i started looking at logs, but didn't actually do anything | 13:58 |
*** lbragstad has joined #openstack-infra | 13:59 | |
carli | ok, great then | 14:00 |
fungi | the apache logs for the kibana interface were complaining that calls to the elasticsearch api endpoint (on elasticsearch03.openstack.org) were timing out. the elasticsearch logs on 03 were in turn complaining about timeouts getting data from 07 | 14:00 |
fungi | so i don't know for sure, but elasticsearch07 may have been temporarily unhappy. it logged servicing the queries it received from 03 but didn't mention any errors | 14:02 |
* fungi is baffled | 14:02 | |
*** Lucas_Gray has quit IRC | 14:06 | |
*** Lucas_Gray has joined #openstack-infra | 14:08 | |
*** ykarel is now known as ykarel|away | 14:08 | |
mordred | fungi: well, that should keep excess noise down | 14:11 |
fungi | yup, plenty of baffles for everyone | 14:12 |
mordred | fungi: in other news, https://review.opendev.org/#/c/709253/ seems like the type of patch you'd enjoy | 14:14 |
Shrews | for some definition of "enjoy" | 14:14 |
* mordred baffles Shrews | 14:15 | |
AJaeger | config-core, please review https://review.opendev.org/710237 | 14:17 |
AJaeger | smcginnis: could you sent a patch to finish the retiring of the bdd plugin repo, please? | 14:17 |
*** yamamoto has quit IRC | 14:18 | |
Shrews | mordred: 2 questions on that change: 1) why the need for 'become: yes' when it wasn't needed before? 2) Removing snapd seems unnecessary maybe? At least, if someone were to look at that role at a later date, they would question why that's being done. | 14:20 |
*** Lucas_Gray has quit IRC | 14:20 | |
fungi | mordred: also i've left a couple of reminder comments in that change... we may want to think about how we might restrict google's key to be bound only to that package repository, and also how to tell apt to only consider specific packages available for installation from that special repository | 14:23 |
mordred | fungi: good point | 14:24 |
*** Lucas_Gray has joined #openstack-infra | 14:24 | |
mordred | Shrews: good point | 14:25 |
Shrews | I guess for #2 it's more about cleanup than anything on-going, but I can't think of a better way to do cleanup | 14:25 |
fungi | we've been a little lax about that in the past, but the second thing has been possible for years and i some recent-ish improvements in apt are supposed to make the former possible now as well (but i don't recall how new, maybe too new still) | 14:25 |
mordred | fungi: yeah - I think in this case the later is easy and definitely should be done | 14:26 |
mordred | Shrews: yeah - for 2 it's mostly just cleanup - we only had snapd installed so that we could install that package | 14:27 |
mordred | s/package/snap/ | 14:27 |
mordred | so if we're not using the snap anymore, seems good to cleanup after ourselves | 14:27 |
mordred | Shrews: that said - there'sa . follow up patch to remove the removal - so we can run the removal once then land the cleanup and not confuse ourselves in the future | 14:27 |
Shrews | mordred: totes. just seems weird to have that cleanup in there after it's been executed once. i don't have a good solution for that though | 14:27 |
Shrews | mordred: oh, that works | 14:28 |
mordred | me either - it's one of the biggest problems with git-driven ops - what to do with one-off commands. I think the answer is alknasdofnasoernfaoj | 14:28 |
Shrews | ah, of course | 14:28 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Optimize canMerge using graphql https://review.opendev.org/709836 | 14:28 |
Shrews | mordred: also, sorry for missing the cleanup bit in the related review | 14:29 |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Remove legacy-tempest-dsvm-full-bdd job https://review.opendev.org/710063 | 14:29 |
lennyb | Hi, is https://review.opendev.org functions ok, I am getting 'early EOF index-pack failed' error http://paste.openstack.org/show/790075/ | 14:30 |
Shrews | lennyb: you should be cloning from opendev.org (https://opendev.org/openstack/nova.git) | 14:32 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Replace kubectl snap with apt repo https://review.opendev.org/709253 | 14:32 |
mordred | fungi: like that ^^ ? | 14:32 |
lennyb | Shrews, it's part of CI, so devstack should pull review server. Am I wrong? | 14:33 |
*** jamesmcarthur_ has quit IRC | 14:35 | |
Shrews | lennyb: where is that error being produced? | 14:35 |
mordred | lennyb: no- in CI nothing should be cloning - zuul should be providing the needed git repos | 14:36 |
mordred | lennyb: I agree with Shrews' question - where is this from? | 14:36 |
lennyb | Shrews, during running devstack #./stack.sh . I will remove GIT_BASE var in my local.conf. I am running third-party CI | 14:38 |
Shrews | GIT_BASE should be opendev.org | 14:39 |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Retire devstack-plugin-bdd repo https://review.opendev.org/710280 | 14:40 |
*** jamesmcarthur has joined #openstack-infra | 14:40 | |
lennyb | Shrews, mordred. thanks. | 14:41 |
*** gagehugo has joined #openstack-infra | 14:47 | |
*** jamesmcarthur has quit IRC | 14:48 | |
openstackgerrit | James E. Blair proposed opendev/base-jobs master: Add test docker image build jobs https://review.opendev.org/710283 | 14:55 |
corvus | mordred, fungi: ^ there's a redo from yesterday | 14:56 |
mnaser | just a heads up | 14:56 |
mnaser | https://www.githubstatus.com/ | 14:56 |
*** udesale_ has quit IRC | 14:57 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Stream output from kubectl pods https://review.opendev.org/709261 | 15:02 |
*** Lucas_Gray has quit IRC | 15:05 | |
*** ociuhandu has quit IRC | 15:08 | |
*** Lucas_Gray has joined #openstack-infra | 15:09 | |
*** mattw4 has joined #openstack-infra | 15:10 | |
*** marios|ruck has joined #openstack-infra | 15:11 | |
marios|ruck | o/ folks we are seeing a lot of RETRY_LIMIT in tripleo jobs is this a known issue http://tripleo-cockpit.usersys.redhat.com/d/9DmvErfZz/cockpit?orgId=1&fullscreen&panelId=63 | 15:12 |
marios|ruck | oops sorry bad url we have an upstream one | 15:13 |
marios|ruck | there http://dashboard-ci.tripleo.org/d/cockpit/cockpit?orgId=1&fullscreen&panelId=63 | 15:13 |
*** ociuhandu has joined #openstack-infra | 15:14 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: executor: blacklist dangerous ansible host vars https://review.opendev.org/710287 | 15:15 |
*** mattw4 has quit IRC | 15:17 | |
*** goneri_ has joined #openstack-infra | 15:19 | |
*** gyee has joined #openstack-infra | 15:21 | |
*** goneri_ has quit IRC | 15:21 | |
frickler | marios|ruck: looks like something is broken, but I cannot tell yet where. doesn't seem to affect tripleo only. infra-root: this looks bad http://zuul.openstack.org/builds?result=RETRY_LIMIT | 15:22 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Stream output from kubectl pods https://review.opendev.org/709261 | 15:23 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add destructor for SshAgent https://review.opendev.org/709609 | 15:23 |
frickler | not sure all of it can just be github being broken | 15:23 |
corvus | i'll track one of those down | 15:25 |
corvus | RESULT_UNREACHABLE | 15:26 |
corvus | that happened while a jab playbook was in progress | 15:26 |
corvus | let me see if i can find where the nodes were | 15:26 |
frickler | one job I was watching failed with that on rax-ord | 15:27 |
corvus | rax-iad | 15:28 |
frickler | slightly related: shouldn't retries be capped at 3? seeing "5. attempt" on some patches in kolla gate, e.g. 710067,2 | 15:29 |
corvus | frickler: what job name? | 15:29 |
frickler | corvus: kolla-ansible-centos-source-upgrade | 15:30 |
marios|ruck | thanks frickler | 15:30 |
corvus | frickler: http://zuul.openstack.org/job/kolla-ansible-base has increased the retry attempts to 5 | 15:31 |
corvus | i'm not sure if we're seeing a cloud problem, or a zuul/zk/nodepool problem | 15:31 |
corvus | oh this was a while a go wasn't it? | 15:34 |
* frickler needs to be homeward bound now, will try to check again later | 15:34 | |
frickler | corvus: from the age of the gate queue, might have started about 12h ago, but likely still ongoing | 15:34 |
corvus | it looks like zuul has lost zk connection 6 times today so far | 15:35 |
corvus | possibly due to memory pressure | 15:37 |
*** jamesmcarthur has joined #openstack-infra | 15:38 | |
*** lmiccini has quit IRC | 15:38 | |
corvus | apparently starting on 2-25 at 15:00, which is right around triggered a zuul-scheduler full-reconfigure on zuul01 to troubleshoot lack of job matches on openstack/openstack-ansible-rabbitmq_server changes | 15:39 |
corvus | but it's also when i was poking around in the repl | 15:40 |
corvus | that seems a more likely cause; i may not have cleaned up sufficiently. sorry about that. | 15:40 |
corvus | we should restart the zuul scheduler soon to fix | 15:40 |
fungi | note i also did another full-reconfigure later to get zuul to notice a gerrit branch deletion | 15:40 |
corvus | i've often wondered if issuing "del" commands in the repl is necessary to clean up memory; i'm leaning toward "the answer is yes" after this | 15:41 |
*** chandankumar is now known as raukadah | 15:41 | |
fungi | seems it continues to apply implied branch matchers on a project if it had more than 1 branch but then drops to only 1 through deletion of other branches | 15:41 |
fungi | i've no idea if that should be considered a bug, and/or whether it's worth fixing | 15:41 |
fungi | full-reconfigure got it to stop applying an implied branch matcher though | 15:42 |
corvus | sounds like a bug | 15:43 |
corvus | i'd like to see if we can limp through merging 710287 to zuul so we can restart with that | 15:43 |
fungi | noted, i'm in meetings this morning but can be around to help with a restart | 15:44 |
*** marios|ruck is now known as marios|out | 15:50 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Stream output from kubectl pods https://review.opendev.org/709261 | 15:51 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add destructor for SshAgent https://review.opendev.org/709609 | 15:51 |
*** sshnaidm_ is now known as sshnaidm | 15:58 | |
*** priteau has joined #openstack-infra | 16:00 | |
*** diablo_rojo has joined #openstack-infra | 16:00 | |
*** apevec has joined #openstack-infra | 16:01 | |
tristanC | it seems like lots of job are failing in RETRY_LIMIT without logs, is there a known issue? | 16:02 |
fungi | tristanC: yep, we think it's zk fluttering due to scheduler memory pressure | 16:02 |
*** marios|out has quit IRC | 16:03 | |
fungi | we're preparing to restart it as soon as 710287 (hopefully) lands | 16:03 |
apevec | is that repeating every few days? https://twitter.com/openstackinfra/status/1231497178268545024 | 16:04 |
tristanC | fungi: thanks, i didn't realize my irc client wasn't scrolled all the way down and missed the context :) | 16:04 |
*** jcoufal has joined #openstack-infra | 16:05 | |
clarkb | apevec: separate issues with common symptoms. The one you note was caused by a bug in zuul's git management iirc | 16:06 |
clarkb | corvus: fungi I've got appointment this morning but then will be arpund to help probably 2 hours from now | 16:06 |
*** factor has quit IRC | 16:08 | |
fungi | apevec: is what repeating? the retry_limit issue? not that i've seen | 16:08 |
*** factor has joined #openstack-infra | 16:08 | |
*** pkopec has joined #openstack-infra | 16:08 | |
*** ccamacho has quit IRC | 16:08 | |
fungi | it's just an acute thing which has been happening for the past few hours and will clear as soon as we can restart the scheduler to relieve memory pressure | 16:09 |
fungi | apevec: the cause on sunday was a problem with some redirects, i believe | 16:11 |
*** factor has quit IRC | 16:11 | |
fungi | unrelated, just resulting in a similar symptom | 16:11 |
fungi | (if you count "zuul gave up trying to rerun this job after x times" a symptom) | 16:12 |
*** factor has joined #openstack-infra | 16:13 | |
apevec | fungi, ack, thanks for info | 16:15 |
fungi | corvus: 710287 isn't going to merge without a retry, zuul-tox-remote failed | 16:15 |
*** Lucas_Gray has quit IRC | 16:15 | |
*** lmiccini has joined #openstack-infra | 16:15 | |
*** Lucas_Gray has joined #openstack-infra | 16:16 | |
*** mattw4 has joined #openstack-infra | 16:17 | |
AJaeger | config-core, please review https://review.opendev.org/710237 and https://review.opendev.org/710280 (I can approve once Zuul gets restarted) | 16:17 |
mordred | fungi: I can't find anything about restricting which repos an apt-key can be used for - any thoughts of where I should look? | 16:20 |
*** lmiccini has quit IRC | 16:20 | |
*** ociuhandu_ has joined #openstack-infra | 16:21 | |
*** factor has quit IRC | 16:22 | |
*** factor has joined #openstack-infra | 16:22 | |
*** factor has quit IRC | 16:23 | |
*** ociuhandu has quit IRC | 16:25 | |
fungi | mordred: see signed-by in https://manpages.debian.org/experimental/apt/sources.list.5.en.html | 16:25 |
fungi | i'm checking to see if we have that yet | 16:25 |
*** ociuhandu_ has quit IRC | 16:26 | |
fungi | looks like it's in debian/buster at least | 16:26 |
fungi | yeah, appears in a manpage on an ubuntu/bionic server as well | 16:27 |
*** tesseract has quit IRC | 16:27 | |
fungi | "option to require a repository to pass apt-secure(8) verification with a certain set of keys rather than all trusted keys apt has configured. It is specified as a list of absolute paths to keyring files (have to be accessible and readable for the _apt system user, so ensure everyone has read-permissions on the file) and fingerprints of keys to select from these keyrings" | 16:29 |
mordred | fungi: k. so it looks like what we'd want to do is add signed-by={keyid} to the main apt sources | 16:29 |
mordred | or - path to the debian keyring file | 16:29 |
mordred | and ubuntu keying file | 16:29 |
mordred | s/and/or/ | 16:29 |
mordred | and import the key for google into its own keyring | 16:30 |
fungi | yeah, the risk is in adding google's archive key into the apt trusted keyring rather than corralling it in its own keyring and telling apt to trust it for that one repository | 16:30 |
mordred | maybe just pinning the additional repo to specific package is good enough? it's unlikely that google's key is going to sign things we get from upstream ubuntu anyway? | 16:31 |
fungi | like i said, we haven't been especially careful about this in the past so i'm okay just punting on it for now, but we should keep it in mind | 16:31 |
*** carli has quit IRC | 16:32 | |
mordred | ++ | 16:32 |
fungi | basically the model of "add any old third-party key to apt's trusted set" opens us up to the possibility than someone who has (or gains) control of that key could masquerade a repository of their own as an official distro package repository | 16:32 |
mordred | fungi: I like the idea of starting to add pins for any additional repos we're adding | 16:32 |
mordred | that seems very managable - and helps us to audit and be aware of and document _why_ we're adding a repo - what we expect to install from it | 16:33 |
fungi | i mean, i mostly trust google's key to not be used that way (oh, wait, they stopped promising to "do no evil" right?) but some stuff we install from external package repos may be signed by keys which aren't as carefully guarded | 16:33 |
mordred | yah | 16:34 |
mordred | in system-config in ansible we currently add 4 repos | 16:35 |
mordred | (incliuding that kubectl patch) | 16:35 |
*** lmiccini has joined #openstack-infra | 16:35 | |
*** sreejithp has joined #openstack-infra | 16:35 | |
mordred | one for docker, one for podman and one for openafs | 16:35 |
mordred | I think we could add pins for all of them - and then say that as we continue to transition puppet stuff, when we put in apt_key or apt_repo tasks we should have corresponding pin files | 16:35 |
mordred | should make it managable in general | 16:35 |
fungi | right, it's a pattern i think we should consider switching to as we get time | 16:36 |
mordred | ++ | 16:36 |
* mordred is about to go get some coffee, but can push up some pin patches for our existing repos when he gets back | 16:36 | |
mordred | infra-root: ^^ scrollback with fungi and I worth perusing | 16:36 |
fungi | i mean, ultimately, you're still trusting whoever controls that package you're installing not to include some malicious maintscript which apt is going to run with root privs at install time | 16:37 |
fungi | which is why i don't think it's critical that we do it | 16:37 |
*** kkalina has joined #openstack-infra | 16:37 | |
mordred | yah. I think the other thing that isn't so much _maliciousness_ but that's still a good idea is accidentally grabbing some other package that happens to be in the repo | 16:38 |
mordred | the google repo is a much better example for that than the others, since it's a general purpose repo and might have unrelated packages with newer versions | 16:38 |
clarkb | you would still fail right? or will apt fallback to a package it can validate? | 16:38 |
mordred | with the pinning, you'd be telling apt to use things from the main repos except for a specific set of packages from the additional repo | 16:39 |
mordred | so - get docker from docker.io - but not libc | 16:39 |
mordred | and then if docker decides to ship a docker package that needs an updated by them libc - that would fail, because the deps wouldn't resolve - but that's something we should know about and would want to fail :) | 16:40 |
clarkb | got it | 16:40 |
*** lpetrut has quit IRC | 16:40 | |
corvus | ++ | 16:41 |
mordred | corvus: I'm going to dig in to this when I get back from coffee, but https://review.opendev.org/#/c/704582/ has a failed job that is not showing any logs | 16:42 |
mordred | on https://zuul.opendev.org/t/openstack/build/e0d0273a851040c89f2d5e6050d0b583 | 16:42 |
corvus | mordred: it's the thing i dug into earlier; swapping killing zk sessions | 16:43 |
mordred | ah | 16:43 |
mordred | great - then I won't dig in to it | 16:43 |
corvus | i will restart the scheduler soon | 16:43 |
mordred | neat | 16:43 |
fungi | corvus: not sure if you saw me mention earlier, but 710287 isn't going to merge without a retry, zuul-tox-remote failed | 16:44 |
corvus | fungi: yep, i just rejiggered the queue | 16:44 |
fungi | aha, indeed i just looked back over at the status page | 16:44 |
*** dosaboy has quit IRC | 16:45 | |
*** jpena is now known as jpena|brb | 16:45 | |
*** smarcet has joined #openstack-infra | 16:45 | |
*** jamesmcarthur has quit IRC | 16:45 | |
*** jamesmcarthur has joined #openstack-infra | 16:46 | |
*** electrofelix has joined #openstack-infra | 16:46 | |
*** howell has joined #openstack-infra | 16:48 | |
*** ijw has joined #openstack-infra | 16:49 | |
*** ijw has quit IRC | 16:49 | |
*** ijw has joined #openstack-infra | 16:49 | |
*** ccamacho has joined #openstack-infra | 16:53 | |
*** lucasagomes has quit IRC | 16:58 | |
corvus | fungi, mordred: can you review https://review.opendev.org/710283 when you have a second? i'd like to try the provides/requires shift again today, but this time with test jobs | 17:01 |
fungi | looking | 17:01 |
*** dosaboy has joined #openstack-infra | 17:01 | |
fungi | corvus: adding that role to playbooks/docker-image/pre.yaml is the only thing in there which looks possibly risky | 17:04 |
fungi | i guess the docker-image pre playbook is not heavily used? | 17:05 |
corvus | fungi: it's a new file (doesn't exist now | 17:05 |
fungi | ahh, right, and any job would have to add that playbook explicitly to use it | 17:06 |
*** pgaxatte has quit IRC | 17:14 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: executor: blacklist dangerous ansible host vars https://review.opendev.org/710287 | 17:16 |
smcginnis | Another job failure that looks like the redirect issue from yesterday - https://zuul.opendev.org/t/openstack/build/07930a201c224525b60b248145376096 | 17:17 |
smcginnis | Did that regex fix get applied? | 17:18 |
smcginnis | fungi, ianw: ^ | 17:18 |
fungi | smcginnis: i'll double-check, ianw switched the dns entry back to it last night | 17:19 |
smcginnis | We've had other releases this morning that have been fine. | 17:19 |
fungi | https://git.openstack.org/cgit/openstack/requirements/plain/upper-constraints.txt is redirecting to https://opendev.org/openstack/requirements/raw/branch/master/upper-constraints.txt and returning content for me | 17:19 |
fungi | i wonder if one of our backends isn't returning the right content | 17:21 |
*** njohnston is now known as neutron-fwaas | 17:21 | |
*** neutron-fwaas is now known as njohnston | 17:22 | |
*** rpittau is now known as rpittau|afk | 17:23 | |
*** jpena|brb is now known as jpena | 17:23 | |
*** Lucas_Gray has quit IRC | 17:24 | |
fungi | they're all returning the correct content for me | 17:25 |
fungi | i wonder if pip isn't doing sni and is getting the default vhost | 17:25 |
*** larainema has quit IRC | 17:26 | |
fungi | though i don't see any indication on the old server that we were explicitly setting one as the default | 17:27 |
corvus | i am able to reproduce with curl | 17:28 |
*** Lucas_Gray has joined #openstack-infra | 17:28 | |
AJaeger | smcginnis: the regex fix merged | 17:28 |
smcginnis | It had been working so far today. | 17:28 |
AJaeger | https://review.opendev.org/#/c/710151/1 is the regex fix | 17:29 |
smcginnis | Including unit test coverage. | 17:29 |
corvus | sometimes when i run curl "curl -o - https://git.openstack.org/cgit/openstack/requirements/plain/upper-constraints.txt" i get redirected to The document has moved <a href="https://opendev.org/">here</a>. | 17:30 |
corvus | other times i get The document has moved <a href="https://opendev.org/openstack/requirements/raw/branch/master/upper-constraints.txt">here</a>. | 17:30 |
smcginnis | That seems to be what this job got. | 17:30 |
smcginnis | Or at least, some type of actual HTML page versus a redirect to the raw file. | 17:30 |
corvus | there isn't a load balancer involved, is there? | 17:31 |
AJaeger | I wonder why that job does not have the requirements repo checked out - looks wrong to me... | 17:31 |
* AJaeger fixes | 17:32 | |
smcginnis | That's what I was confused about last night. | 17:32 |
smcginnis | I would have expected it to just use a local copy rather than trying to pull it down. | 17:32 |
fungi | indeed, and yet that redirect is only served from one place | 17:32 |
corvus | one apache worker process is old | 17:32 |
corvus | it may have old config data | 17:33 |
fungi | aha, yep, we've seen that with ssl certs too | 17:33 |
corvus | i'll kick it | 17:33 |
fungi | something holds an open connection to a worker (or the worker doesn't realize a connection has died quietly) and so never terminates | 17:33 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Add requirements repo to publish-tox-docs-releases https://review.opendev.org/710322 | 17:34 |
AJaeger | smcginnis, config-core, this should fix it ^ | 17:34 |
*** evrardjp has quit IRC | 17:35 | |
corvus | smcginnis, fungi, AJaeger: i restarted apache on static01, and my highly scientific test of "run curl a bunch" is now returning a consistent redirect | 17:35 |
*** evrardjp has joined #openstack-infra | 17:35 | |
smcginnis | corvus, AJaeger: Thank you both! | 17:35 |
AJaeger | corvus: great. Let's still merge 710322 to avoid this here... | 17:35 |
smcginnis | That would be much more efficient. | 17:35 |
corvus | yes please | 17:36 |
*** igordc has joined #openstack-infra | 17:39 | |
smarcet | fungi: afternoon i got a weird error here https://review.opendev.org/#/c/710128/ | 17:40 |
*** priteau has quit IRC | 17:40 | |
smarcet | fungi: No such file or directory: 'docker': 'docker'", | 17:41 |
smarcet | fungi: should i recheck | 17:41 |
smarcet | ? | 17:41 |
fungi | smarcet: yes, we temporarily broke the opendev-buildset-registry job on accident yesterday | 17:42 |
fungi | it should be working now | 17:42 |
smarcet | fungi: ack thx u | 17:42 |
*** lmiccini has quit IRC | 17:45 | |
fungi | corvus: so to revisit your earlier question, i guess technically the answer is yes, there is a load balancer involved. apache is balancing load across multiple worker processes | 17:47 |
fungi | i wonder if setting MaxRequestsPerChild to a nonzero value would solve that | 17:49 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Fix GCE volume parameters https://review.opendev.org/710324 | 17:49 |
corvus | fungi: i thought it had some (large) default | 17:50 |
fungi | hrm, it defaults to 10000 | 17:50 |
fungi | yeah | 17:50 |
fungi | maybe that worker hadn't handled 10k requests since the config updated, or maybe that's just a suggestion but the worker will continue to be used for more requests if some connection is holding on | 17:51 |
corvus | yeah, i don't know which, and i didn't hit the status url, so lost the data | 17:51 |
fungi | that same server is hosting docs.openstack.org now, so i'd be surprised if all its workers hadn't gotten 10k requests since the fix merged | 17:52 |
clarkb | my time estimate was good. At my desk now | 17:54 |
clarkb | fungi: corvus we saw similar stale workers with LE cert rotation on one of the mirrors | 17:54 |
clarkb | a global lifespan limit would probably be a reasonable idea | 17:54 |
corvus | ++ | 17:55 |
fungi | clarkb: yeah, i had been contemplating that as a workaround when we hit it with the stale cert | 17:57 |
fungi | as you suggested earlier though, another possible workaround is to switch back to using full restart instead of graceful restart, and just accept that we'll terminate running connections and possibly have a fraction of a second where connections are refused | 17:58 |
clarkb | is the zuul memory issue something that still needs attention? | 17:59 |
corvus | clarkb: yes, but i'm still hoping thet 710287 lands soon | 18:00 |
clarkb | ok, let me know if I can help | 18:00 |
clarkb | catching up on email and scrollback otherwise | 18:00 |
corvus | yeah, i think we're waiting ether for that to land or for the system to go over the cliff | 18:01 |
*** igordc has quit IRC | 18:01 | |
*** jcapitao is now known as jcapitao_off | 18:02 | |
*** Lucas_Gray has quit IRC | 18:02 | |
*** gfidente is now known as gfidente|afk | 18:04 | |
*** igordc has joined #openstack-infra | 18:04 | |
*** mattw4 has quit IRC | 18:05 | |
*** mattw4 has joined #openstack-infra | 18:05 | |
AJaeger | 710287 has another 30mins, let's keep fingers crossed ;) | 18:06 |
clarkb | I've WIP'd https://review.opendev.org/#/c/703488/6 because its depends-on has merged but then I guess was reverted? | 18:09 |
AJaeger | clarkb: yeah, was reverted - mnaser thought it needed different process | 18:10 |
*** Lucas_Gray has joined #openstack-infra | 18:10 | |
AJaeger | config-core, please review https://review.opendev.org/710237 and https://review.opendev.org/710280 | 18:11 |
clarkb | as confirmation /etc/apache2/mods-available/mpm_event.conf sets MaxConnectionsPerChild 0 | 18:14 |
fungi | oh! | 18:14 |
clarkb | and mpm event is what we have enabled | 18:14 |
fungi | i should have grepped, i was looking in conf-enabled | 18:14 |
*** amoralej is now known as amoralej|off | 18:14 | |
fungi | note however that none of those conffiles are linked in mods-enabled | 18:15 |
fungi | oh, wait, they are | 18:15 |
fungi | grep doesn't follow symlinks | 18:15 |
clarkb | fungi: ya I had to ls for which mpm was enabled then follow that a bit manually to the config | 18:15 |
fungi | `grep -i MaxConnectionsPerChild /etc/apache2/mods-enabled/*` does indeed turn up many hits | 18:16 |
fungi | all setting to 0, which per the docs means indefinite/never expire | 18:16 |
fungi | okay, so behavior explained there, i suppose | 18:16 |
clarkb | mod-enabled conf files are loaded well before conf.d conf files so we should be able to drop a file in conf.d/ and override that value | 18:16 |
fungi | agreed, that's how we've tuned similar values on etherpad | 18:17 |
fungi | with a /etc/apache2/conf-enabled/connection-tuning.conf file | 18:17 |
fungi | though we set MaxRequestsPerChild 0 in etherpad's connection-tuning.conf too | 18:17 |
fungi | (just as a point of reference) | 18:18 |
clarkb | maybe set that value to 8096? | 18:18 |
*** rishabhhpe has joined #openstack-infra | 18:19 | |
clarkb | (I don't know what a good balance between forking too much and not actually flushing workers is) | 18:19 |
rishabhhpe | Hi All, after triggering a zuul job on nodepool spawned instance on which i installed devstack .. i am not able to run any openstack command . it is saying Failed to discover auth URL | 18:19 |
fungi | i suppose "reloading configuration won't take effect on workers held by long-running/defunct clients" is a good reason to reconsider that value | 18:19 |
clarkb | I was hoping there was a time based rotation option but I don't see one | 18:19 |
fungi | clarkb: luckily apache has a recommendation for that. docs say default is 10k | 18:19 |
fungi | seems like as good a number as any | 18:20 |
clarkb | works for me | 18:20 |
frickler | clarkb: seems the revert of the revert now has the opendev change as dep, so instead of -W we should likely proceed with with 703488 instead? see https://review.opendev.org/710020 | 18:20 |
fungi | rishabhhpe: odds are you're lacking a clouds.yaml configured for your devstack deployment | 18:20 |
clarkb | fungi: hrm I don't think we should land 703488 until we make the openstack change | 18:21 |
clarkb | maybe that was part of mnaser's concern? | 18:21 |
clarkb | er frickler ^ sorry bad tab complete | 18:21 |
fungi | clarkb: i think part of mnaser's concern was that we *hadn't* yet merged our proposed governance | 18:22 |
clarkb | fungi: right we can't do that until we remove the openstack conflict | 18:22 |
clarkb | I'm not sure what it would mean to have a non openstack governed project in openstack | 18:22 |
fungi | and he was reticent to see those repos removed from openstack until we have a clear governance confirmed | 18:22 |
clarkb | but to me that doesn't make sense | 18:22 |
clarkb | do we land a change that says "this is only in effect if openstack change lands?" | 18:23 |
mnaser | I think moving forward with a governance is probably good even if we have a gap in time where the reflection of reality isn’t accurate | 18:23 |
fungi | clarkb: ttx has suggested a possible workaround is for the tc to approve a resolution about it, but i'm not quite clear on what that would entail | 18:23 |
clarkb | mnaser: I think we can move forward with it, but I don't think we can land it as being in effect | 18:24 |
mnaser | Yeah, I guess having more infra members voting on it too would be a good step too | 18:24 |
clarkb | because that would create a massive conflict | 18:24 |
fungi | yeah, we can't update our documentation to say it's opendev without it no longer being openstack | 18:24 |
clarkb | (and I'm not sure how we'd resolve issues within the context of that conflict) | 18:24 |
mnaser | At the state when we merged it was 1 infra team vote at least | 18:24 |
fungi | mnaser: i think some of the earlier infra team votes on the openstack governance change got cleared by a new patchset, so not sure if those were also being counted | 18:25 |
fungi | diablo_rojo had requested a minor adjustment to it | 18:25 |
*** jcoufal has quit IRC | 18:26 | |
fungi | i can't remember if i got around to reapplying my vote after that | 18:26 |
clarkb | mnaser: ok would it work to have quorum on https://review.opendev.org/#/c/703488/6 and have that depends on https://review.opendev.org/710020? Then TC can 710020 when sufficient quorum is reached on 703488 allows us to land them close together but with appropriate ordering? | 18:26 |
mnaser | FWIW I think if we considered OpenDev as an OIP within the OSF as step 1 then openstack tc would have a resolution saying that the shared infra bits are now being “passed onto and promoted to an OIP” and then the governance draft could start? | 18:26 |
openstackgerrit | Merged openstack/project-config master: Add requirements repo to publish-tox-docs-releases https://review.opendev.org/710322 | 18:26 |
clarkb | *Then TC can land 710020 | 18:26 |
clarkb | fwiw I've been asking for feedback on this for about 12 weeks now | 18:27 |
clarkb | And I've done my best to respond to all feedback that has been received | 18:27 |
fungi | mnaser: i think the osf would need to be very clear on the fact that opendev is not a project focused on producing and distributing software, and that we need to figure out what confirmation requirements would be under those circumstances | 18:27 |
clarkb | so I was very surprised when finding out there were additional concerns | 18:27 |
fungi | mnaser: so far all open infrastructure projects are writing software, and the confirmation guidelines the board approved make assumptions related to that | 18:27 |
fungi | we'd almost certainly need some of those guidelines ignored or adjusted | 18:28 |
corvus | i'm pretty sure the infra team members are in favor of clarkb's proposal; i think lack of votes on that change are likely simply due to having weighed in very early and missed patchset updates | 18:28 |
corvus | (if my vote was missing on the final rev, that's almost certainly why) | 18:29 |
clarkb | to resolve the conflict I would update that change to have a warning block that says this only takes effect if 710020 lands. THen have a change to remove that once 710020 lands | 18:30 |
clarkb | s/would/could/ I think I prefer having strict ordering via gerrit | 18:30 |
rishabhhpe | fungi: when i installed my master devstack .. i did not configured anything extra for clouds.yaml file | 18:31 |
clarkb | but I don't think we should assert two different concurrent governance models as I expect that will only cause confusion | 18:31 |
*** mattw4 has quit IRC | 18:31 | |
*** mattw4 has joined #openstack-infra | 18:32 | |
clarkb | re OSF subproject. I don't have objections to that either, but I agree with fungi that we may have to do some contortioning of rules to make that work (or otherwise update rules so that this applies) | 18:32 |
*** jamesmcarthur has quit IRC | 18:33 | |
*** jamesmcarthur has joined #openstack-infra | 18:33 | |
*** jpena is now known as jpena|off | 18:33 | |
mnaser | clarkb, corvus, fungi and others: would it make sense to maybe have some sort of call on the bridge we use to maybe iron out some of the discussion | 18:35 |
clarkb | mnaser: perhaps. I think I'd like to understand why the multiple mailing list threads and gerrit reviews were didn't solicit this feedback though | 18:35 |
fungi | maybe... what clarification is missing? i'm not even clear on that ;) | 18:35 |
clarkb | (also I've brought it up in most infra meetings over the last 3 months as well) | 18:36 |
mnaser | I think I totally agree and I’m in favour of OpenDev. I just don’t know where OpenDev sits post-split | 18:36 |
fungi | i guess what i'm unclear on is why it needs to sit | 18:36 |
fungi | it's a community with a (forming) governance structure and defined goals | 18:36 |
mnaser | because for things like being an infrastructure donor | 18:36 |
fungi | what are you looking for specifically? some accountability fallback? | 18:37 |
AJaeger | mnaser: opendev will still be around, do we need to solve everything now - or what explicitely is blocking the change? | 18:37 |
fungi | just trying to better understand what the precise concern is and what the reasonable solutions are which alleviate it | 18:37 |
*** mattw4 has quit IRC | 18:38 | |
* AJaeger expected that the proposal gave us a way forward to solve any open questions... | 18:38 | |
mnaser | I would say when people came on and provided infrastructure to the infra team, they came in looking to provide resources to OpenStach (the project) | 18:38 |
mnaser | Now they’re providing it to OpenDev (and whatever projects we currently provide), its a bit of a grey line already now but splitting it out completely.. | 18:39 |
*** mattw4 has joined #openstack-infra | 18:39 | |
fungi | and people active in openstack worked out ways to leverage those resources to provide benefit to openstack, and those people who are involved in openstack are also being approached with resource donations to help other projects | 18:39 |
fungi | but also those resources are ephemeral in nature, and there's no binding legal contract i'm aware of requiring people who donated resources previously to continue to do so if they don't wish to | 18:40 |
fungi | plus, clarkb reached out to all the donors to make them aware of the plan we've been formulating for this and to invite them to provide feedback on it | 18:41 |
fungi | granted it wasn't as solidified at the time, but at least some of them seem to have been following along | 18:42 |
clarkb | Granted that was back in the march/aprilish timeframe last year so things may have changed | 18:42 |
clarkb | I guess, my concern right now is I've practically begged people for feedback for 3 months | 18:42 |
clarkb | and I've done my best to address the feedback I have received | 18:42 |
clarkb | and I'm worried we'll be unable to satisfy concerns in general if ^ didn't work | 18:43 |
mnaser | I agree that it sucks that the follow up hasn’t been right from my side with my concerns and I haven’t seen much movement on that review either | 18:43 |
fungi | i think most of the team provided feedback early in the process, and the proposal's been held open for the sake of anyone who didn't find time to review it yet | 18:45 |
fungi | a lot of the feedback occurred while it was still being drafted, before it hit gerrit | 18:45 |
clarkb | It would be helpful for me to hear what exactly peopke think the next steps are if not what I've proposed and worked at for 3 months | 18:45 |
fungi | at ptgs, in irc and on the mailing list | 18:45 |
mnaser | yeah. I just want to be upfront that I’m really not against OpenDev, I just don’t want to rush through it, so I felt like only 4 votes probably wasn’t enough. | 18:46 |
fungi | and what landed in the gerrit change attempted to take all of that earlier feedback and discussion into account | 18:46 |
fungi | mnaser: 4 votes on which change? the openstack/governance change or the opendev/system-config change? | 18:47 |
mnaser | Governance | 18:47 |
* fungi is getting confused with indirect references and multiple changes | 18:47 | |
fungi | thanks | 18:47 |
fungi | and by 4 votes you mean 4 tc rollcall votes or 4 infra core reviewer votes? | 18:48 |
mnaser | I understand your frustration clarkb and I really hate to be the one who brought up the revert :/ — but this is why I’m trying to personally take ownership and help drive it forward through the tc | 18:48 |
clarkb | mnaser: right, but now I don't know what I can do to move this forward | 18:48 |
clarkb | because everything I have done hasn't solicited the necessary input | 18:48 |
mnaser | I think I’m counting the roll calls only, off the top of my head. I’m on mobile though so it’s from memory | 18:48 |
clarkb | corvus: fungi it looks like that zuul change is hitting the reset problems too | 18:49 |
clarkb | corvus: fungi should we consider landing it more forcefully or perhaps after a restart (and plan for a second followup restart) | 18:50 |
clarkb | I don't know if that change has made it through check testing? I'm assuming not? | 18:50 |
fungi | i'm in favor of directly submitting it in gerrit and making sure it's checked out and installed on the scheduler before we restart | 18:50 |
clarkb | (so we don't have a full set of test data) | 18:51 |
fungi | but maybe after it reports and we get to double-check the builds | 18:51 |
mnaser | clarkb: I will draft an update to the change with a very small resolution, update the mailing list, and personally chase reviews myself; I hope that gets us bigger visibility overall. | 18:53 |
clarkb | mnaser: thanks. I'm happy to help, but some direction would be useful | 18:54 |
mnaser | clarkb: cool, sorry for hitting the back button on your progress, but I understand the frustration | 18:56 |
fungi | status notice Memory pressure on zuul.opendev.org is causing connection timeouts resulting in POST_FAILURE and RETRY_LIMIT results for some jobs since around 06:00 UTC today; we will be restarting the scheduler shortly to relieve the problem, and will follow up with another notice once running changes are reenqueued. | 19:03 |
fungi | infra-root: ^ does that look reasonable to send? | 19:03 |
clarkb | fungi: yes, though maybe lets decide on the plan for the zuul bugfix in case we need to restart twice? | 19:04 |
clarkb | fwiw I think I'm willing to risk force merging that change, if it fails we can restart on HEAD~1 | 19:04 |
clarkb | then sort out the problem with a happier zuul | 19:04 |
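For context, "force merging" here means an admin voting and submitting a change directly, bypassing the gate. A minimal sketch of what that can look like via Gerrit's ssh CLI follows; CHANGE,PATCHSET is a placeholder, and the label flags assume the site has Code-Review and Verified labels configured:

    # Hypothetical sketch: vote and submit directly as an admin, bypassing CI.
    # Requires an account with permission to apply these votes and submit.
    ssh -p 29418 review.opendev.org gerrit review \
        --code-review +2 --verified +2 --submit CHANGE,PATCHSET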
fungi | well, i wasn't exactly going to say how many times we're restarting, so we can do twice if we want | 19:05 |
clarkb | fair enough. | 19:05 |
clarkb | mordred: looks like we have a review.o*.org cert now \o/ | 19:05 |
clarkb | mordred: has any work been done yet to consume that? If not I can probably give that a go | 19:06 |
mordred | clarkb: woot! | 19:06 |
mordred | and no - I mean - other than the ansible work | 19:06 |
mordred | clarkb: we could consider planning the ansible rollout - to my knowledge we're not really waiting on anything else so could probably get 2.13 in container in a month's time | 19:07 |
frickler | fungi: +1 | 19:07 |
*** rlandy is now known as rlandy|brb | 19:07 | |
clarkb | mordred: we're happy with where review-dev is then with webservers and all that? | 19:08 |
*** ralonsoh has quit IRC | 19:08 | |
mordred | clarkb: yeah - I think so - it's got the redirects working with the certs | 19:09 |
fungi | #status notice Memory pressure on zuul.opendev.org is causing connection timeouts resulting in POST_FAILURE and RETRY_LIMIT results for some jobs since around 06:00 UTC today; we will be restarting the scheduler shortly to relieve the problem, and will follow up with another notice once running changes are reenqueued. | 19:09 |
openstackstatus | fungi: sending notice | 19:09 |
mordred | clarkb: we might want to think about how manage-projects is working | 19:09 |
roman_g | Hello team. How can we see the real reasons behind NODE_FAILURE and RETRY_LIMIT errors for jobs? https://zuul.opendev.org/t/openstack/builds?job_name=airship-airshipctl-gate-test | 19:10 |
mordred | clarkb: actually - let me take a look at that for a few minutes - but I think we might be better served by banging that out instead of puppeting the LE stuff (since we're close anyway) | 19:10 |
clarkb | roman_g: today we've had zookeeper connection problems with zuul leading to the retry limits. Your node failures last I checked were due to the new cloud being unable to boot the requested resources | 19:11 |
-openstackstatus- NOTICE: Memory pressure on zuul.opendev.org is causing connection timeouts resulting in POST_FAILURE and RETRY_LIMIT results for some jobs since around 06:00 UTC today; we will be restarting the scheduler shortly to relieve the problem, and will follow up with another notice once running changes are reenqueued. | 19:11 | |
clarkb | roman_g: that was what I sent email about to you and robert and jan-erik | 19:11 |
fungi | though i suppose it's also possible for intermittent zookeeper connection flaps to result in NODE_FAILURE results too, right? | 19:12 |
clarkb | roman_g: I can check logs really quickly to see if that issue persists with the NODE_FAILURES | 19:12 |
roman_g | clarkb: thank you. When did you send it? | 19:12 |
*** jamesmcarthur has quit IRC | 19:12 | |
clarkb | roman_g: february 10, 2020 | 19:12 |
clarkb | fungi: yes I think that is possible if the zk connection dies when processing the node request | 19:12 |
roman_g | clarkb: thanks. Reaching back to Robert then. | 19:12 |
clarkb | (but not 100% sure of that) | 19:13 |
clarkb | roman_g: let me double check logs really quickly just to rule out the other existing issue | 19:13 |
prometheanfire | does a scheduler restart mean we will need to recheck? | 19:13 |
fungi | prometheanfire: if the builds have already reported in a change then yes; i plan to mention that in the next notice following the restart | 19:13 |
fungi | if the change is still enqueued at the time of the restart, we'll be reenqueuing it | 19:13 |
corvus | fungi, clarkb: is someone restarting zuul or should i? | 19:14 |
prometheanfire | kk | 19:14 |
prometheanfire | I thought it might be a common question | 19:14 |
fungi | corvus: we were just discussing whether we should submit that change in gerrit first and then make sure it's checked out and installed on zuul.o.o | 19:14 |
fungi | do you have any input? | 19:15 |
corvus | fungi: nah, let's restart zuul on master then let that merge and later restart executors | 19:15 |
clarkb | roman_g: looking at logs I see at least one successful node request for the larger nodes so robert and jan-erik may have solved the problem | 19:15 |
corvus | fungi: it has not passed a full test suite yet | 19:15 |
clarkb | roman_g: might be best to wait for us to solve the zookeeper connection issue (should be fixed shortly) then rerun and see what we end up with | 19:15 |
fungi | okay, i can get to work on the scheduler restart now in that case | 19:15 |
corvus | fungi: ok, all yours, thanks | 19:15 |
clarkb | corvus: fungi ++ let me know if I can help | 19:15 |
*** jcapitao_off has quit IRC | 19:15 | |
mordred | ++ | 19:16 |
*** rishabhhpe has quit IRC | 19:16 | |
roman_g | clarkb: thank you. | 19:16 |
fungi | using `python /opt/zuul/tools/zuul-changes.py http://zuul.opendev.org >queue.sh` as root on zuul.o.o to take a snapshot of the current pipelines | 19:16 |
fungi | stopping the scheduler daemon now | 19:17 |
fungi | debug log says it's "Stopping Gerrit Connection/Watchers" | 19:18 |
fungi | how long is that expected to usually take? | 19:18 |
clarkb | fungi: I think no more than a few minutes | 19:18 |
fungi | okay, it's only been about 1.5 minutes so far. i'll give it a couple more | 19:18 |
*** rlandy|brb is now known as rlandy | 19:19 | |
fungi | we're now at the 3 minute mark since it logged that | 19:20 |
clarkb | is the process still running? | 19:20 |
fungi | yes | 19:20 |
fungi | oh, nope | 19:20 |
fungi | it was | 19:20 |
clarkb | ya I think you are good to start it again | 19:20 |
fungi | but i guess that's the last thing it logged | 19:20 |
*** electrofelix has quit IRC | 19:20 | |
fungi | yeah starting the service now | 19:20 |
clarkb | then pause for configs to load before reenqueing jobs | 19:20 |
corvus | may need to restart zuul-web after it returns | 19:20 |
fungi | 2020-02-27 19:24:11,004 INFO zuul.Scheduler: Full reconfiguration complete (duration: 195.238 seconds) | 19:26 |
fungi | i guess i can start enqueuing things now | 19:26 |
clarkb | ++ | 19:26 |
fungi | i've restarted, even stopped and started, zuul-web but it's still complaining about the api being unavailable | 19:27 |
fungi | hrm, it's not running | 19:28 |
fungi | stale pidfile i think | 19:28 |
fungi | there it goes | 19:28 |
clarkb | ya new process straces with activity | 19:29 |
clarkb | and web ui loads for me | 19:29 |
fungi | `bash -x queue.sh` is running as root now | 19:29 |
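Pulling the steps above together, the scheduler restart cycle looks roughly like this (a sketch; the init-script service names are assumptions, while the paths match the commands quoted in the log):

    # Snapshot running pipelines as a replayable script of zuul enqueue commands
    python /opt/zuul/tools/zuul-changes.py http://zuul.opendev.org > queue.sh
    service zuul-scheduler stop    # wait for "Stopping Gerrit Connection/Watchers" to finish
    service zuul-scheduler start   # then watch for "Full reconfiguration complete" in the debug log
    service zuul-web restart       # clear a stale pidfile first if it refuses to start
    bash -x queue.sh               # reenqueue everything that was in flight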
fungi | status notice The scheduler for zuul.opendev.org has been restarted; any changes which were in queues at the time of the restart have been reenqueued automatically, but any changes whose jobs failed with a RETRY_LIMIT, POST_FAILURE or NODE_FAILURE build result in the past 14 hours should be manually rechecked for fresh results | 19:29 |
fungi | how does that ^ look for once the reenqueuing is done? | 19:30 |
corvus | ++ | 19:30 |
AJaeger | LGTM | 19:30 |
smcginnis | +1 | 19:30 |
clarkb | fungi: lgtm | 19:31 |
*** ahosam has joined #openstack-infra | 19:35 | |
openstackgerrit | Merged opendev/base-jobs master: Add test docker image build jobs https://review.opendev.org/710283 | 19:36 |
*** jamesmcarthur has joined #openstack-infra | 19:40 | |
fungi | reenqueuing is completed, sending notice now | 19:43 |
fungi | #status notice The scheduler for zuul.opendev.org has been restarted; any changes which were in queues at the time of the restart have been reenqueued automatically, but any changes whose jobs failed with a RETRY_LIMIT, POST_FAILURE or NODE_FAILURE build result in the past 14 hours should be manually rechecked for fresh results | 19:43 |
openstackstatus | fungi: sending notice | 19:43 |
*** eharney has quit IRC | 19:43 | |
-openstackstatus- NOTICE: The scheduler for zuul.opendev.org has been restarted; any changes which were in queues at the time of the restart have been reenqueued automatically, but any changes whose jobs failed with a RETRY_LIMIT, POST_FAILURE or NODE_FAILURE build result in the past 14 hours should be manually rechecked for fresh results | 19:44 | |
openstackstatus | fungi: finished sending notice | 19:46 |
clarkb | the zuul change has all of the jobs that can start running now | 19:47 |
clarkb | I guess we are stable now? http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all looks happy | 19:48 |
* clarkb finds lunch | 19:48 | |
*** ociuhandu has joined #openstack-infra | 19:48 | |
clarkb | roman_g: ^ if you are still around can you give your airship jobs another attempt? | 19:49 |
*** gyee has quit IRC | 19:49 | |
clarkb | roman_g: I think we'll get cleaner data on those now and then we can cross check logs if still failing | 19:49 |
*** gyee has joined #openstack-infra | 19:49 | |
*** Lucas_Gray has quit IRC | 19:52 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Replace kubectl snap with apt repo https://review.opendev.org/709253 | 19:58 |
openstackgerrit | Merged zuul/nodepool master: Fix GCE volume parameters https://review.opendev.org/710324 | 20:02 |
*** nicolasbock has quit IRC | 20:07 | |
*** ijw has quit IRC | 20:23 | |
*** kozhukalov has quit IRC | 20:27 | |
openstackgerrit | Merged opendev/system-config master: OpenStackId v3.0.4 Deployment https://review.opendev.org/710128 | 20:28 |
*** kozhukalov has joined #openstack-infra | 20:28 | |
frickler | AJaeger: dirk: DNS for opensuse.de/com seems borked, is this something you can influence or have contacts for? http://paste.openstack.org/show/790085/ | 20:38 |
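A quick way to see whether a domain's delegation or records are broken is to query it directly; a sketch using the domains mentioned:

    dig +short NS opensuse.de      # does the delegation resolve at all?
    dig +trace www.opensuse.com    # walk the delegation chain from the root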
ianw | fungi / clarkb: if you wouldn't mind looking in on https://review.opendev.org/#/c/710160/ it should be ready; implements the redirects in apache | 20:47 |
fungi | i'm headed out for an early dinner but will take a look once i get back | 20:48 |
*** ijw has joined #openstack-infra | 20:48 | |
ianw | thanks; and double thanks to you and corvus for looking into the redirect issues from scrollback! | 20:48 |
*** Lucas_Gray has joined #openstack-infra | 20:49 | |
cmurphy | frickler: they're looking into it | 20:52 |
*** ijw has quit IRC | 20:53 | |
AJaeger | thanks, cmurphy ! | 20:53 |
*** ijw has joined #openstack-infra | 20:53 | |
*** ociuhandu has quit IRC | 20:53 | |
AJaeger | config-core, please review https://review.opendev.org/710237 and https://review.opendev.org/710280 | 20:53 |
*** ijw has quit IRC | 20:54 | |
*** ijw has joined #openstack-infra | 20:54 | |
*** jamesmcarthur has quit IRC | 20:54 | |
AJaeger | thanks, ianw | 20:57 |
*** kkalina has quit IRC | 21:01 | |
*** kozhukalov has quit IRC | 21:08 | |
openstackgerrit | Merged openstack/project-config master: fix stackviz publishing https://review.opendev.org/710237 | 21:08 |
openstackgerrit | Merged openstack/project-config master: Retire devstack-plugin-bdd repo https://review.opendev.org/710280 | 21:12 |
*** smarcet has quit IRC | 21:14 | |
*** Lucas_Gray has quit IRC | 21:14 | |
*** slaweq has quit IRC | 21:15 | |
*** jcapitao_off has joined #openstack-infra | 21:17 | |
*** Goneri has quit IRC | 21:19 | |
*** Lucas_Gray has joined #openstack-infra | 21:19 | |
*** aarents has quit IRC | 21:23 | |
*** rosmaita has quit IRC | 21:29 | |
*** mattw4 has quit IRC | 21:30 | |
*** mattw4 has joined #openstack-infra | 21:31 | |
*** rosmaita has joined #openstack-infra | 21:31 | |
*** Lucas_Gray has quit IRC | 21:38 | |
*** kozhukalov has joined #openstack-infra | 21:40 | |
*** pkopec has quit IRC | 21:42 | |
*** dpawlik has quit IRC | 21:43 | |
*** rcernin has joined #openstack-infra | 21:44 | |
jrosser | i'm seeing a few NODE_FAILUREs, like here: https://review.opendev.org/#/c/709795/ | 21:48 |
*** xek__ has quit IRC | 21:53 | |
ianw | jrosser: hrm, only centos-7 or random? | 21:53 |
*** Goneri has joined #openstack-infra | 21:53 | |
*** jamesmcarthur has joined #openstack-infra | 21:54 | |
jrosser | ianw: i'm going to hazard a guess at centos7 | 21:54 |
jrosser | this has jobs in progress but a centos-7 one has failed https://review.opendev.org/#/c/708097/ | 21:54 |
ianw | 2020-02-27 18:03:43,246 DEBUG nodepool.driver.NodeRequestHandler[nl01-8524-PoolWorker.rax-ord-main]: Accepting node request 200-0007660291 | 21:57 |
ianw | 2020-02-27 21:56:47,855 INFO nodepool.driver.NodeRequestHandler[nl01-8524-PoolWorker.rax-ord-main]: Node request 200-0007660291 disappeared | 21:57 |
ianw | i've never seen this combo before | 21:57 |
jrosser | found a 3rd centos-7 node failure on this too https://review.opendev.org/#/c/710256/ | 21:58 |
ianw | there are a lot of these messages for rax | 21:58 |
*** smarcet has joined #openstack-infra | 21:59 | |
clarkb | ianw: yup looking | 22:00 |
clarkb | ianw: jrosser that is due to the memory leak | 22:01 |
clarkb | when the zuul scheduler runs out of memory, its zk connection dies. That then causes the znodes that require a valid connection to go away | 22:01 |
clarkb | zuul was restarted around 2000UTC to address this | 22:01 |
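The znodes in question are ephemeral: ZooKeeper deletes them server-side once the session that created them expires. A sketch with ZooKeeper's stock CLI illustrates the behaviour (the path is illustrative, not nodepool's real layout):

    zkCli.sh -server zk01.openstack.org:2181
    # inside the zk shell:
    create -e /example-request "pending"   # ephemeral znode, tied to this session
    quit                                   # close the session; once it expires,
                                           # ZooKeeper removes the znode itself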
*** smarcet has quit IRC | 22:02 | |
openstackgerrit | Merged zuul/zuul master: executor: blacklist dangerous ansible host vars https://review.opendev.org/710287 | 22:02 |
clarkb | if that issue is persisting we may need to look in further | 22:02 |
jrosser | i think these might all have been rechecked after that | 22:03 |
ianw | clarkb: tailing the logs right now on nl, i'm seeing the disconnect errors | 22:03 |
ianw | nl01 | 22:03 |
ianw | zuul.openstack.org doesn't seem under any particular memory or cpu pressure, however | 22:04 |
clarkb | I wonder if some other issue is precipitating the connection failures post restart (and possibly before too) | 22:05 |
clarkb | corvus: fungi ^ fyi | 22:05 |
ianw | the launcher might need a restart too? | 22:06 |
clarkb | ya or maybe zk is unhappy? | 22:06 |
ianw | the last message on zk01 at least is from 2018-11-30 22:42:38,185 | 22:07 |
corvus | the scheduler has not lost a zk connection since the restart | 22:07 |
clarkb | cacti shows healthy looking zk fwiw | 22:08 |
*** jamesmcarthur has quit IRC | 22:08 | |
corvus | ianw: that pair of log lines spans the restart i think | 22:08 |
clarkb | ya was just going to mention that | 22:08 |
clarkb | possible that we didn't time out the node requests quickly for some reason (perhaps zuul scheduler shutdown didn't properly FIN those tcp connections so had to wait for keepalive?) | 22:09 |
ianw | corvus: probably, sorry that was just a random pull. i am seeing a lot of kazoo.exceptions.NoNodeError just tailing right now | 22:09 |
ianw | if, however, they relate to something that started before the restart, not sure | 22:09 |
corvus | restart was at 19:15 | 22:09 |
fungi | okay, pizza consumed, catching up | 22:10 |
corvus | is nodepool not cleaning up disappearing requests? | 22:10 |
corvus | because that request, 200-0007660291, keeps showing up in nl01's logs | 22:11 |
*** slaweq has joined #openstack-infra | 22:11 | |
corvus | Shrews: ^ fyi | 22:11 |
ianw | ok, might be a red herring http://paste.openstack.org/show/790088/ | 22:11 |
ianw | some nodes have thousands of the same thing | 22:11 |
*** jamesmcarthur has joined #openstack-infra | 22:11 | |
corvus | this does seem like it may be a nodepool bug | 22:12 |
corvus | if the request disappeared, the launcher should clean it up | 22:12 |
Shrews | hrm, that's never been an issue before | 22:12 |
*** stevebaker has quit IRC | 22:13 | |
corvus | Shrews: yeah, it seems weird to me; it's not like we haven't had the scheduler bomb out before | 22:13 |
Shrews | corvus: the rax-ord-main thread is paused, so i don't think it can handle cleanup until it unpauses | 22:14 |
*** slaweq has quit IRC | 22:15 | |
Shrews | they seem to be clearing out? | 22:16 |
Shrews | Node request 200-0007660291 disappeared | 22:16 |
corvus | yeah, that happened 2 hours ago and it's still trying to write to it | 22:16 |
corvus | er 3 hours even | 22:16 |
Shrews | oh yeah | 22:17 |
clarkb | ianw: AJaeger can you check my comments on https://review.opendev.org/#/c/710160/7 I think most can be addressed in a followup but the qa one should probably be fixed early | 22:17 |
Shrews | my buffer got weird | 22:17 |
clarkb | (in which case fixing all of them is probably smart) | 22:17 |
ianw | clarkb: ajaeger did do a follow up -> https://review.opendev.org/#/c/710195/2 | 22:18 |
*** slaweq has joined #openstack-infra | 22:18 | |
corvus | Shrews: oh, it's the storeNode call that's failing | 22:18 |
clarkb | ianw: oh cool | 22:18 |
corvus | Shrews: so this is happening because not only has the node request disappeared, but so has at least one of the underlying nodes we already assigned to it | 22:18 |
clarkb | they'll land together then so less concern about caching bad redirects. | 22:18 |
clarkb | ianw: should I approve the parent change or would you prefer to? | 22:19 |
corvus | Shrews: let's move to #zuul | 22:19 |
ianw | clarkb: i guess qa.openstack.org is already broken anyway | 22:19 |
*** ijw has quit IRC | 22:19 | |
ianw | clarkb: please approve; i've made all the _acme-challenge CNAMEs so it should deploy. i can update dns when that's active | 22:19 |
gmann | ianw: clarkb there is no qa.o.o yet. so leaving that as it is or redirecting to QA wiki is fine. | 22:20 |
clarkb | ianw: done | 22:20 |
ianw | gmann: i just copied whatever was currently being done for this transition :) ajaeger has a change up that modified it @ https://review.opendev.org/#/c/710195/2/playbooks/roles/static/files/50-qa.openstack.org.conf | 22:22 |
ianw | i'm sure he won't mind if you want to edit that to point to, whatever | 22:22 |
*** slaweq has quit IRC | 22:22 | |
clarkb | corvus: ianw: sounds like restarting the launcher is the expected short term fix | 22:25 |
clarkb | should I do that or is someone planning to already? | 22:25 |
gmann | ianw: ok. we thought of building a qa doc site some time back but never got time to do it. we can update it once we build it. | 22:25 |
corvus | clarkb: i'm not, go for it | 22:26 |
clarkb | ok I'll restart all 4 | 22:27 |
fungi | gmann: or once you build a qa doc site you can just use whatever the proper url is for it. would be really nice to be able to retire some of these old vanity domains | 22:27 |
clarkb | nodepool==3.11.1.dev34 # git sha 5d37a0a is what nl01 has been restarted on | 22:28 |
fungi | i tried at one point and then some folks freaked out over a few of them (they used to be hosted on the wiki server, of all places) | 22:28 |
clarkb | I'm going to watch it for a few minutes before doing the other 3 | 22:28 |
fungi | thanks clarkb! | 22:28 |
*** eharney has joined #openstack-infra | 22:29 | |
clarkb | we are at quota in a couple regions so there is some noise from that but not seeing any hard errors yet | 22:30 |
gmann | fungi: yeah, qa.o.o was never the thing. it is fine to just kill it. | 22:32 |
fungi | ianw: ^ | 22:32 |
clarkb | I'm following node 0014870455 which has been building for 3 monites | 22:32 |
*** Goneri has quit IRC | 22:32 | |
clarkb | if that one goes ready/in-use I'll see that as the all clear to restart the other three | 22:32 |
clarkb | and it just went ready | 22:33 |
clarkb | proceeding with nl02-nl04 now | 22:33 |
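The rolling restart itself is simple; a sketch, assuming the init-script service name used on those hosts:

    # Restart each launcher in turn, watching the first before doing the rest
    for host in nl01 nl02 nl03 nl04; do
        ssh "root@$host.openstack.org" 'service nodepool-launcher restart'
    done
    # confirm the running version afterwards, e.g.:
    ssh root@nl01.openstack.org 'pip freeze | grep nodepool'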
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: Fix for clearing assigned nodes that have vanished https://review.opendev.org/710343 | 22:33 |
clarkb | #status log Restarted nodepool launcher on nl01-nl04 to clear out state related to a deleted znode. Launchers now running nodepool==3.11.1.dev34 # git sha 5d37a0a | 22:35 |
openstackstatus | clarkb: finished logging | 22:35 |
fungi | is a monites more like a month or more like a minute? | 22:38 |
fungi | ahh, from context i'm guessing minute | 22:38 |
ianw | fungi: filed 38883 to deal with that later | 22:39 |
ianw | https://storyboard.openstack.org/#!/story/2006598 | 22:39 |
fungi | thanks, i'm a ways down an openstack vmt tunnel at the moment | 22:42 |
clarkb | fungi: minute | 22:42 |
fungi | yeah, i worked it out eventually, sorry for the noise | 22:43 |
ianw | everything that files.openstack.org is serving is now served by static.opendev.org. i'm going to switch the files.openstack.org CNAME to static.opendev.org and complete the transition. the server will still run as files02.openstack.org for now | 22:43 |
ianw | hang on, let me double check the serveralias for files.openstack.org is working on static.opendev.org | 22:44 |
*** ijw has joined #openstack-infra | 22:47 | |
ianw | hrm the static.opendev.org cert doesn't seem to cover files.openstack.org for some reason :/ | 22:47 |
ianw | it picked up the change ... [Wed Feb 26 02:21:15 UTC 2020] Multi domain='DNS:static.opendev.org,DNS:static01.opendev.org,DNS:files.openstack.org,DNS:static.openstack.org | 22:49 |
*** jamesmcarthur has quit IRC | 22:49 | |
fungi | is there a cap on the number of domain names in a le cert? | 22:49 |
ianw | i think it's like 100 | 22:50 |
*** sreejithp has quit IRC | 22:50 | |
clarkb | and they encourage more names per cert (as it reduces their api overhead) | 22:50 |
*** tkajinam has joined #openstack-infra | 22:51 | |
*** tkajinam has quit IRC | 22:51 | |
*** tkajinam has joined #openstack-infra | 22:51 | |
*** ijw has quit IRC | 22:52 | |
*** ijw has joined #openstack-infra | 22:52 | |
*** ahosam has quit IRC | 22:54 | |
mordred | ianw: did the same thing happen as with review-dev - we already had an old one and didn't get a new one? | 22:55 |
ianw | mordred: yeah, i'm starting to think this is a bug in acme.sh and it's manual dns update mode | 22:56 |
mordred | ianw: if a cert is on disk and the only thing that changes is the addition of a host to the altname list - it erroneously does not get a new cert | 22:56 |
mordred | that would be the description of the bug, yeah? | 22:56 |
ianw | i think it runs, gets the TXT records to commit and updates its db/config file like it got the new domains | 22:57 |
ianw | when really the validation step hasn't been done | 22:57 |
mordred | yeah | 22:57 |
*** owalsh has quit IRC | 22:58 | |
ianw | we could write out a .stamp file where we force a renewal when we know an update has happened | 22:58 |
*** mattw4 has quit IRC | 23:02 | |
tristanC | dear openstack-infra folks, would it be possible to have a fedora-31 mirror in afs? if so, what would be the place to propose such an addition? | 23:03 |
mordred | ianw: yay makefile tricks! | 23:04 |
clarkb | tristanC: it is possible, but as I've mentioned to others I think we need to start with f31 images | 23:04 |
clarkb | tristanC: and for that to happen we need to figure out how to manage dnf's new rpm package format (new builders or image element that runs in container fs context or something) ianw likely has better thoughts on that than me though | 23:05 |
*** mattw4 has joined #openstack-infra | 23:06 | |
ianw | i just want to get a nb deployed with containerised tooling; that will "fix" it for now, as the tools will be recent enough | 23:06 |
mordred | ianw: are we blocked on that for anything (other than time)? | 23:07 |
mordred | ianw: and would it be useful for me to help? | 23:07 |
ianw | mordred: just me doing it. i *think* i've sorted out all the issues; that's why i spent a bunch of time getting the nodepool container jobs working | 23:07 |
mordred | ok. cool | 23:08 |
ianw | so in theory, it's just dropping it on a host | 23:08 |
ianw | ... in theory ... :) | 23:08 |
mordred | I'm happy to page stuff in and help out if you need | 23:08 |
mordred | ianw: but good to know we should be fixed again by our new container overlords | 23:09 |
*** slaweq has joined #openstack-infra | 23:11 | |
ianw | https://github.com/acmesh-official/acme.sh/issues/2763 filed a bug on acme.sh | 23:12 |
*** ociuhandu has joined #openstack-infra | 23:12 | |
ianw | i'd prefer not to run a fork | 23:12 |
mordred | I agree | 23:12 |
*** slaweq has quit IRC | 23:16 | |
*** owalsh has joined #openstack-infra | 23:16 | |
clarkb | ianw: looks like you can pass a --force | 23:18 |
ianw | clarkb: yeah, the problem is knowing when to pass it :) | 23:18 |
clarkb | ya | 23:18 |
clarkb | can we infer that from the txt record var state? | 23:18 |
*** ociuhandu_ has joined #openstack-infra | 23:19 | |
clarkb | basically if this cert has a txt record in ansible then we also need to --force it | 23:19 |
ianw | yeah, i think we can do something like that | 23:20 |
ianw | i'm trying to think why i didn't already ... maybe i just missed it | 23:20 |
*** ociuhandu_ has quit IRC | 23:21 | |
clarkb | pretty sure we don't have txt records in steady state when not needing to renew (based on log data) | 23:21 |
clarkb | but I think there is one risk: we could renew every hour and then not be able to renew anymore due to rate limiting | 23:21 |
ianw | when: acme_txt_required | length > 0 | 23:21 |
*** aarents has joined #openstack-infra | 23:21 | |
*** ociuhandu has quit IRC | 23:22 | |
ianw | i think that's what i intended to happen | 23:22 |
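Putting those two ideas together: force a renewal only when this run handed out new TXT challenge records. A sketch; the --renew, -d, --force and manual-DNS-mode flags are real acme.sh flags, but the trigger file and its wiring are assumptions:

    FORCE=""
    # hypothetical marker written whenever new _acme-challenge TXT records were issued
    if [ -s /var/lib/acme/txt-records-pending ]; then
        FORCE="--force"
    fi
    # without --force, acme.sh skips renewal because its config thinks the
    # on-disk cert is current even though the altname list changed; forcing
    # on every run would risk Let's Encrypt rate limits
    acme.sh --renew -d static.opendev.org \
        --yes-I-know-dns-manual-mode-enough-go-ahead-please $FORCE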
*** stevebaker has joined #openstack-infra | 23:24 | |
*** jamesmcarthur has joined #openstack-infra | 23:25 | |
clarkb | ianw: corvus if you have a moment can you check my comment on https://review.opendev.org/#/c/709236/3 specifically the note about simply publishing the stats data to zuul swift logs to start | 23:27 |
clarkb | do you think that is worthy of a new patchset ? | 23:27 |
ianw | clarkb: i don't think you have to. with the periodic job working now, i think we have many options | 23:33 |
ianw | for example, we could avoid installing anything on the static server, and instead copy the latest logs to a nodepool node and run the analysis tool on it there? | 23:33 |
ianw | that way we never run the chance of screwing something up on the static server | 23:34 |
clarkb | ianw: that is an interesting idea. I think in general not moving the data is a good thing but maybe we can do a `ssh static cat *.log | goaccess -` sort of thing | 23:34 |
clarkb | that way the logs never end up on disk until they've been sanitized | 23:34 |
fungi | from a pii safety perspective, the fewer copies of those files reside on additional systems, the better | 23:34 |
*** kozhukalov has quit IRC | 23:35 | |
ianw | yeah, a network stream sounds good. also less overhead, if goaccess pins the cpu for a few minutes * a few sites periodically, that could be annoying | 23:37 |
openstackgerrit | Merged zuul/nodepool master: Fix for clearing assigned nodes that have vanished https://review.opendev.org/710343 | 23:38 |
ianw | although it looks like it's designed to not do that | 23:38 |
clarkb | infra-root ^ I'll try to remember to restart launchers again once puppet updates them with that change | 23:38 |
clarkb | ianw: you can pass it a stdin stream iirc | 23:39 |
clarkb | they just don't document it well /me looks again | 23:39 |
clarkb | ianw: see https://goaccess.io/man#examples | 23:40 |
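A sketch of the streaming approach, so raw logs never land on the analysis node's disk (host and log path are assumptions; COMBINED matches stock Apache access logs, and the trailing "-" tells goaccess to read stdin):

    ssh static01.opendev.org 'cat /var/log/apache2/*access.log' \
        | goaccess --log-format=COMBINED -o report.html -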
ianw | yeah, i was looking at the incremental thing with the db ... in theory you could take the db from the last run from a zuul artifact and feed it into the next run? | 23:42 |
ianw | ... oh, although maybe the db isn't sanitised, so can't really publish it | 23:42 |
clarkb | ianw: ya I haven't checked the db. I think to start we can just not use the db | 23:43 |
ianw | ... we could encrypt it with the system-config key. then decrypt it in the job | 23:43 |
ianw | so many options :) | 23:43 |
clarkb | ianw: we'll get a new 30 day window (or whatever our rotation is) each periodic run | 23:43 |
clarkb | that isn't a regression for the 404 accounting case (we only ever looked at what was present and didn't tally over time beyond that) | 23:43 |
openstackgerrit | Ghanshyam Mann proposed openstack/hacking master: DNM: testing nova fix https://review.opendev.org/710349 | 23:48 |
*** jamesdenton has quit IRC | 23:49 | |
*** jamesdenton has joined #openstack-infra | 23:50 | |
*** dchen has joined #openstack-infra | 23:53 | |
*** dychen has joined #openstack-infra | 23:55 | |
*** dychen has quit IRC | 23:56 | |
*** stevebaker has quit IRC | 23:59 |