opendevreview | Merged openstack/project-config master: Add Ceph Dashboard charm to OpenStack charms https://review.opendev.org/c/openstack/project-config/+/797911 | 00:37 |
---|---|---|
opendevreview | Merged openstack/project-config master: Add members of the neutron drivers team as ops in neutron channel https://review.opendev.org/c/openstack/project-config/+/796521 | 00:43 |
opendevreview | Merged openstack/project-config master: Adding missing zuul for etcd3gw https://review.opendev.org/c/openstack/project-config/+/795824 | 00:43 |
opendevreview | Merged openstack/project-config master: Add project templates to rbd-iscsi-client https://review.opendev.org/c/openstack/project-config/+/798348 | 00:43 |
*** ysandeep|away is now known as ysandeep | 01:19 | |
*** diablo_rojo is now known as Guest857 | 01:24 | |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 01:57 |
*** ysandeep is now known as ysandeep|afk | 02:00 | |
opendevreview | Merged openstack/project-config master: Remove scientific-wg from infra https://review.opendev.org/c/openstack/project-config/+/797747 | 02:11 |
opendevreview | Merged openstack/project-config master: Add noop job to remove content of puppet-openstack-specs repo https://review.opendev.org/c/openstack/project-config/+/798392 | 02:19 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 02:28 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 02:48 |
opendevreview | Merged openstack/project-config master: End project gate for puppet-openstack-specs https://review.opendev.org/c/openstack/project-config/+/798393 | 03:01 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 03:12 |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Add mariadb requirements https://review.opendev.org/c/opendev/lodgeit/+/798411 | 04:27 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 04:30 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 04:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 04:38 |
opendevreview | Ian Wienand proposed openstack/project-config master: Add opendev/lodgeit to openstack https://review.opendev.org/c/openstack/project-config/+/798413 | 04:50 |
*** marios is now known as marios|ruck | 05:12 | |
*** ysandeep|afk is now known as ysandeep | 05:51 | |
opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/798418 | 06:10 |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Allow for overriding title https://review.opendev.org/c/opendev/lodgeit/+/798419 | 06:15 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 06:23 |
*** jpena|off is now known as jpena | 07:00 | |
opendevreview | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/798418 | 07:17 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 07:31 |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Allow for overriding title https://review.opendev.org/c/opendev/lodgeit/+/798419 | 07:36 |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Use bionic for Python 3.6 https://review.opendev.org/c/opendev/lodgeit/+/798428 | 07:40 |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Add py3.8 support https://review.opendev.org/c/opendev/lodgeit/+/773479 | 07:42 |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Add py3.8 support https://review.opendev.org/c/opendev/lodgeit/+/773479 | 07:43 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 08:04 |
opendevreview | Merged opendev/lodgeit master: Use bionic for Python 3.6 https://review.opendev.org/c/opendev/lodgeit/+/798428 | 08:12 |
opendevreview | Merged opendev/lodgeit master: Add py3.8 support https://review.opendev.org/c/opendev/lodgeit/+/773479 | 08:13 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add a meta log upload role with a failover mechanism https://review.opendev.org/c/zuul/zuul-jobs/+/795336 | 08:15 |
*** ykarel is now known as ykarel|lunch | 08:38 | |
opendevreview | Dmitriy Rabotyagov proposed opendev/lodgeit master: Add mariadb requirements https://review.opendev.org/c/opendev/lodgeit/+/798411 | 08:39 |
opendevreview | Rico Lin proposed openstack/project-config master: Retired js-openstack-lib https://review.opendev.org/c/openstack/project-config/+/798529 | 08:40 |
opendevreview | Dmitriy Rabotyagov proposed opendev/lodgeit master: Redesign manage.py to not use deprecated werkzeug.script https://review.opendev.org/c/opendev/lodgeit/+/693378 | 08:40 |
opendevreview | Dmitriy Rabotyagov proposed opendev/lodgeit master: Allow for overriding title https://review.opendev.org/c/opendev/lodgeit/+/798419 | 09:08 |
*** bhagyashris_ is now known as bhagyashris | 09:15 | |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Retry on initial table creation failure https://review.opendev.org/c/opendev/lodgeit/+/798620 | 09:21 |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Allow for overriding title https://review.opendev.org/c/opendev/lodgeit/+/798419 | 09:26 |
opendevreview | Ian Wienand proposed opendev/lodgeit master: Retry on initial table creation failure https://review.opendev.org/c/opendev/lodgeit/+/798620 | 09:26 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 09:39 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add a meta log upload role with a failover mechanism https://review.opendev.org/c/zuul/zuul-jobs/+/795336 | 09:46 |
*** ykarel|lunch is now known as ykarel | 10:32 | |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 10:53 |
dtantsur | Hi folks! I'm pondering adding a few words about release notes after https://docs.opendev.org/opendev/infra-manual/latest/developers.html#starting-a-change. WDYT? | 11:00 |
*** jpena is now known as jpena|lunch | 11:34 | |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 11:53 |
frickler | dtantsur: I think mentioning the option of including both release notes and documentation updates might be helpful, just be aware that not every project might have them, that document is generic for opendev, not specific for openstack | 12:23 |
dtantsur | frickler: yeah, I was thinking about something like "Some projects have a release note associated with each user-visible change. If you see a releasenotes directory in the source tree, you can add a release note with::" | 12:31 |
frickler | dtantsur: sounds good to me, likely worth its own section then, like the one about signed-off? | 12:33 |
dtantsur | yep | 12:33 |
* dtantsur looks for the source code of the guide | 12:34 | |
dtantsur | aha, https://opendev.org/opendev/infra-manual, right? | 12:34 |
*** jpena|lunch is now known as jpena | 12:36 | |
opendevreview | Dmitry Tantsur proposed opendev/infra-manual master: Developers: mention release notes and documentation https://review.opendev.org/c/opendev/infra-manual/+/798655 | 12:46 |
dtantsur | okay, something like this ^^^ | 12:46 |
opendevreview | Dmitry Tantsur proposed ttygroup/gertty master: examples: matching storyboard stories https://review.opendev.org/c/ttygroup/gertty/+/798678 | 13:04 |
fungi | dtantsur: i'm on the fence about whether reno guidance should be in the openstack contributor guide instead, and there's already https://docs.openstack.org/project-team-guide/release-management.html#managing-release-notes | 13:04 |
dtantsur | fungi: "Release Management" is definitely not what newcomers read | 13:05 |
dtantsur | we direct them to https://docs.opendev.org/opendev/infra-manual/latest/developers.html, is that the wrong place? | 13:05 |
fungi | dtantsur: agreed, i feel like that section is misplaced | 13:05 |
*** amoralej|off is now known as amoralej | 13:05 | |
dtantsur | note that not only openstack projects have release notes. zuul seems to as well. | 13:05 |
fungi | dtantsur: i thought we directed openstack contributors to https://docs.openstack.org/contributors/code-and-documentation/index.html | 13:05 |
fungi | it references the opendev infra manual in places to avoid being too redundant | 13:06 |
dtantsur | yeah, it's pretty much just links to the infra guide. "Push your change" is very short :) | 13:06 |
dtantsur | my goal is to make newcomers aware of release notes before I have to leave a -1 or do it for them | 13:07 |
fungi | well, there's a section in the contributor guide on documentation it might fit in | 13:07 |
dtantsur | I'm no longer a newcomer, but I suspect people won't immediately check the documentation section | 13:08 |
dtantsur | just a gut feeling, of course.. but I suspect people don't go very far beyond the first page | 13:08 |
fungi | er, to rephrase, it might fit in the documentation section of the contributor guide, i meant | 13:08 |
dtantsur | you mean https://docs.openstack.org/contributors/code-and-documentation/documentation.html? | 13:08 |
fungi | yeah | 13:09 |
dtantsur | what I like about https://docs.opendev.org/opendev/infra-manual/latest/developers.html is that it's pretty much a one-stop shop for contributing a small change | 13:09 |
fungi | dtantsur: yeah, it's not an easy choice. i just worry that the opendev contributor guide is already too large, and we should try to remove more references to project-specific guidance rather than increasing it | 13:09 |
dtantsur | (I just had to direct a person somewhere, and I've chosen this page) | 13:09 |
dtantsur | then we probably need to create a similar page in the openstack guide | 13:10 |
dtantsur | because if we just link to the infra guide, people will just follow the infra guide | 13:10 |
fungi | the goal of the opendev infrastructure guide is to be about infrastructure and tools opendev operates and supplies, while reno is an openstack tool (albeit one used by a number of non-openstack projects too) | 13:10 |
fungi | i'm not strongly opposed, just not sure quite where to draw the line and strike a balance between these different sorts of guides | 13:11 |
fungi | we've already enlisted the contributor guide maintainers to help forklift more openstackish bits out of the infra manual so we can simplify it | 13:12 |
dtantsur | I see your concern. I'm not sure either. | 13:12 |
mordred | we need a new name for a type of thing here ... | 13:14 |
dtantsur | naming is easy | 13:14 |
* dtantsur ducks | 13:14 | |
mordred | because there is a set of things that openstack uses - but that other things also use, and that collectively form a nice and consistent way of working | 13:15 |
mordred | but that are also not opendev | 13:15 |
mordred | so being able to say "here's how you work on a project that has chosen to use this set of tools" without needing to determine whether you are or are not openstack could be helpful? | 13:16 |
mordred | dtantsur: just don't let ME name the thing | 13:16 |
dtantsur | hehe | 13:16 |
dtantsur | What you suggest is pretty much what I've tried to do within the opendev guide | 13:16 |
dtantsur | "If you see a releasenotes directory, create a release note" | 13:16 |
mordred | pbr, bindep, reno come to mind as parts of the in-tree version of this | 13:16 |
mordred | ++ | 13:16 |
mordred | makes sense | 13:17 |
dtantsur | but fungi is concerned about making this guide too specific | 13:17 |
mordred | yah - it's a fair point | 13:17 |
fungi | there's also the argument that if a project is relying on bindep, or reno, or pbr... they should put relevant information about that in their own project-specific contributor guide. those are things not specific to being hosted in opendev, they're tools any project anywhere could use, and there are plenty more... if you see a yarn.lock file in a project you may need to update that when you're | 13:20 |
fungi | making changes, but i'm not sure we would document that in the infra manual | 13:20 |
dtantsur | maybe in the end we should just polish our (ironic's) contributor guide.. | 13:20 |
fungi | that a project uses bindep, or pbr, or reno is fundamentally no different from the fact that it uses yarn | 13:21 |
mordred | totally - but one could argue that there is a set of tools that, while not strictly endemic to opendev projects, are commonly found here and are not commonly found elsewhere | 13:21 |
mordred | (I agree with the point though - I'm still just having morning tea and chatting) | 13:21 |
dtantsur | tea-and-chatting++ | 13:21 |
mordred | but largely I think I fall more on the "these aren't opendev specific" side - but I do think it would be handy to be able to name a type of project that uses those things - because there is an underlying philosophy to all of those that might be useful also to document somewhere | 13:23 |
mordred | sort of like how gating.dev should document the idea of gating and the background idea | 13:23 |
fungi | with my openstack hat on, i feel like the contributor guide and project CONTRIBUTING.rst files could stand to link to https://docs.openstack.org/project-team-guide/release-management.html#how-to-add-new-release-notes | 13:25 |
fungi | for non-openstack projects who are looking for a more generic url to include in their project-specific contributor guidance, linking to https://docs.openstack.org/reno/latest/user/usage.html would make sense | 13:25 |
mordred | reno, pbr and bindep are all pieces of implementing some philosophical stances about a way to manage a project. "all the info should be in git and should work, things that are in git should not also be in text in files, in-project tooling needs to be able to handle concurrent non-sequenced merges" | 13:26 |
fungi | also human-first, self-documenting data which as a side effect happens to be machine parseable so it can be leveraged by automated systems like project gating ci | 13:27 |
fungi | you're right, we do have a lot of opinions about how software should be developed ;) | 13:28 |
fungi | anyway, back to the original topic, i'm not -1/-2 on 798655, i'm just also not +1/+2 on it without a bit more debate | 13:30 |
mordred | fungi: ++ to human-first, self-documenting ... | 13:32 |
* dtantsur is very sad that `git review -d` requires authentication.. | 13:38 | |
fungi | dtantsur: it could be made not to, but will require using https | 13:39 |
dtantsur | this is fine, I just don't want to enroll my testing VMs into gerrit | 13:39 |
dtantsur | do you know the flag (?) to use https? | 13:40 |
dtantsur | ah, we probably have ssh hardcoded in .gitreview.. | 13:40 |
fungi | dtantsur: when i said it could be made to, i mean i think it would need a patch to not rely on auth while using https, but it's worth a try | 13:41 |
fungi | also ssh won't be hard-coded in .gitreview, it's just the default and there are two or three ways you can override it locally | 13:42 |
dtantsur | ack, thanks | 13:42 |
fungi | i'm checking the docs for specifics now | 13:43 |
fungi | ~/.config/git-review/git-review.conf or /etc/git-review/git-review.conf | 13:43 |
fungi | the manpage says you can set gitreview.scheme to https (default is ssh) | 13:45 |
dtantsur | I see. Yeah, makes sense. | 13:46 |
fungi | anyway, if you try that and it still demands login info, then it may be worth improving git-review to bypass its auth steps for specific operations we know can be performed anonymously | 13:47 |
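
For readers following along, the override fungi describes can be sketched as a single `git config` call. This is a minimal sketch, assuming `gitreview.scheme` is read from Git configuration as the manpage reference above suggests; the helper name `scheme_override_command` is hypothetical, and the allowed values are limited to the two mentioned in the chat.

```python
import shlex


def scheme_override_command(scheme="https", global_scope=True):
    """Build the `git config` command that sets gitreview.scheme.

    Only the two values mentioned in the discussion are accepted here;
    that restriction is an assumption of this sketch.
    """
    if scheme not in ("ssh", "https"):
        raise ValueError(f"unsupported scheme: {scheme}")
    cmd = ["git", "config"]
    if global_scope:
        cmd.append("--global")  # apply to every repository for this user
    cmd += ["gitreview.scheme", scheme]
    return cmd


# Print the command rather than running it, so nothing is changed by accident.
print(shlex.join(scheme_override_command()))
```

Dropping `--global` (by passing `global_scope=False`) would scope the override to one clone, which may suit a throwaway test VM better than a user-wide setting.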
tosky | fungi: if this is all about setting the 'gerrit' remote, couldn't the pull remote be https, while keeping the push one as ssh? | 13:47 |
fungi | tosky: if git-review were patched to do that, yes. specifically this is about fetching review refs, which aren't necessarily going to be replicated to your origin remote though | 13:48 |
fungi | but yeah you could probably have the gerrit remote use https for pull and ssh for push | 13:49 |
fungi | also some gerrits will require authentication even for operations ours allows to be anonymous, so probably git-review will need a switch to be able to set it to always auth so that it can continue to support those sites | 13:50 |
fungi | dtantsur: tosky: it does appear that the fetch_review() function may work anonymously over https, just skimming the source code | 13:54 |
fungi | also run_http_exc() seems to try anonymous first and then fall back to applying credentials if it gets a 401 | 13:56 |
fungi | so, yeah, this looks like it at least wants to work the way we hope | 13:56 |
fungi | so if it turns out not to, that's a bug worth fixing | 13:58 |
fungi | and the rest api urls it assembles don't seem to assume the expressly authenticated endpoint (a/) | 14:02 |
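
The "anonymous first, credentials only after a 401" behaviour fungi reads out of the source can be illustrated in isolation. The sketch below is not git-review's code: `fetch_anonymous_first`, the `transport` callable and the fake server are all hypothetical names chosen for the illustration.

```python
from typing import Callable, Optional, Tuple

# Hypothetical transport: (url, credentials or None) -> (status code, body)
Transport = Callable[[str, Optional[Tuple[str, str]]], Tuple[int, str]]


def fetch_anonymous_first(url: str, transport: Transport,
                          get_credentials: Callable[[], Tuple[str, str]]):
    """Try the request without credentials; retry with them only on 401."""
    status, body = transport(url, None)
    if status == 401:
        # Credentials are looked up lazily, so anonymous-capable servers
        # never trigger a password prompt or credential-store access.
        status, body = transport(url, get_credentials())
    return status, body


def fake_transport(url, credentials):
    """Stand-in server: /open/ is public, everything else needs a login."""
    if "/open/" in url or credentials == ("user", "secret"):
        return 200, "ok"
    return 401, "unauthorized"


asked = []


def fake_credentials():
    asked.append(True)  # record that credentials were actually needed
    return ("user", "secret")


print(fetch_anonymous_first("https://example.invalid/open/x", fake_transport, fake_credentials))
print(fetch_anonymous_first("https://example.invalid/private/x", fake_transport, fake_credentials))
print(len(asked))
```

The lazy credential lookup is the design point: a server that allows anonymous reads never causes the client to ask for a password.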
dtantsur | yeah, but it defaults to ssh nonetheless | 14:04 |
mordred | it might be worth considering making https the default and ssh the fallback (although getting the https auth info the first time is a bit lame) ... the upstream gerrit folks are really pushing using https for everything and most of them recommend against running the ssh server at all. having git-review default to https would be in keeping with upstream direction - although we'd want to think long and hard about how that would or would not | 14:07 |
mordred | impact our community | 14:07 |
fungi | yes, most users of our gerrit have not generated http passwords, and many probably don't have mechanisms set up to securely store them in ways git/git-review can access | 14:23 |
fungi | git-review will use the git credential store if you've configured it, but i'll wager most users' eyes glaze over when you ask if they've set it up | 14:24 |
mordred | yeah | 14:36 |
*** ysandeep is now known as ysandeep|dinner | 14:56 | |
clarkb | fungi: mordred dtantsur the opendev docs should be focused on getting code into the system and interacting with zuul etc. Not be too concerned with "best practices" that are likely to vary between projects | 15:04 |
clarkb | dtantsur: re git review -d gerrit gives you the fetch and checkout command in the web ui if you want an easy copy paste | 15:06 |
clarkb | I use that when not in an authenticated context and use git review when I am | 15:06 |
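
The copy-paste commands clarkb mentions follow Gerrit's change ref layout, `refs/changes/<last two digits of the change number>/<change number>/<patchset>`. A minimal sketch of building them by hand follows; the helper names are hypothetical, and patchset 1 of change 798655 is assumed purely for illustration.

```python
def change_ref(change: int, patchset: int) -> str:
    """Return the Gerrit ref for a given change number and patchset."""
    # The first path component is the change number modulo 100, zero-padded.
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def anonymous_fetch_commands(repo_url: str, change: int, patchset: int):
    """Commands that fetch a change over https and check it out detached."""
    return [
        ["git", "fetch", repo_url, change_ref(change, patchset)],
        ["git", "checkout", "FETCH_HEAD"],
    ]


for cmd in anonymous_fetch_commands(
        "https://review.opendev.org/opendev/infra-manual", 798655, 1):
    print(" ".join(cmd))
```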
dtantsur | clarkb: yep, that's what I've used so far, but I was curious to try something potentially faster | 15:06 |
clarkb | mordred: I would be a strong -2 against making https default | 15:06 |
clarkb | mordred: https credentials are significantly more painful to manage both as a user and an operator of gerrit | 15:07 |
clarkb | we give users the ability to flip that default in the tool config if they choose but as far as a default at top level that is probably not a great choice | 15:07 |
clarkb | it's so bad that gerrit literally has you do the equivalent of curl \| bash to configure it | 15:08 |
clarkb | which is not great | 15:08 |
*** ykarel is now known as ykarel|away | 15:28 | |
mordred | yeah - it's horrible. I'm just saying - no-ssh is a thing upstream keeps pushing | 15:39 |
*** ysandeep|dinner is now known as ysandeep | 15:41 | |
*** ysandeep is now known as ysandeep|away | 15:44 | |
corvus | mordred: google is certainly pushing it, since they can't run it... but i feel like other users are pretty happy with the ssh stuff and aren't in a rush to deprecate it. having said that, non-ssh could be advantageous for some openstack contributor companies that have issues with ssh | 15:49 |
*** amoralej is now known as amoralej|off | 15:57 | |
*** jpena is now known as jpena|off | 15:59 | |
clarkb | infra-root https://etherpad.opendev.org/p/fM064G15GPb-wwJx5mLw how does that look for reaching out to corvus' EMS contact? | 16:00 |
clarkb | I need to catch up on some morning things now, but can send that out this morning if it looks good to y'all | 16:03 |
clarkb | corvus: ^ in particular may want to check it | 16:03 |
*** dviroel is now known as dviroel|lunch | 16:03 | |
*** diablo_rojo__ is now known as diablo_rojo | 16:04 | |
corvus | clarkb: lgtm -- mordred anything else you can think of? | 16:05 |
fungi | clarkb: i think it's good | 16:12 |
fungi | thanks! | 16:12 |
corvus | clarkb: did we talk about mjolnir? | 16:14 |
*** marios|ruck is now known as marios|out | 16:17 | |
fungi | thor's hammer? | 16:18 |
fungi | that's a heavy subject | 16:18 |
clarkb | corvus: it was briefly mentioned as the moderation tool? | 16:19 |
fungi | aha, so the reference is to a large banhammer | 16:20 |
corvus | clarkb: yeah, we may want one of the slightly higher tier plans so that it can run server-side and federate with other systems. that's probably more important for projects > zuul; for zuul only i don't think it's a big deal. | 16:29 |
corvus | clarkb: just wanted to make sure you had background on that; i couldn't remember if we discussed it | 16:30 |
corvus | though looking at the pricing chart, i'm not sure about what tier would be required. anyway -- something to keep in mind. | 16:34 |
corvus | (nickel and silver both say "shared moderation bot (soon)" so i dunno) | 16:35 |
clarkb | corvus: ya I figured we could be a little vague and let them fill us in | 16:43 |
clarkb | re "federate with other systems" is that not in the lower tiers either? | 16:43 |
clarkb | I'll go ahead and reach out to EMS now and see what they say | 16:45 |
clarkb | I expect they will be more than happy to give us details | 16:45 |
corvus | clarkb: federating chat is universal -- mjolnir is a federated moderation system | 16:47 |
clarkb | got it | 16:47 |
corvus | if you run it server side, then you can subscribe to ban lists from other communities | 16:47 |
corvus | so it's like what we wanted to do with limnoria, but across a much larger area | 16:48 |
*** dviroel|lunch is now known as dviroel | 17:08 | |
clarkb | EMS contact has responded. They are in the UK and would like to schedule a call sometime this week to talk. We'll sort that out tomorrow since their day has already largely ended | 17:17 |
fungi | sounds great | 17:31 |
melwitt | clarkb, fungi: thank you for the reviews on update_blueprint! here's the matching system-config change if we're ready to re-enable it https://review.opendev.org/c/opendev/system-config/+/795914 | 18:03 |
mordred | clarkb: sorry for the delay - the etherpad looks good - which isn't useful now :) | 19:07 |
ianw | https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html#option_mysqldump_network-timeout : Enable large tables to be dumped by setting --max-allowed-packet to its maximum value and network read and write timeouts to a large value. This option is enabled by default. To disable it, use --skip-network-timeout. | 19:41 |
fungi | ahh, still i wonder why it's just affecting the nearer backup target | 19:42 |
ianw | hrm, we run a db container there right? | 19:44 |
clarkb | ianw: correct the db is on localhost in a container | 19:44 |
ianw | this might be something similar to low timeout values i saw yesterday when bringing up paste in ansible | 19:44 |
ianw | (which i wanted to do for precisely the reason that it seemed like an easy way to get some mariadb container production experience) | 19:45 |
ianw | "By default, the server closes the connection after eight hours if nothing has happened. You can change the time limit by setting the wait_timeout variable when you start mysqld. See Section 5.1.8, “Server System Variables”. " | 19:45 |
ianw | i think the containers might have that much lower, like 5 minutes | 19:46 |
ianw | i think there's some query we can run to check | 19:46 |
clarkb | aha | 19:46 |
opendevreview | Merged opendev/system-config master: Re-enable update_blueprint for patchset-created https://review.opendev.org/c/opendev/system-config/+/795914 | 19:54 |
ianw | does https://104.130.239.208/show/807029/ load for anyone else? | 19:57 |
clarkb | it doesn't seem to be returning for me | 19:58 |
fungi | my browser doesn't like the cert, but then hangs loading once i accept it | 19:58 |
clarkb | yup same | 19:58 |
clarkb | the root doesn't load either | 19:59 |
fungi | there is also a longstanding issue i've attempted multiple times to address through timeout changes in the trove config, looking into the timeout/retry settings sqla implements for its sockets by default, et cetera | 19:59 |
fungi | it's supposed to, when it receives a disconnect error, immediately send a sql ping to force reconnection, but it acts like that's not working | 20:00 |
fungi | possibly something about the way the connection is dropping makes it not detect it's dead, i'm not sure | 20:00 |
ianw | strace: Process 23714 attached | 20:01 |
ianw | recvfrom(6, ^Cstrace: Process 23714 detached | 20:01 |
ianw | uwsgi 23714 nobody 6u IPv4 342454 0t0 TCP localhost:59382->localhost:mysql (ESTABLISHED) | 20:01 |
ianw | it looks to me that it's stuck on the mysql socket | 20:01 |
ianw | 2021-06-29 17:37:24 3 [Warning] Aborted connection 3 to db: 'lodgeit' user: 'lodgeit' host: '127.0.0.1' (Got timeout reading communication packets) | 20:02 |
ianw | this maps with your ping theory | 20:02 |
fungi | the production one does similarly, after some time sitting unused the next request takes forever and never completes, but then some minutes later everything is reestablished | 20:02 |
ianw | https://docs.sqlalchemy.org/en/14/core/pooling.html#disconnect-handling-pessimistic | 20:02 |
fungi | it's probably less apparent in production because the server is hit more continuously | 20:03 |
ianw | note that we do not set pool_pre_ping in the code | 20:03 |
ianw | i was looking at that yesterday | 20:03 |
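
The pessimistic disconnect handling in the linked SQLAlchemy page is enabled there by passing `pool_pre_ping=True` to `create_engine()`. To show the idea without depending on SQLAlchemy or a MySQL server, here is a standard-library sketch that uses `sqlite3` as a stand-in; the `PrePingConnection` class is hypothetical and is not how lodgeit or SQLAlchemy implement it.

```python
import sqlite3


class PrePingConnection:
    """Hand out a connection only after a cheap liveness check."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument factory for new connections
        self._conn = connect()
        self.reconnects = 0

    def get(self):
        try:
            # The "ping": a trivial statement that fails on a dead connection.
            self._conn.execute("SELECT 1")
        except sqlite3.Error:
            # Replace the dead connection before the caller ever sees it.
            self._conn = self._connect()
            self.reconnects += 1
        return self._conn


holder = PrePingConnection(lambda: sqlite3.connect(":memory:"))
holder.get().close()  # simulate the server dropping an idle connection
row = holder.get().execute("SELECT 41 + 1").fetchone()
print(row, holder.reconnects)
```

The cost is one extra round trip per checkout, in exchange for never handing the application a connection that the server has already closed. Whether that would cure the hang described above is untested here, since that hang looks like a socket that never reports an error at all.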
ianw | | wait_timeout | 28800 | | 20:07 |
ianw | that's on gitea mariadb container | 20:07 |
ianw | so that looks like our 8 hours | 20:07 |
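
As a quick check of the arithmetic: the value is in seconds, and 28800 / 3600 = 8. The chat does not say which query produced the row; `SHOW VARIABLES LIKE 'wait_timeout'` is one statement that prints this format. The parsing helper below is a hypothetical illustration.

```python
def parse_variable_row(row: str):
    """Split a '| name | value |' row from the mysql client into its parts."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return cells[0], int(cells[1])


name, seconds = parse_variable_row("| wait_timeout | 28800 |")
hours = seconds / 3600  # the variable is expressed in seconds
print(name, hours)
```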
fungi | so if the server doesn't get hit for ~8 hours, we should see the issue recur on the subsequent requests | 20:08 |
ianw | sorry, i'm switching back and forth, that's the timeout on the gitea mariadb container | 20:09 |
ianw | i.e. i can't see that particular timeout being the reason the backup connection gets dropped | 20:10 |
ianw | You can obtain more information about lost connections by starting mysqld with the log_error_verbosity system variable set to 3. This logs some of the disconnection messages in the hostname.err file. See Section 5.4.2, “The Error Log”. | 20:11 |
ianw | might be an option | 20:11 |
ianw | the container has been up for 4 days, so it's not like it got restarted on us | 20:12 |
ianw | Jun 29 05:54:54 gitea01 docker-mariadb[704]: 2021-06-29 5:54:54 128099 [Warning] Aborted connection 128099 to db: 'gitea' user: 'root' host: 'localhost' (Got an error writing communication packets) | 20:16 |
ianw | net_read_timeout and net_write_timeout seem like settings to fiddle | 20:21 |
ianw | infra-root: with https://review.opendev.org/c/openstack/project-config/+/798413/1/zuul/main.yaml perhaps we should just move lodgeit to openstack (and remove from opendev)? | 20:22 |
ianw | i should be able to run a vexxhost backup manually, and see if it times out. if so, i can set the net timeouts and try again | 20:24 |
clarkb | that seems like a reasonable next step for tracking the backup problem down | 20:28 |
ianw | right now, gitea01 can't talk ipv6 to backup02 | 20:30 |
ianw | unless i'm nuts | 20:30 |
ianw | ssh backup02.ca-ymq-1.vexxhost.opendev.org from gitea hangs, "ssh -4" tries to connect | 20:31 |
ianw | however, ping works | 20:31 |
ianw | so somewhere between ping and actual packets something seems up | 20:31 |
fungi | pmtud blackhole maybe? | 20:32 |
clarkb | fungi: https://review.opendev.org/c/opendev/infra-manual/+/797531 lgtm fwiw | 20:34 |
ianw | note we can connect to rax | 20:39 |
ianw | Connecting to backup01.ord.rax.opendev.org [2001:4801:7825:103:be76:4eff:fe10:1b1] port 22 | 20:39 |
clarkb | wouldn't surprise me if that is the problem | 20:39 |
ianw | and i am connected to it via ipv6, so it can talk to me | 20:40 |
ianw | does seem to suggest a vexxhost region<->region ipv6 issue | 20:40 |
opendevreview | Ian Wienand proposed openstack/project-config master: Move opendev/lodgeit to openstack https://review.opendev.org/c/openstack/project-config/+/798413 | 20:58 |
ianw | note mnaser is aware of the ipv6 issue (#vexxhost). i guess we could add "-4" to the borg backup runs | 21:06 |
ianw | actually, if we wanted to do that, we would set AddressFamily in the ssh config for the backup host | 21:09 |
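
A per-host `AddressFamily inet` entry would have the same effect as `ssh -4` for that one destination. The sketch below only renders such a stanza as text; the helper name is hypothetical, and where the stanza would live on the gitea hosts is not stated in the chat.

```python
def ssh_config_stanza(host: str, address_family: str = "inet") -> str:
    """Render an ssh_config block pinning the address family for one host."""
    # ssh_config accepts "any" (the default), "inet" (IPv4) or "inet6" (IPv6).
    if address_family not in ("any", "inet", "inet6"):
        raise ValueError(f"invalid AddressFamily: {address_family}")
    return f"Host {host}\n    AddressFamily {address_family}\n"


print(ssh_config_stanza("backup02.ca-ymq-1.vexxhost.opendev.org"), end="")
```

Scoping the setting to a single `Host` pattern keeps IPv6 in use for every other destination, so the workaround can be removed cleanly once the inter-region issue is fixed.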
fungi | ianw: oh, looking back at the situation, we saw this long ago in rackspace, where then-newer openssh switched to setting a different qos and thus ipv6 dscp marking, which their routers decided to just drop | 21:12 |
fungi | it happens immediately after authentication completes, so looks an awful lot like mtu problems but isn't | 21:13 |
fungi | no idea if it could be happening in vexxhost, or whether that matches the observed issue | 21:13 |
ianw | hrm, in this case, with ssh -v it appears to get nothing back | 21:13 |
fungi | can you telnet to 22 and get the protocol banner from the sshd? | 21:14 |
ianw | nope | 21:14 |
ianw | just *poof* disappears like magic :) | 21:15 |
fungi | okay, then it's not dscp marking | 21:15 |
fungi | any other open tcp listeners you could test to on the server? | 21:15 |
fungi | this is all within the same region too, right? or is it between regions? | 21:15 |
ianw | telnet to google 80 works with ipv6 | 21:16 |
fungi | just wondering if the server itself had any other ports open so we could rule out it being specific to 22/tcp | 21:16 |
ianw | no, although i guess we could open one | 21:17 |
fungi | maybe if mnaser ends up needing more data | 21:17 |
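
The telnet check above (connect to 22/tcp and look for the sshd protocol banner) can be repeated per address family with a few lines of Python. This is a rough sketch with a hypothetical helper name; it reports only "banner or nothing" and does not distinguish a timeout from a refused connection.

```python
import socket


def probe_banner(host, port, family=socket.AF_UNSPEC, timeout=5.0):
    """Return the first line a TCP service sends, or None if unreachable."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return None  # no address of the requested family
    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                data = sock.recv(256)
                return data.split(b"\n", 1)[0].decode("ascii", "replace").strip()
        except OSError:
            continue  # timeout, refused, unreachable: try the next address
    return None


# Example comparison (not executed here, since it needs network access):
#   probe_banner("backup02.ca-ymq-1.vexxhost.opendev.org", 22, socket.AF_INET6)
#   probe_banner("backup02.ca-ymq-1.vexxhost.opendev.org", 22, socket.AF_INET)
```

A banner over `AF_INET` with `None` over `AF_INET6` would match what ianw saw: ICMP gets through while TCP over IPv6 does not.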
ianw | gitea are all in sjc right? | 21:17 |
fungi | i thought they were in ca-ymq-1 as well, but maybe not? will double check | 21:18 |
clarkb | gitea are all in sjc1 | 21:18 |
fungi | ahh, no you're right, sjc1 | 21:18 |
ianw | right, so yeah inter-region | 21:19 |
fungi | so this is ipv6 connectivity between ca-ymq-1 and sjc1 | 21:19 |
fungi | which could mean it's a more general problem, triangulation or the like | 21:19 |
fungi | or just some routers with a too-large mask, too-short prefix | 21:20 |
opendevreview | Merged openstack/project-config master: Move opendev/lodgeit to openstack https://review.opendev.org/c/openstack/project-config/+/798413 | 21:20 |
ianw | but ping does get through | 21:20 |
fungi | mmm, yeah | 21:21 |
*** dviroel is now known as dviroel|out | 21:22 | |
fungi | ianw: similarly backup02.ca-ymq-1.vexxhost.opendev.org gets "No route to host" trying to reach 22/tcp on 2604:e100:3:0:f816:3eff:fe16:274 (gitea01) | 21:23 |
fungi | but can ping it | 21:23 |
fungi | interestingly it *can't* ping 2604:e100:3:0:f816:3eff:feef:a51d (gitea08) | 21:25 |
mnaser | that seems strange | 21:25 |
fungi | From 2604:e100:1:0:ce2d:e0ff:fe0f:74af icmp_seq=1 Destination unreachable: Address unreachable | 21:25 |
mnaser | that you can ping but not ssh | 21:25 |
mnaser | whats ip -6 route looking like | 21:25 |
fungi | mnaser: i wonder if the ping and tcp traffic are getting their flows balanced to different routers and one has a stray route on it or something | 21:26 |
mnaser | fungi: that is a pretty reasonable assumption | 21:26 |
fungi | mnaser: http://paste.openstack.org/show/i7ployjupdi8M7ZbRBIH/ | 21:27 |
fungi | so one of the two nexthops for the backup server in ca-ymq-1 is telling it that gitea08's ipv6 address in sjc1 is unreachable | 21:29 |
fungi | maybe that one thinks the address should be local | 21:30 |
fungi | to the site | 21:30 |
fungi | mnaser: also, if it helps, we first noticed ipv6 connection failures between those regions early on june 12, according to the notifications we're generating | 21:34 |
fungi | so whatever happened seems to have been before ~06:00 utc on 2021-06-12 | 21:35 |
fungi | how long before is hard to say | 21:35 |
clarkb | diablo_rojo_phone: the ptgbot change lgtm other than the thing that ianw points out. | 22:00 |
clarkb | ianw: re https://review.opendev.org/c/opendev/zone-opendev.org/+/798242/1/zones/opendev.org/zone.db I think what we ended up doing was setting SSHFP records for the specific hostname to be port 22 with the idea we could set SSHFP records for the CNAME to the port 29418 host keys. I see that you want to have zuul connect to a specific backend to enable which is why your change is | 22:03 |
clarkb | necessary (since zuul will talk to port 29418 not 22 and sshfp is for port 22) | 22:04 |
clarkb | ianw: the other thing that comes to mind doing that is zuul will talk https and ssh to the name in its configuration. Do you know if we'll handle the https side properly? It appears that my browser is fine with it and we don't redirect so I expect it will just work | 22:04 |
clarkb | we also already set the canonical hostname in the zuul config to opendev.org and not review*.opendev.org which means we'll be fine on the job side | 22:05 |
ianw | i don't see why https://review01.opendev.org won't work for zuul (and 02 in time). cert should cover it | 22:06 |
clarkb | yup cert should cover it and without a redirect we won't accidentally redirect to the wrong server when we are transitioning | 22:07 |
clarkb | just talking through it to ensure we aren't missing anything there | 22:07 |
ianw | on the sshfp; we added A/AAAA records for review.opendev.org with the SSHFP of the gerrit service | 22:07 |
ianw | but it all gets quite confusing when we want to reference servers directly in transition periods | 22:08 |
clarkb | ianw: yup review.opendev.org got records for gerrit and review01 got records for openssh-server | 22:08 |
clarkb | and ya if you want to point zuul at review01 then review02 those sshfp records can become problematic | 22:08 |
ianw | honestly, given the marginal benefit they provide, and the chance of confusion, i think we just remove them for review | 22:09 |
ianw | as you say, we more or less expect https and ssh services at the same place, so something like adding "ssh.review01.opendev.org" doesn't even really clarify things | 22:09 |
ianw | (that could have sshfp records for the host, not gerrit, in theory) | 22:10 |
clarkb | ianw: your plan looks good. I did leave some notes though. In particular I think we may not need to do the reindex if we stop both gerrits then copy git repos that should mean the existing indexes are also valid and can simply be copied | 22:16 |
clarkb | that said maybe those indexes are large enough it's faster to rebuild them rather than copy them. | 22:16 |
ianw | yeah i feel like generating the indexes is a good way to validate things before starting | 22:17 |
ianw | on the editing of the zuul config, i think we just need it to listen enough to get the message to merge its own config change | 22:18 |
clarkb | ianw: yup and since the scheduler handles all interaction with gerrit your plan should be fine. I didn't have that luxury with the gearman stuff | 22:19 |
ianw | i don't think the executors or anything else will care. | 22:19 |
clarkb | for some things they definitely do, but I think you are right that they don't in this case | 22:19 |
ianw | i think if we have the "connect to review01" bit sorted out prior, then we can be pretty confident in switching | 22:23 |
ianw | i think this might be the first time we've moved gerrit with the CD approach we have now | 22:24 |
clarkb | yup | 22:24 |
clarkb | to both things (connect to 01 first and be confident and first time with more CD approach) | 22:24 |
ianw | that was what got me a bit stuck when i was like "ok, merge the dns fix", oh, but hang on, zuul is going to merge and apply that, and zuul has to get the message and report back to the right place ... :) | 22:26 |
clarkb | yup I had similar struggles when replacing the zuul scheduler | 22:26 |
clarkb | in that case we actually updated DNS before we took things down iirc | 22:27 |
clarkb | and then they were just wrong for a bit :) | 22:27 |
clarkb | I think your plan should work well in this instance though | 22:27 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 23:33 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] paste service https://review.opendev.org/c/opendev/system-config/+/798400 | 23:46 |
ianw | hrm, any thoughts on how i can get "lodgeit-build-opendev-image" to run in ^ ? | 23:59 |
clarkb | ianw: you need the buildset registry in place | 23:59 |
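
The "can ping but not ssh" checks from the 21:15 to 21:29 discussion can be approximated with a small probe that attempts a TCP connect and reports how it failed. This is only a rough sketch, not what was actually run in the channel: `probe_tcp` and its return labels are hypothetical names chosen for illustration. The one firm mapping is that the "No route to host" message quoted above is the standard text for `EHOSTUNREACH`.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt one TCP connect and classify the result (hypothetical helper).

    Returns one of: "open", "refused", "unreachable", "timeout",
    "resolve-failed", "error".
    """
    try:
        # Works for hostnames and for literal IPv4/IPv6 addresses.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "resolve-failed"

    # Only the first resolved address is tried, to keep the sketch short.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except socket.timeout:
            # No answer at all: packets silently dropped somewhere.
            return "timeout"
        except OSError as exc:
            if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
                # "No route to host" / "Network is unreachable":
                # a router (or the local stack) reported the destination
                # as unreachable.
                return "unreachable"
            if exc.errno == errno.ECONNREFUSED:
                # The host answered, but nothing listens on that port.
                return "refused"
            return "error"
    return "open"
```

Run from the backup server against port 22 on a gitea address, a result of "unreachable" would match the symptom reported above, while "refused" or "open" on some other port would help rule out the problem being specific to 22/tcp.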