Tuesday, 2021-06-29

opendevreviewMerged openstack/project-config master: Add Ceph Dashboard charm to OpenStack charms  https://review.opendev.org/c/openstack/project-config/+/79791100:37
opendevreviewMerged openstack/project-config master: Add members of the neutron drivers team as ops in neutron channel  https://review.opendev.org/c/openstack/project-config/+/79652100:43
opendevreviewMerged openstack/project-config master: Adding missing zuul for etcd3gw  https://review.opendev.org/c/openstack/project-config/+/79582400:43
opendevreviewMerged openstack/project-config master: Add project templates to rbd-iscsi-client  https://review.opendev.org/c/openstack/project-config/+/79834800:43
*** ysandeep|away is now known as ysandeep01:19
*** diablo_rojo is now known as Guest85701:24
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840001:57
*** ysandeep is now known as ysandeep|afk02:00
opendevreviewMerged openstack/project-config master: Remove scientific-wg from infra  https://review.opendev.org/c/openstack/project-config/+/79774702:11
opendevreviewMerged openstack/project-config master: Add noop job to remove content of puppet-openstack-specs repo  https://review.opendev.org/c/openstack/project-config/+/79839202:19
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840002:28
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840002:48
opendevreviewMerged openstack/project-config master: End project gate for puppet-openstack-specs  https://review.opendev.org/c/openstack/project-config/+/79839303:01
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840003:12
opendevreviewIan Wienand proposed opendev/lodgeit master: Add mariadb requirements  https://review.opendev.org/c/opendev/lodgeit/+/79841104:27
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840004:30
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840004:36
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840004:38
opendevreviewIan Wienand proposed openstack/project-config master: Add opendev/lodgeit to openstack  https://review.opendev.org/c/openstack/project-config/+/79841304:50
*** marios is now known as marios|ruck05:12
*** ysandeep|afk is now known as ysandeep05:51
opendevreviewOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/79841806:10
opendevreviewIan Wienand proposed opendev/lodgeit master: Allow for overriding title  https://review.opendev.org/c/opendev/lodgeit/+/79841906:15
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840006:23
*** jpena|off is now known as jpena07:00
opendevreviewMerged openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/79841807:17
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840007:31
opendevreviewIan Wienand proposed opendev/lodgeit master: Allow for overriding title  https://review.opendev.org/c/opendev/lodgeit/+/79841907:36
opendevreviewIan Wienand proposed opendev/lodgeit master: Use bionic for Python 3.6  https://review.opendev.org/c/opendev/lodgeit/+/79842807:40
opendevreviewIan Wienand proposed opendev/lodgeit master: Add py3.8 support  https://review.opendev.org/c/opendev/lodgeit/+/77347907:42
opendevreviewIan Wienand proposed opendev/lodgeit master: Add py3.8 support  https://review.opendev.org/c/opendev/lodgeit/+/77347907:43
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840008:04
opendevreviewMerged opendev/lodgeit master: Use bionic for Python 3.6  https://review.opendev.org/c/opendev/lodgeit/+/79842808:12
opendevreviewMerged opendev/lodgeit master: Add py3.8 support  https://review.opendev.org/c/opendev/lodgeit/+/77347908:13
opendevreviewBenjamin Schanzel proposed zuul/zuul-jobs master: Add a meta log upload role with a failover mechanism  https://review.opendev.org/c/zuul/zuul-jobs/+/79533608:15
*** ykarel is now known as ykarel|lunch08:38
opendevreviewDmitriy Rabotyagov proposed opendev/lodgeit master: Add mariadb requirements  https://review.opendev.org/c/opendev/lodgeit/+/79841108:39
opendevreviewRico Lin proposed openstack/project-config master: Retired js-openstack-lib  https://review.opendev.org/c/openstack/project-config/+/79852908:40
opendevreviewDmitriy Rabotyagov proposed opendev/lodgeit master: Redesign manage.py to not use deprecated werkzeug.script  https://review.opendev.org/c/opendev/lodgeit/+/69337808:40
opendevreviewDmitriy Rabotyagov proposed opendev/lodgeit master: Allow for overriding title  https://review.opendev.org/c/opendev/lodgeit/+/79841909:08
*** bhagyashris_ is now known as bhagyashris09:15
opendevreviewIan Wienand proposed opendev/lodgeit master: Retry on initial table creation failure  https://review.opendev.org/c/opendev/lodgeit/+/79862009:21
opendevreviewIan Wienand proposed opendev/lodgeit master: Allow for overriding title  https://review.opendev.org/c/opendev/lodgeit/+/79841909:26
opendevreviewIan Wienand proposed opendev/lodgeit master: Retry on initial table creation failure  https://review.opendev.org/c/opendev/lodgeit/+/79862009:26
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840009:39
opendevreviewBenjamin Schanzel proposed zuul/zuul-jobs master: Add a meta log upload role with a failover mechanism  https://review.opendev.org/c/zuul/zuul-jobs/+/79533609:46
*** ykarel|lunch is now known as ykarel10:32
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840010:53
dtantsurHi folks! I'm pondering adding a few words about release notes after https://docs.opendev.org/opendev/infra-manual/latest/developers.html#starting-a-change. WDYT?11:00
*** jpena is now known as jpena|lunch11:34
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840011:53
fricklerdtantsur: I think mentioning the option of including both release notes and documentation updates might be helpful, just be aware that not every project might have them, that document is generic for opendev, not specific for openstack12:23
dtantsurfrickler: yeah, I was thinking about something like "Some projects have a release note associated with each user-visible change. If you see a releasenotes directory in the source tree, you can add a release note with::"12:31
fricklerdtantsur: sounds good to me, likely worth its own section then, like the one about signed-off?12:33
* dtantsur looks for the source code of the guide12:34
dtantsuraha, https://opendev.org/opendev/infra-manual, right?12:34
*** jpena|lunch is now known as jpena12:36
opendevreviewDmitry Tantsur proposed opendev/infra-manual master: Developers: mention release notes and documentation  https://review.opendev.org/c/opendev/infra-manual/+/79865512:46
dtantsurokay, something like this ^^^12:46
opendevreviewDmitry Tantsur proposed ttygroup/gertty master: examples: matching storyboard stories  https://review.opendev.org/c/ttygroup/gertty/+/79867813:04
fungidtantsur: i'm on the fence about whether reno guidance should be in the openstack contributor guide instead, and there's already https://docs.openstack.org/project-team-guide/release-management.html#managing-release-notes13:04
dtantsurfungi: "Release Management" is definitely not what newcomers read13:05
dtantsurwe direct them to https://docs.opendev.org/opendev/infra-manual/latest/developers.html, is that the wrong place?13:05
fungidtantsur: agreed, i feel like that section is misplaced13:05
*** amoralej|off is now known as amoralej13:05
dtantsurnote that not only openstack projects have release notes. zuul seems to as well.13:05
fungidtantsur: i thought we directed openstack contributors to https://docs.openstack.org/contributors/code-and-documentation/index.html13:05
fungiit references the opendev infra manual in places to avoid being too redundant13:06
dtantsuryeah, it's pretty much just links to the infra guide. "Push your change" is very short :)13:06
dtantsurmy goal is to make newcomers aware of release notes before I have to leave a -1 or do it for them13:07
fungiwell, there's a section in the contributor guide on documentation it might fit in13:07
dtantsurI'm no longer a newcomer, but I suspect people won't immediately check the documentation section13:08
dtantsurjust a gut feeling, of course.. but I suspect people don't go very far beyond the first page13:08
fungier, to rephrase, it might fit in the documentation section of the contributor guide, i meant13:08
dtantsuryou mean https://docs.openstack.org/contributors/code-and-documentation/documentation.html?13:08
dtantsurwhat I like about https://docs.opendev.org/opendev/infra-manual/latest/developers.html is that it's pretty much a one-stop shop for contributing a small change13:09
fungidtantsur: yeah, it's not an easy choice. i just worry that the opendev contributor guide is already too large, and we should try to remove more references to project-specific guidance rather than increasing it13:09
dtantsur(I just had to direct a person somewhere, and I've chosen this page)13:09
dtantsurthen we probably need to create a similar page in the openstack guide13:10
dtantsurbecause if we just link to the infra guide, people will just follow the infra guide13:10
fungithe goal of the opendev infrastructure guide is to be about infrastructure and tools opendev operates and supplies, while reno is an openstack tool (albeit one used by a number of non-openstack projects too)13:10
fungii'm not strongly opposed, just not sure quite where to draw the line and strike a balance between these different sorts of guides13:11
fungiwe've already enlisted the contributor guide maintainers to help forklift more openstackish bits out of the infra manual so we can simplify it13:12
dtantsurI see your concern. I'm not sure either.13:12
mordredwe need a new name for a type of thing here ...13:14
dtantsurnaming is easy13:14
* dtantsur ducks13:14
mordredbecause there is a set of things that openstack uses - but that other things also use, and that collectively form a nice and consistent way of working13:15
mordredbut that are also not opendev13:15
mordredso being able to say "here's how you work on a project that has chosen to use this set of tools" without needing to determine whether you are or are not openstack could be helpful?13:16
mordreddtantsur: just don't let ME name the thing13:16
dtantsurWhat you suggest is pretty much what I've tried to do within the opendev guide13:16
dtantsur"If you see a releasenotes directory, create a release note"13:16
mordredpbr, bindep, reno come to mind as parts of the in-tree version of this13:16
mordredmakes sense13:17
dtantsurbut fungi is concerned about making this guide too specific13:17
mordredyah - it's a fair point13:17
fungithere's also the argument that if a project is relying on bindep, or reno, or pbr... they should put relevant information about that in their own project-specific contributor guide. those are things not specific to being hosted in opendev, they're tools any project anywhere could use, and there are plenty more... if you see a yarn.lock file in a project you may need to update that when you're13:20
fungimaking changes, but i'm not sure we would document that in the infra manual13:20
dtantsurmaybe in the end we should just polish our (ironic's) contributor guide..13:20
fungithat a project uses bindep, or pbr, or reno is fundamentally no different from the fact that it uses yarn13:21
mordredtotally - but one could argue that there is a set of tools that while not strictly endemic to opendev projects, are commonly found here and are not commonly found elsewhere13:21
mordred(I agree with the point though - I'm still just having morning tea and chatting)13:21
mordredbut largely I think I fall more on the "these aren't opendev specific" - but I do think it would be handy to be able to name a type of project that uses those things - because there is an underlying philosophy to all of those that might be useful also to document somewhere13:23
mordredsort of like how gating.dev should document the idea of gating and the background idea13:23
fungiwith my openstack hat on, i feel like the contributor guide and project CONTRIBUTING.rst files could stand to link to https://docs.openstack.org/project-team-guide/release-management.html#how-to-add-new-release-notes13:25
fungifor non-openstack projects who are looking for a more generic url to include in their project-specific contributor guidance, linking to https://docs.openstack.org/reno/latest/user/usage.html would make sense13:25
mordredreno, pbr and bindep are all pieces of implementing some philosophical stances about a way to manage a project. "all the info should be in git and should work, things that are in git should not also be in text in files, in-project tooling needs to be able to handle concurrent non-sequenced merges"13:26
fungialso human-first, self-documenting data which as a side effect happens to be machine parseable so it can be leveraged by automated systems like project gating ci13:27
fungiyou're right, we do have a lot of opinions about how software should be developed ;)13:28
fungianyway, back to the original topic, i'm not -1/-2 on 798655, i'm just also not +1/+2 on it without a bit more debate13:30
mordredfungi: ++ to human-first, self-documenting ...13:32
* dtantsur is very sad that `git review -d` requires authentication..13:38
fungidtantsur: it could be made not to, but will require using https13:39
dtantsurthis is fine, I just don't want to enroll my testing VMs into gerrit13:39
dtantsurdo you know the flag (?) to use https?13:40
dtantsurah, we probably have ssh hardcoded in .gitreview..13:40
fungidtantsur: when i said it could be made not to, i mean i think it would need a patch to not rely on auth while using https, but it's worth a try13:41
fungialso ssh won't be hard-coded in .gitreview, it's just the default and there are two or three ways you can override it locally13:42
dtantsurack, thanks13:42
fungii'm checking the docs for specifics now13:43
fungi~/.config/git-review/git-review.conf or /etc/git-review/git-review.conf13:43
fungithe manpage says you can set gitreview.scheme to https (default is ssh)13:45
dtantsurI see. Yeah, makes sense.13:46
fungianyway, if you try that and it still demands login info, then it may be worth improving git-review to bypass its auth steps for specific operations we know can be performed anonymously13:47
toskyfungi: if this is all about setting the 'gerrit' remote, couldn't the pull remote be https, while keeping the push one as ssh?13:47
fungitosky: if git-review were patched to do that, yes. specifically this is about fetching review refs, which aren't necessarily going to be replicated to your origin remote though13:48
fungibut yeah you could probably have the gerrit remote use https for pull and ssh for push13:49
fungialso some gerrits will require authentication even for operations ours allows to be anonymous, so probably git-review will need a switch to be able to set it to always auth so that it can continue to support those sites13:50
fungidtantsur: tosky: it does appear that the fetch_review() function may work anonymously over https, just skimming the source code13:54
fungialso run_http_exc() seems to try anonymous first and then fall back to applying credentials if it gets a 40113:56
fungiso, yeah, this looks like it at least wants to work the way we hope13:56
fungiso if it turns out not to, that's a bug worth fixing13:58
fungiand the rest api urls it assembles don't seem to assume the expressly authenticated endpoint (a/)14:02
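A minimal sketch of the anonymous-first pattern fungi describes above for run_http_exc(): send the request without credentials and retry with them only when the server answers 401. This is not git-review's actual code. The names `fetch_anonymous_first`, `send` and `get_credentials` are hypothetical, and the transport is injected so the sketch runs without a network.

```python
from typing import Callable, Optional, Tuple

# Hypothetical types for illustration only (not git-review's API).
Auth = Optional[Tuple[str, str]]
Send = Callable[[str, Auth], Tuple[int, str]]  # (url, auth) -> (status, body)


def fetch_anonymous_first(
    url: str,
    send: Send,
    get_credentials: Callable[[], Tuple[str, str]],
) -> Tuple[int, str]:
    """Try the request anonymously; fall back to credentials on HTTP 401."""
    status, body = send(url, None)
    if status != 401:
        # Anonymous access was enough (or failed for a non-auth reason),
        # so the user is never asked for a password.
        return status, body
    # The server demands authentication: look up credentials lazily
    # and repeat the same request once with them.
    return send(url, get_credentials())
```

Because credentials are requested lazily, a user fetching a public review over https would not be prompted at all. A site that requires auth for everything still works, at the cost of one extra round trip.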
dtantsuryeah, but it defaults to ssh nonetheless14:04
mordredit might be worth considering making https the default and ssh the fallback (although getting the https auth info the first time is a bit lame) ... the upstream gerrit folks are really pushing using https for everything and most of them recommend against running the ssh server at all. having git-review default to https would be in keeping with upstream direction - although we'd want to think long and hard about how that would or would not14:07
mordredimpact our community14:07
fungiyes, most users of our gerrit have not generated http passwords, and many probably don't have mechanisms set up to securely store them in ways git/git-review can access14:23
fungigit-review will use the git credential store if you've configured it, but i'll wager most users' eyes glaze over when you ask if they've set it up14:24
*** ysandeep is now known as ysandeep|dinner14:56
clarkbfungi: mordred  dtantsur the opendev docs should be focused on getting code into the system and interacting with zuul etc. Not be too concerned with "best practices" that are likely to vary between projects15:04
clarkbdtantsur: re git review -d gerrit gives you the fetch and checkout command in the web ui if you want an easy copy paste15:06
clarkbI use that when not in an authenticated context and use git review when I am15:06
dtantsurclarkb: yep, that's what I've used so far, but I was curious to try something potentially faster15:06
clarkbmordred: I would be a strong -2 against making https default15:06
clarkbmordred: https credentials are significantly more painful to manage both as a user and an operator of gerrit15:07
clarkbwe give users the ability to flip that default in the tool config if they choose but as far as a default at top level that is probably not a great choice15:07
clarkbit's so bad that gerrit literally has you do the equivalent of curl | bash to configure it15:08
clarkbwhich is not great15:08
*** ykarel is now known as ykarel|away15:28
mordredyeah - it's horrible. I'm just saying - no-ssh is a thing upstream keeps pushing 15:39
*** ysandeep|dinner is now known as ysandeep15:41
*** ysandeep is now known as ysandeep|away15:44
corvusmordred: google is certainly pushing it, since they can't run it... but i feel like other users are pretty happy with the ssh stuff and aren't in a rush to deprecate it.  having said that, non-ssh could be advantageous for some openstack contributor companies that have issues with ssh15:49
*** amoralej is now known as amoralej|off15:57
*** jpena is now known as jpena|off15:59
clarkbinfra-root https://etherpad.opendev.org/p/fM064G15GPb-wwJx5mLw how does that look for reaching out to corvus' EMS contact?16:00
clarkbI need to catch up on some morning things now, but can send that out this morning if it looks good to y'all16:03
clarkbcorvus: ^ in particular may want to check it16:03
*** dviroel is now known as dviroel|lunch16:03
*** diablo_rojo__ is now known as diablo_rojo16:04
corvusclarkb: lgtm -- mordred anything else you can think of?16:05
fungiclarkb: i think it's good16:12
corvusclarkb: did we talk about mjolnir?16:14
*** marios|ruck is now known as marios|out16:17
fungithor's hammer?16:18
fungithat's a heavy subject16:18
clarkbcorvus: it was briefly mentioned as the moderation tool?16:19
fungiaha, so the reference is to a large banhammer16:20
corvusclarkb: yeah, we may want one of the slightly higher tier plans so that it can run server-side and federate with other systems.  that's probably more important for projects > zuul; for zuul only i don't think it's a big deal.16:29
corvusclarkb: just wanted to make sure you had background on that; i couldn't remember if we discussed it16:30
corvusthough looking at the pricing chart, i'm not sure about what tier would be required.  anyway -- something to keep in mind.16:34
corvus(nickel and silver both say "shared moderation bot (soon)" so i dunno)16:35
clarkbcorvus: ya I figured we could be a little vague and let them fill us in16:43
clarkbre "federate with other systems" is that not in the lower tiers either?16:43
clarkbI'll go ahead and reach out to EMS now and see what they say16:45
clarkbI expect they will be more than happy to give us details16:45
corvusclarkb: federating chat is universal -- mjolnir is a federated moderation system16:47
clarkbgot it16:47
corvusif you run it server side, then you can subscribe to ban lists from other communities16:47
corvusso it's like what we wanted to do with limnoria, but across a much larger area16:48
*** dviroel|lunch is now known as dviroel17:08
clarkbEMS contact has responded. They are in the UK and would like to schedule a call sometime this week to talk. We'll sort that out tomorrow since their day has already largely ended17:17
fungisounds great17:31
melwittclarkb, fungi: thank you for the reviews on update_blueprint! here's the matching system-config change if we're ready to re-enable it https://review.opendev.org/c/opendev/system-config/+/79591418:03
mordredclarkb: sorry for the delay - the etherpad looks good - which isn't useful now :)19:07
ianwhttps://dev.mysql.com/doc/refman/8.0/en/mysqldump.html#option_mysqldump_network-timeout : Enable large tables to be dumped by setting --max-allowed-packet to its maximum value and network read and write timeouts to a large value. This option is enabled by default. To disable it, use --skip-network-timeout. 19:41
fungiahh, still i wonder why it's just affecting the nearer backup target19:42
ianwhrm, we run a db container there right?19:44
clarkbianw: correct the db is on localhost in a container19:44
ianwthis might be something similar to low timeout values i saw yesterday when bringing up paste in ansible19:44
ianw(which i wanted to do for precisely the reason that it seemed like an easy way to get some mariadb container production experience)19:45
ianw"By default, the server closes the connection after eight hours if nothing has happened. You can change the time limit by setting the wait_timeout variable when you start mysqld. See Section 5.1.8, “Server System Variables”. "19:45
ianwi think the containers might have that much lower, like 5 minutes19:46
ianwi think there's some query we can run to check19:46
opendevreviewMerged opendev/system-config master: Re-enable update_blueprint for patchset-created  https://review.opendev.org/c/opendev/system-config/+/79591419:54
ianwdoes load for anyone else?19:57
clarkbit doesn't seem to be returning for me19:58
fungimy browser doesn't like the cert, but then hangs loading once i accept it19:58
clarkbyup same19:58
clarkbthe root doesn't load either19:59
fungithere is also a longstanding issue i've attempted multiple times to address through timeout changes in the trove config, looking into the timeout/retry settings sqla implements for its sockets by default, et cetera19:59
fungiit's supposed to, when it receives a disconnect error, immediately send a sql ping to force reconnection, but it acts like that's not working20:00
fungipossibly something about the way the connection is dropping makes it not detect it's dead, i'm not sure20:00
ianwstrace: Process 23714 attached20:01
ianwrecvfrom(6, ^Cstrace: Process 23714 detached20:01
ianwuwsgi   23714 nobody    6u     IPv4             342454      0t0     TCP localhost:59382->localhost:mysql (ESTABLISHED)20:01
ianwit looks to me that it's stuck on the mysql socket20:01
ianw2021-06-29 17:37:24 3 [Warning] Aborted connection 3 to db: 'lodgeit' user: 'lodgeit' host: '' (Got timeout reading communication packets)20:02
ianwthis maps with your ping theory20:02
fungithe production one does similarly, after some time sitting unused the next request takes forever and never completes, but then some minutes later everything is reestablished20:02
fungiit's probably less apparent in production because the server is hit more continuously20:03
ianwnote that we do not set pool_pre_ping in the code20:03
ianwi was looking at that yesterday20:03
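The pre-ping idea ianw mentions can be illustrated with a toy pool: before handing out a cached connection, run a cheap `SELECT 1` and reconnect if it fails. SQLAlchemy offers this behaviour through the `pool_pre_ping=True` argument to `create_engine()`. The sketch below is only a standard-library illustration using sqlite3, where a closed connection stands in for one the server dropped. `PrePingPool` is a hypothetical name, not lodgeit or SQLAlchemy code.

```python
import sqlite3
from typing import Callable, Optional


class PrePingPool:
    """Single-connection 'pool' that checks liveness before each checkout."""

    def __init__(self, connect: Callable[[], sqlite3.Connection]) -> None:
        self._connect = connect
        self._conn: Optional[sqlite3.Connection] = None
        self.reconnects = 0  # how many times a fresh connection was opened

    @staticmethod
    def _is_alive(conn: sqlite3.Connection) -> bool:
        # The "ping": a trivial query that fails fast on a dead connection.
        try:
            conn.execute("SELECT 1").fetchone()
            return True
        except sqlite3.Error:
            return False

    def get(self) -> sqlite3.Connection:
        # Replace the cached connection if it is missing or fails the ping.
        if self._conn is None or not self._is_alive(self._conn):
            self._conn = self._connect()
            self.reconnects += 1
        return self._conn
```

A pre-ping only helps when the dead connection fails quickly. A socket that blocks indefinitely in a read, as the strace output above suggests, would also need a read timeout.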
ianw| wait_timeout  | 28800 |20:07
ianwthat's on gitea mariadb container20:07
ianwso that looks like our 8 hours20:07
fungiso if the server doesn't get hit for ~8 hours, we should see the issue recur on the subsequent requests20:08
ianwsorry, i'm switching back and forth, that's the timeout on the gitea mariadb container20:09
ianwi.e. i can't see that particular timeout being the reason the backup connection gets dropped20:10
ianwYou can obtain more information about lost connections by starting mysqld with the log_error_verbosity system variable set to 3. This logs some of the disconnection messages in the hostname.err file. See Section 5.4.2, “The Error Log”. 20:11
ianwmight be an option20:11
ianwthe container has been up for 4 days, so it's not like it got restarted on us20:12
ianwJun 29 05:54:54 gitea01 docker-mariadb[704]: 2021-06-29  5:54:54 128099 [Warning] Aborted connection 128099 to db: 'gitea' user: 'root' host: 'localhost' (Got an error writing communication packets)20:16
ianwnet_read_timeout and net_write_timeout seem like settings to fiddle20:21
ianwinfra-root: with https://review.opendev.org/c/openstack/project-config/+/798413/1/zuul/main.yaml perhaps we should just move lodgeit to openstack (and remove from opendev)?20:22
ianwi should be able to run a vexxhost backup manually, and see if it times out.  if so, i can set the net timeouts and try again20:24
clarkbthat seems like a reasonable next step for tracking the backup problem down20:28
ianwright now, gitea01 can't talk ipv6 to backup0220:30
ianwunless i'm nuts20:30
ianwssh  backup02.ca-ymq-1.vexxhost.opendev.org from gitea hangs, "ssh -4" tries to connect20:31
ianwhowever, ping works20:31
ianwso somewhere between ping and actual packets something seems up20:31
fungipmtud blackhole maybe?20:32
clarkbfungi: https://review.opendev.org/c/opendev/infra-manual/+/797531 lgtm fwiw20:34
ianwnote we can connect to rax20:39
ianwConnecting to backup01.ord.rax.opendev.org [2001:4801:7825:103:be76:4eff:fe10:1b1] port 2220:39
clarkbwouldn't surprise me if that is the problem20:39
ianwand i am connected to it via ipv6, so it can talk to me20:40
ianwdoes seem to suggest a vexxhost region<->region ipv6 issue20:40
opendevreviewIan Wienand proposed openstack/project-config master: Move opendev/lodgeit to openstack  https://review.opendev.org/c/openstack/project-config/+/79841320:58
ianwnote mnaser is aware of the ipv6 issue (#vexxhost).  i guess we could add "-4" to the borg backup runs21:06
ianwactually, if we wanted to do that, we would set AddressFamily in the ssh config for the backup host21:09
fungiianw: oh, looking back at the situation, we saw this long ago in rackspace, where then-newer openssh switched to setting a different qos and thus ipv6 dscp marking, which their routers decided to just drop21:12
fungiit happens immediately after authentication completes, so looks an awful lot like mtu problems but isn't21:13
fungino idea if it could be happening in vexxhost, or whether that matches the observed issue21:13
ianwhrm, in this case, with ssh -v it appears to get nothing back21:13
fungican you telnet to 22 and get the protocol banner from the sshd?21:14
ianwjust *poof* disappears like magic :)21:15
fungiokay, then it's not dscp marking21:15
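The banner check fungi suggests (connect to port 22 and see whether sshd says anything) can be scripted so that it is repeatable per address family. This is a rough sketch under stated assumptions: `read_banner` is a hypothetical helper, it reads only the first chunk the server sends, and it says nothing about why a connection fails.

```python
import socket
from typing import Optional


def read_banner(
    host: str,
    port: int = 22,
    family: int = socket.AF_UNSPEC,
    timeout: float = 5.0,
) -> Optional[str]:
    """Return the first bytes a TCP service sends, or None if unreachable.

    Pass family=socket.AF_INET6 or socket.AF_INET to test one protocol only.
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return None  # no address of the requested family
    for af, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                # sshd sends its protocol banner immediately after accept.
                data = sock.recv(256)
                return data.decode("ascii", "replace").strip()
        except OSError:
            # Refused, unreachable or timed out: try the next address.
            continue
    return None
```

Comparing the result for `socket.AF_INET6` against `socket.AF_INET` on the same host separates "TCP over IPv6 is broken" from "the service is down". Ping succeeding tells you only that ICMP gets through.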
fungiany other open tcp listeners you could test to on the server?21:15
fungithis is all within the same region too, right? or is it between regions?21:15
ianwtelnet to google 80 works with ipv621:16
fungijust wondering if the server itself had any other ports open so we could rule out it being specific to 22/tcp21:16
ianwno, although i guess we could open one21:17
fungimaybe if mnaser ends up needing more data21:17
ianwgitea are all in sjc right?21:17
fungii thought they were in ca-ymq-1 as well, but maybe not? will double check21:18
clarkbgitea are all in sjc121:18
fungiahh, no you're right, sjc121:18
ianwright, so yeah inter-region21:19
fungiso this is ipv6 connectivity between ca-ymq-1 and sjc121:19
fungiwhich could mean it's a more general problem, triangulation or the like21:19
fungior just some routers with a too-large mask, too-short prefix21:20
opendevreviewMerged openstack/project-config master: Move opendev/lodgeit to openstack  https://review.opendev.org/c/openstack/project-config/+/79841321:20
ianwbut ping does get through21:20
fungimmm, yeah21:21
*** dviroel is now known as dviroel|out21:22
fungiianw: similarly backup02.ca-ymq-1.vexxhost.opendev.org gets "No route to host" trying to reach 22/tcp on 2604:e100:3:0:f816:3eff:fe16:274 (gitea01)21:23
fungibut can ping it21:23
fungiinterestingly it *can't* ping 2604:e100:3:0:f816:3eff:feef:a51d (gitea08)21:25
mnaserthat seems strange21:25
fungiFrom 2604:e100:1:0:ce2d:e0ff:fe0f:74af icmp_seq=1 Destination unreachable: Address unreachable21:25
mnaserthat you can ping but not ssh21:25
mnaserwhats ip -6 route looking like21:25
fungimnaser: i wonder if the ping and tcp traffic are getting their flows balanced to different routers and one has a stray route on it or something21:26
mnaserfungi: that is a pretty reasonable assumption21:26
fungimnaser: http://paste.openstack.org/show/i7ployjupdi8M7ZbRBIH/21:27
fungiso one of the two nexthops for the backup server in ca-ymq-1 is telling it that gitea08's ipv6 address in sjc1 is unreachable21:29
fungimaybe that one thinks the address should be local21:30
fungito the site21:30
fungimnaser: also, if it helps, we first noticed ipv6 connection failures between those regions early on june 12, according to the notifications we're generating21:34
fungiso whatever happened seems to have been before ~06:00 utc on 2021-06-1221:35
fungihow long before is hard to say21:35
clarkbdiablo_rojo_phone: the ptgbot change lgtm other than the thing that ianw points out.22:00
clarkbianw: re https://review.opendev.org/c/opendev/zone-opendev.org/+/798242/1/zones/opendev.org/zone.db I think what we ended up doing was setting SSHFP records for the specific hostname to be port 22 with the idea we could set SSHFP records for the CNAME to the port 29418 host keys. I see that you want to have zuul connect to a specific backend to enable which is why your change is22:03
clarkbnecessary (since zuul will talk to port 29418 not 22 and sshfp is for port 22)22:04
clarkbianw: the other thing that comes to mind doing that is zuul will talk https and ssh to the name in its configuration. Do you know if we'll handle the https side properly? It appears that my browser is fine with it and we don't redirect so I expect it will just work22:04
clarkbwe also already set the canonical hostname in the zuul config to opendev.org and not review*.opendev.org which means we'll be fine on the job side22:05
ianwi don't see why https://review01.opendev.org won't work for zuul (and 02 in time).  cert should cover it22:06
clarkbyup cert should cover it and without a redirect we won't accidentally redirect to the wrong server when we are transitioning22:07
clarkbjust talking through it to ensure we aren't missing anything there22:07
ianwon the sshfp; we added A/AAAA records for review.opendev.org with the SSHFP of the gerrit service22:07
ianwbut it all gets quite confusing when we want to reference servers directly in transition periods22:08
clarkbianw: yup review.opendev.org got records for gerrit and review01 got records for openssh-server22:08
clarkband ya if you want to point zuul at review01 then review02 those sshfp records can become problematic22:08
ianwhonestly, given the marginal benefit they provide, and the chance of confusion, i think we just remove them for review22:09
ianwas you say, we more or less expect https and ssh services at the same place, so something like adding "ssh.review01.opendev.org" doesn't even really clarify things22:09
ianw(that could have sshfp records for the host, not gerrit, in theory)22:10
clarkbianw: your plan looks good. I did leave some notes though. In particular I think we may not need to do the reindex if we stop both gerrits then copy git repos that should mean the existing indexes are also valid and can simply be copied22:16
clarkbthat said maybe those indexes are large enough it's faster to rebuild them rather than copy them.22:16
ianwyeah i feel like generating the indexes is a good way to validate things before starting22:17
ianwon the editing of the zuul config, i think we just need it to listen enough to get the message to merge its own config change22:18
clarkbianw: yup and since the scheduler handles all interaction with gerrit your plan should be fine. I didn't have that luxury with the gearman stuff22:19
ianwi don't think the executors or anything else will care.  22:19
clarkbfor some things they definitely do, but I think you are right that they don't in this case22:19
ianwi think if we have the "connect to review01" bit sorted out prior, then we can be pretty confident in switching 22:23
ianwi think this might be the first time we've moved gerrit with the CD approach we have now22:24
clarkbto both things (connect to 01 first and be confident and first time with more CD approach)22:24
ianwthat was what got me a bit stuck when i was like "ok, merge the dns fix", oh, but hang on, zuul is going to merge and apply that, and zuul has to get the message and report back to the right place ... :)22:26
clarkbyup I had similar struggles when replacing the zuul scheduler22:26
clarkbin that case we actually updated DNS before we took things down iirc22:27
clarkband then they were just wrong for a bit :)22:27
clarkbI think your plan should work well in this instance though22:27
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840023:33
opendevreviewIan Wienand proposed opendev/system-config master: [wip] paste service  https://review.opendev.org/c/opendev/system-config/+/79840023:46
ianwhrm, any thoughts on how i can get "lodgeit-build-opendev-image" to run in ^ ?23:59
clarkbianw: you need the buildset registry in place23:59
