*** ryohayakawa has joined #opendev | 00:00 | |
*** bolg has quit IRC | 00:27 | |
*** ryohayakawa has quit IRC | 00:28 | |
*** ryohayakawa has joined #opendev | 00:29 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Support non-x86_64 DIB_DISTRIBUTION_MIRROR variable for CentOS 7 https://review.opendev.org/740183 | 00:36 |
kevinz | ianw: ping | 01:55 |
ianw | kevinz: hey | 01:55 |
kevinz | hi ianw, morning! Is it possible to have a local mirror that caches etcd packages in Linaro US? | 01:57 |
kevinz | recently I find that it usually has network connection problems when downloading etcd: https://zuul.opendev.org/t/openstack/build/d6a61543aefb4e919644b2fa6949535f/log/job-output.txt#4948 | 01:57 |
ianw | kevinz: yeah, i've noticed a few weird things from the cloud too, like tls errors connecting to github | 01:59 |
ianw | right, that's the same thing, but we see it in infra downloading ... something from github we do | 02:00 |
kevinz | yes, connections to github usually fail | 02:00 |
ianw | "fatal: unable to access 'https://github.com/infraly/k8s-on-openstack/': gnutls_handshake() failed: Error in the pull function." | 02:01 |
ianw | is what i'm thinking of | 02:01 |
ianw | we can add reverse proxy things ... if connections to github fail so consistently it feels like we're fighting against the network | 02:03 |
clarkb | github is ipv4 only | 02:04 |
clarkb | could it be a problem with the NAT? | 02:05 |
kevinz | clarkb: ianw: yes, maybe the problem with the NAT rules. I will double check | 02:06 |
fungi | or just the nat table has too small of a pool of available source ports | 02:15 |
fungi | that's a common cause for such behavior | 02:15 |
kevinz | fungi: could you help clarify? I'm not quite sure about this case. Thanks :-D | 03:04 |
kevinz | Initiating SSL handshake. | 03:05 |
kevinz | SSL handshake failed. | 03:05 |
kevinz | Closed fd 6 | 03:05 |
kevinz | Unable to establish SSL connection. | 03:05 |
ianw | kevinz: something you'd see on the NAT host i guess ... if it has run out of ports to map back to 443 (or whatever) from its external address | 03:07 |
ianw | it might be worth a tcpdump if you're on a host exhibiting it, to see what, if anything is coming back | 03:08 |
ianw | but it seems likely the packets aren't getting back to the host | 03:09 |
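A minimal sketch of that kind of check on the NAT host, assuming a Linux router doing source NAT with netfilter conntrack; the interface name is a placeholder and the commands are illustrative rather than the exact procedure used here:

```
# How close is the SNAT translation table to its limit?
# (assumes the conntrack tool from conntrack-tools is installed)
sudo conntrack -C
sudo sysctl net.netfilter.nf_conntrack_max

# Watch whether TLS handshakes to an IPv4-only service get replies; an
# outbound SYN with no returning SYN-ACK points at the NAT path rather
# than the remote end.
sudo tcpdump -ni eth0 'tcp port 443 and host github.com'
```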
kevinz | ianw: OK, thanks for the pointer, I will try to debug it | 03:10 |
auristor | ianw: fyi, https://repology.org/project/kafs-client/versions and with the 5.8rc4 kernel, kafs passes all of the xfstests suite tests that could be expected to pass | 03:12 |
ianw | auristor: awesome! was that with fscache? iirc that was actually where we saw the most problems | 03:14 |
auristor | the fscache that you had problems with has been disabled in 5.8. | 03:14 |
auristor | I believe the new fscache is targeted for the 5.9 merge window | 03:15 |
ianw | the other thing is that with all that rsync work i guess we've stopped invalidating everything for the cache constantly | 03:17 |
auristor | not entirely. for openafs, it is true that the data isn't invalidated but each release does invalidate all of the cached metadata, the callback state for each cached object, and the cached volume location information and server list. | 03:19 |
auristor | even with the prior rsync behavior the file data versions weren't changing, it was only the metadata. | 03:21 |
ianw | it would be great for us not to have to maintain the openafs package builds we do | 03:24 |
ianw | sigh, speaking of, now i check and the wheels haven't released for a few days. something broken i guess | 03:25 |
ianw | https://zuul.openstack.org/builds?job_name=release-wheel-cache ... you can't really tell *why* it skipped from that :/ | 03:33 |
ianw | 2020-07-19 08:11:06.417530 | wheel-cache-debian-buster-arm64-python2 | mkdir: cannot create directory ‘/afs/.openstack.org/mirror/wheel/debian-10-aarch64//l’: Connection timed out | 03:35 |
ianw | https://zuul.openstack.org/build/8c919da098fb42d1a79308e2e129816d/log/job-output.txt | 03:35 |
ianw | 2020-07-19 06:55:39.121593 | wheel-cache-ubuntu-bionic-arm64-python3 | "msg": "Warning: apt-key output should not be parsed (stdout is not a terminal)\ngpg: keyserver receive failed: End of file" | 03:37 |
ianw | https://zuul.openstack.org/build/5c9c5c6bdfd84485aa91bfc2865011ad/log/job-output.txt | 03:37 |
auristor | Unfortunately, openafs uses 110 for VBUSY which is ETIMEDOUT on Linux. A connection timed out error is often a volume issue and not a connection issue. | 03:37 |
ianw | auristor: yeah, in the context of prior discussion with kevinz though, we're seeing some network issues, particularly ipv4 issues in our arm64 cloud (ipv4 goes via NAT there) | 03:38 |
auristor | VBUSY is also the error the fileserver returns when it cannot establish a reverse connection to the clients callback service port. | 03:39 |
auristor | port mappings and ipv4 address mappings combined with short timeouts on dynamic firewall rules can easily cause problems. | 03:40 |
ianw | i wonder if it is a bug or a feature to not release the volumes if any wheel build fails. on one hand, it keeps the wheel caches consistent. on the other, arm64 issues like this stop publishing. i'm not sure the consistency matters... | 03:40 |
ianw | auristor: yeah, these same nodes are having pretty constant issues talking to github, another ipv4 only service. it does seem to be suggesting something at that nat layer is causing problems | 03:41 |
ianw | afaik we've not had issues with things like cloning from opendev at all, which should all be ipv6 | 03:41 |
auristor | I don't remember what openafs writes to the FileLog but I believe it does write something when there are timeouts during attempts to connect to the callback service. | 03:42 |
kevinz | ianw: looks like the network issue is due to the virtual router. One of the router netns was not reconstructed after the last upgrade. Now I've restarted the l3 agent to trigger the re-creation process and re-tested; looks like the issue has disappeared | 03:44 |
kevinz | I will trigger the CI jobs to see if it worked | 03:44 |
ianw | kevinz: oh good :) well it's nice to have a suspect anyway | 03:45 |
ianw | is it 139.178.85.147 ? that comes up in the logs a lot | 03:46 |
kevinz | ianw: yes, it is the OS-jobs router | 03:47 |
auristor | where does the line get drawn between an infrastructure service such as might be provided by rackspace or another hosting provider and a software product that must be "open source" in order for opendev to use it? | 03:48 |
ianw | Fri Jul 17 08:02:28 2020 CheckHost_r: Probing all interfaces of host 139.178.85.147:23199 failed, code -1 | 03:48 |
ianw | Fri Jul 17 08:13:11 2020 CB: ProbeUuid for host 00007F9A08553D58 (139.178.85.147:12123) failed -1 | 03:48 |
auristor | I would be happy to work with rackspace or other such that auristorfs is provided as a service to opendev to make use of. | 03:48 |
ianw | that's two messages in the openafs server logs around the failure i posted before (08:11) | 03:48 |
auristor | the probing code -1 is RX_CALL_DEAD which means the fileserver didn't receive a response to any DATA or ACK PING packet sent to the callback service port; which will be a public port on the NAT address. | 03:50 |
auristor | Note 23199 is not 7001. | 03:50 |
auristor | Then the probeuuid to port 12123 is an attempt to see if that client has the same UUID as a previously known host that might have switched endpoints. Again, no reply. | 03:53 |
auristor | If the NAT has a VOIP configuration, it should be used for openafs. | 03:53 |
auristor | like afs, voip connections require longer lived port mappings and tolerance for longer periods of idleness. | 03:54 |
ianw | http://paste.openstack.org/show/796095/ | 03:55 |
ianw | that's all the error messages for this ip from 10th july | 03:55 |
ianw | ~0800 comes up a lot, i guess because that's the wheel build jobs which are afs users | 03:56 |
auristor | looking at the pattern it appears that the port mapping is good for about 60s and a client retries for approximately five minutes before giving up and marking the fileserver dead. | 03:58 |
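If that diagnosis is right, one hedged mitigation on a Linux-based NAT would be to lengthen the conntrack UDP timeouts so the reverse mapping back to the client's callback port survives the idle periods; the values below are illustrative, not tested recommendations:

```
# OpenAFS callbacks arrive on the client's UDP port (7001) well after the
# last outbound packet, so short UDP conntrack timeouts can drop the
# mapping and the fileserver's probes never reach the client.
sudo sysctl -w net.netfilter.nf_conntrack_udp_timeout=300
sudo sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=600
```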
auristor | I'm done for the night. | 03:59 |
ianw | auristor: thanks ... i'll keep an eye on things after kevinz's changes and hopefully things just start to work :) | 04:00 |
*** raukadah is now known as chandankumar | 04:04 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS 7 https://review.opendev.org/741868 | 04:05 |
fungi | auristor: the service providers in question are running services which are themselves open source software (at least in theory, but backed up by legal agreements over use of trademarks). we avoid hard dependencies on "freemium" hosted proprietary services (the sort often advertising themselves as "free for use by open source communities" and the like) | 04:16 |
fungi | basically we expect the source code for the services we're relying on to be free/libre open source software, whether we run it or someone else does | 04:17 |
fungi | we don't feel like we can legitimately represent open source development and yet rely on proprietary services to produce it, anything less would be hypocrisy on our part | 04:19 |
*** ysandeep|away is now known as ysandeep | 04:59 | |
*** ysandeep is now known as ysandeep|rover | 04:59 | |
*** DSpider has joined #opendev | 05:31 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS 7 https://review.opendev.org/741868 | 05:58 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: gitea-git-repos: update deprecated API path https://review.opendev.org/741562 | 06:05 |
openstackgerrit | LIU Yulong proposed opendev/irc-meetings master: Change Neutron L3 Sub-team Meeting frequency https://review.opendev.org/741876 | 06:13 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] Drop dib-python requirement from several elements https://review.opendev.org/741877 | 06:16 |
*** marios has joined #opendev | 06:28 | |
openstackgerrit | Merged zuul/zuul-jobs master: Remove copy paste from upload-logs-swift https://review.opendev.org/741840 | 06:41 |
*** qchris has quit IRC | 06:51 | |
*** qchris has joined #opendev | 07:04 | |
*** dtantsur|afk is now known as dtantsur | 07:27 | |
*** tosky has joined #opendev | 07:34 | |
*** dougsz has joined #opendev | 07:46 | |
*** xiaolin has joined #opendev | 07:56 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:03 | |
*** fressi has joined #opendev | 08:10 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS 7 https://review.opendev.org/741868 | 08:24 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] Drop dib-python requirement from several elements https://review.opendev.org/741877 | 08:24 |
*** bolg has joined #opendev | 08:42 | |
*** ysandeep|rover is now known as ysandeep|lunch | 08:48 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Bump ansible-lint to speed it up https://review.opendev.org/741897 | 09:27 |
*** sshnaidm|off is now known as sshnaidm | 09:27 | |
*** ysandeep|lunch is now known as ysandeep | 09:30 | |
*** ysandeep is now known as ysandeep|rover | 09:31 | |
openstackgerrit | Merged opendev/irc-meetings master: Update neutron team meeting time https://review.opendev.org/739780 | 09:55 |
*** tosky has quit IRC | 10:08 | |
*** fdegir has quit IRC | 10:09 | |
*** tosky has joined #opendev | 10:10 | |
*** fdegir has joined #opendev | 10:10 | |
*** tkajinam has quit IRC | 10:29 | |
*** dougsz has quit IRC | 10:31 | |
*** dougsz has joined #opendev | 10:46 | |
*** ysandeep|rover is now known as ysandeep|afk | 11:04 | |
*** ysandeep|afk is now known as ysandeep|rover | 11:31 | |
*** weshay_pto is now known as weshay_ | 11:34 | |
*** ryohayakawa has quit IRC | 12:23 | |
*** xiaolin has quit IRC | 13:26 | |
fungi | cacti says etherpad01 has over 900 connections right now (which i guess is around 450 clients) | 13:54 |
clarkb | I think it may be more than 2 connections per client now but I'm not sure of that | 13:54 |
clarkb | but ya its a number of connections | 13:54 |
clarkb | the etherpad itself claims to have 87 | 13:55 |
clarkb | (there are likely other pads in use at any given time though) | 13:55 |
fungi | yeah | 14:00 |
openstackgerrit | Rafael Folco proposed openstack/diskimage-builder master: Pre-install python3 for CentOS 7 https://review.opendev.org/741868 | 14:21 |
clarkb | fungi: well even if a cloud isn't multiarch you could qemu a build as nodepool is doing | 14:38 |
clarkb | and having the resources available "locally" simplifies things | 14:38 |
clarkb | also looks like those packages may not be populated with content? | 14:38 |
fungi | clarkb: yep | 14:39 |
fungi | in reference to emulation | 14:39 |
fungi | i think the directory index there may be generated incorrectly, probably easier to browse via afs | 14:40 |
clarkb | do you see actual content in afs? | 14:40 |
fungi | i'm in the process of checking | 14:40 |
clarkb | step 0 confirm it is actually working (eg packages in the mirror) then add indexes for it to all mirrors then potentially consume it in nodepool jobs then maybe start adding in generic lists of packages to build | 14:41 |
fungi | ahh, yeah, the index is wrong, the tree is sharded by first letter | 14:42 |
clarkb | oh ya this is where we have the apache rewrite rules in the actual mirrors | 14:42 |
fungi | https://static.opendev.org/mirror/wheel/debian-10-aarch64/c/cryptography/ | 14:42 |
*** fressi has quit IRC | 14:42 | |
fungi | so just browsing from static.o.o breaks, but right our mirror vhosts would dtrt | 14:43 |
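A quick way to sanity-check that sharded layout directly, using the cryptography example from the thread; the read-only AFS path is an assumption based on the read-write path in the earlier error:

```
# The wheel tree is sharded by the first letter of the package name, so the
# per-package index lives one level down, e.g. .../c/cryptography/.
curl -sI https://static.opendev.org/mirror/wheel/debian-10-aarch64/c/cryptography/ | head -n1

# Or inspect the (assumed) read-only AFS path from an AFS client:
ls /afs/openstack.org/mirror/wheel/debian-10-aarch64/c/cryptography/
```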
clarkb | cool so step 0 is done, we can probably add step 1 now? | 14:43 |
*** mlavalle has joined #opendev | 14:43 | |
fungi | i think so, yeah. i haven't looked at why it's not exposed on our other mirror servers | 14:43 |
clarkb | 2.9.2 is what nodepool is trying to use but on py38 not py37 | 14:43 |
*** fressi has joined #opendev | 14:43 | |
clarkb | we can fairly easily switch nodepool to python3.7 though | 14:44 |
clarkb | except this is for debian python not python on debian | 14:44 |
clarkb | ugh | 14:44 |
clarkb | I wonder if it will still work | 14:44 |
*** ysandeep|rover is now known as ysandeep|away | 14:47 | |
fungi | it probably would | 14:53 |
fungi | though yes we're lacking py38 builds of those wheels it seems | 14:53 |
clarkb | fungi: ya because we're targeting the distro defaults but our containers are python built on top of debian | 14:53 |
clarkb | slightly different expectations between the two | 14:54 |
*** ysandeep|away is now known as ysandeep | 14:54 | |
fungi | yeah, and we'd need debian-bullseye nodes to have python3.8 packages for it | 14:55 |
fungi | or the stow-based ensure-python role | 14:55 |
fungi | or use the ubuntu-bionic wheels, they'd probably work on buster | 14:56 |
fungi | (bionic and focal both have python3.8 packages) | 14:56 |
*** weshay_ is now known as weshay|ruck | 15:12 | |
*** zbr|ruck is now known as zbr|rover | 15:13 | |
*** ysandeep is now known as ysandeep|away | 15:17 | |
*** jgwentworth is now known as melwitt | 15:19 | |
*** xiaolin has joined #opendev | 15:32 | |
*** marios is now known as marios|out | 15:48 | |
*** marios|out has quit IRC | 15:49 | |
openstackgerrit | Merged zuul/zuul-jobs master: add-build-sshkey: Ensure .ssh exists, enable admin authorized_keys https://review.opendev.org/740350 | 15:57 |
*** sshnaidm is now known as sshnaidm|afk | 15:57 | |
fungi | looks like we almost reached 1k established tcp connections on the etherpad server | 16:00 |
fungi | cacti says the highest it got was 977 | 16:01 |
*** yoctozepto has quit IRC | 16:12 | |
*** yoctozepto has joined #opendev | 16:13 | |
*** xiaolin has quit IRC | 16:21 | |
*** xiaolin has joined #opendev | 16:22 | |
*** dtantsur is now known as dtantsur|afk | 16:29 | |
*** xiaolin has quit IRC | 16:40 | |
weshay|ruck | what do folks think of py3 on centos7 https://review.opendev.org/#/c/741868/ | 17:01 |
clarkb | weshay|ruck: I don't think we should preinstall it via dib in the yum element, but I think jobs can install it if they want it | 17:02 |
clarkb | we've finally gotten to a point where we don't preinstall a bunch of extra stuff which causes confusion and problems and I think adding python3 to centos7 in dib would get us back into that situation for some things | 17:03 |
clarkb | but in the jobs you can definitely install it | 17:03 |
weshay|ruck | ah.. I see | 17:03 |
weshay|ruck | fair point | 17:03 |
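A minimal sketch of the job-side install clarkb describes, assuming a CentOS 7 node; python3 here is the packaged interpreter that ianw later notes is now part of the distro:

```
# Install the distro-packaged python3 only in the jobs that need it,
# rather than baking it into the image.
sudo yum install -y python3
python3 --version
```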
*** dougsz has quit IRC | 17:12 | |
*** qchris has quit IRC | 18:37 | |
*** qchris has joined #opendev | 18:40 | |
*** tosky has quit IRC | 18:57 | |
clarkb | ok going to send https://etherpad.opendev.org/p/E6m-M-3fTLwse2RkrDQL in a moment | 19:56 |
fungi | thanks | 19:57 |
fungi | for a moment i thought we might still be supporting git:// protocol, but a quick check confirms 9418/tcp isn't open | 19:59 |
clarkb | and followup thread on advisory board sent | 20:16 |
clarkb | infra-root if anyone else is able (thanks fungi for early review) landing https://review.opendev.org/#/c/741277/ will enable us to make a gerritlib release to support https://review.opendev.org/#/c/741279/2. That second change also tests the first one via depends on and it seems to work | 20:20 |
*** shtepanie has joined #opendev | 20:31 | |
openstackgerrit | Merged zuul/zuul-jobs master: Enable tls-proxy in ensure-devstack https://review.opendev.org/741820 | 20:49 |
ianw | clarkb: the problem is that we've started assuming python3 in some of the in-chroot tools; centos7 is the only distro that doesn't have a /usr/bin/python3 | 22:21 |
ianw | so since it's part of the distro now, it seems reasonable to include it and thus allow dib to be python3 only without constraints | 22:21 |
clarkb | hrm | 22:22 |
clarkb | I see its the actual build env that hits it | 22:22 |
clarkb | ianw: should we maybe have a cleanup that removes it after? | 22:22 |
clarkb | then we don't pollute the final result but dib chroot things can run? | 22:22 |
ianw | tbh i don't see using the packaged python3 really as pollution at this point | 22:23 |
clarkb | ya I guess if tripleo doesn't mind then it's probably ok, they are the group I would expect to have issues with it. | 22:24 |
ianw | i did push back on installing things like pyyaml with pip on the base image -- *that* i consider pollution after we went to a lot of effort to remove non-packaged components | 22:24 |
clarkb | python, pip, etc are all "namespaced" properly with python3 pip3 etc under centos7 ya? | 22:24 |
clarkb | so we won't accidentally flip those over to python3 with people being surprised | 22:24 |
fungi | it is, insofar as if that platform doesn't normally ship python3 but something in a job runs expecting python3, the job can pass without explicitly installing the package | 22:24 |
fungi | making it slightly less portable | 22:25 |
clarkb | fungi: ya but it won't run a thing expecting python2 and get python3 and test the wrong thing | 22:25 |
fungi | "this job works on centos 7 images, but oh yeah only if you remember to make sure you preinstall python3" | 22:25 |
clarkb | ianw: ^ do you think that case would be sufficient to have dib clean up after itself? | 22:26 |
ianw | this is true; the problem with cleanup is always if there's another element that installs it and then we go and remove things out from under them | 22:27 |
clarkb | gotcha | 22:27 |
fungi | also a great point | 22:27 |
ianw | i feel like practically, there's not new development going on in centos 7, it's more a situation of maintaining old branches. so as mentioned, it's like a xenial situation where "python" is python2 | 22:29 |
clarkb | ya I looked at the change and just didn't make the connection that it was the in-chroot scripts themselves that needed it | 22:31 |
clarkb | I think given that the simplest thing to do is likely what you have proposed, then when centos7 is something we can stop caring about it goes away | 22:31 |
ianw | it doesn't hit in the gate because i think it was svc-map or something that got updated but doesn't get called | 22:33 |
ianw | in theory it allows us to drop dib-python (https://review.opendev.org/#/c/741877/2) but i need to look into that | 22:33 |
*** shtepanie has quit IRC | 22:41 | |
ianw | clarkb: if you have a sec, the borg backup @ https://review.opendev.org/#/c/741366/ is ready for review | 22:47 |
ianw | to the dib/centos7 thing, it's more compelling if the cleanup that follows is working. i'll look into that today | 22:48 |
*** tkajinam has joined #opendev | 22:53 | |
clarkb | ianw: ya I'll try to take a look though I'm fading fast. I had a very early morning | 22:53 |
ianw | no probs | 22:54 |
*** mlavalle has quit IRC | 22:59 | |
clarkb | left some notes, overall looks good but there are some minor things here and there | 23:11 |
*** DSpider has quit IRC | 23:15 | |
*** sgw1 has quit IRC | 23:16 | |
*** sgw1 has joined #opendev | 23:25 | |
ianw | thanks, will loop back | 23:31 |