*** dviroel|afk is now known as dviroel|out | 00:37 | |
*** rlandy is now known as rlandy|out | 01:09 | |
*** undefined is now known as Guest5598 | 03:35 | |
*** ysandeep|out is now known as ysandeep | 05:00 | |
*** ysandeep is now known as ysandeep|lunch | 07:35 | |
*** ysandeep|lunch is now known as ysandeep | 09:59 | |
*** rlandy|out is now known as rlandy | 10:32 | |
*** ysandeep is now known as ysandeep|afk | 10:53 | |
*** dviroel|out is now known as dviroel | 11:24 | |
fungi | pip 22.2 has just been released | 11:55 |
opendevreview | Merged opendev/system-config master: add computing force network mailling list for computing force network working group https://review.opendev.org/c/opendev/system-config/+/850268 | 12:31 |
*** dasm|off is now known as dasm|ruck | 13:05 | |
*** Guest5598 is now known as rcastillo | 13:05 | |
*** ysandeep|afk is now known as ysandeep | 13:20 | |
fungi | looks like infra-prod-base may be failing again (most recent daily failed, and again just now on that ^ change): https://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-base&project=opendev/system-config | 13:36 |
fungi | "non-zero return code" from the ansible-playbook task, but /var/log/ansible/base.yaml.log doesn't show any failed tasks | 13:37 |
fungi | timestamps in that log correspond to the failed build too | 13:40 |
fungi | ahh, there's the reason... "fatal: [nb03.opendev.org]: UNREACHABLE!" | 13:41 |
fungi | i'll see if i can get it rebooted | 13:41 |
fungi | api says it's currently in "shutoff" state | 13:41 |
fungi | okay, `openstack server start` worked this time | 13:43 |
fungi | i'm able to log into it now | 13:43 |
fungi | i'll try to reenqueue the deploy and make sure it works | 13:43 |
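A minimal sketch of the check-and-start sequence fungi describes above, assuming the usual openstack CLI and a clouds.yaml entry (the cloud name is a placeholder; the server name comes from the log):

```sh
# Check the server state, start it, and confirm it comes back. The cloud name is
# an assumption; adjust to your clouds.yaml.
export OS_CLOUD=opendevci
openstack server show nb03.opendev.org -f value -c status   # reports SHUTOFF here
openstack server start nb03.opendev.org
openstack server show nb03.opendev.org -f value -c status   # should now report ACTIVE
```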
gthiemonge | Hi folks, in this buildset: https://zuul.opendev.org/t/openstack/buildset/4c45db68c55b407ebb05f77dcf65729b all the jobs for wallaby and xena failed with mirror issues, is there a known issue? should I recheck? | 14:29 |
fungi | gthiemonge: that doesn't look like a mirror problem, since you got errors downloading the same packages through proxies in different parts of the world, indicating a recheck may not help unless the problem was transient and has already resolved itself | 14:49 |
gthiemonge | fungi: yeah you're right | 14:50 |
fungi | i'll see if i can find a corresponding error in one of the proxy logs to indicate what the problem might be on the pypi/fastly side | 14:51 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add job for publishing ansible collections https://review.opendev.org/c/openstack/project-config/+/850664 | 14:55 |
fungi | gthiemonge: oh! actually it looks like there may have been an earlier error i overlooked in the logs | 14:58 |
fungi | "Building wheel for openstack-requirements (setup.py): finished with status 'error'" | 14:58 |
fungi | "error: invalid command 'bdist_wheel'" | 14:59 |
fungi | https://zuul.opendev.org/t/openstack/build/373843dc47234df4b0fe35885803efb0/log/job-output.txt#5189-5200 | 15:00 |
fungi | so maybe it's started using a new setuptools which removed bdist_wheel? | 15:00 |
fungi | that's nothing to do with mirrors or pypi | 15:01 |
clarkb | fungi: bdist_wheel is supplied by the wheel package iirc | 15:02 |
clarkb | and it is supposed to fallback to using an sdist in that situation | 15:03 |
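As a side note, that error usually just means the wheel package is absent from the build environment rather than a setuptools regression; a rough way to confirm in a throwaway virtualenv (paths illustrative):

```sh
# A bare virtualenv reproduces the message from the job log, because the
# bdist_wheel command is provided by the wheel package, not setuptools itself.
python3 -m venv /tmp/bw-test && . /tmp/bw-test/bin/activate
pip list | grep -i '^wheel' || echo "wheel not installed"
# from any source checkout containing a setup.py:
python setup.py bdist_wheel   # -> error: invalid command 'bdist_wheel'
pip install wheel
python setup.py bdist_wheel   # -> now builds a wheel under dist/
```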
fungi | ahh, okay so maybe that one's benign | 15:03 |
fungi | so yeah, it does seem to be the truncated package downloads which are causing the job failure: https://zuul.opendev.org/t/openstack/build/373843dc47234df4b0fe35885803efb0/log/job-output.txt#6861-6866 | 15:05 |
fungi | Connection broken: InvalidChunkLength(got length b'', 0 bytes read) | 15:05 |
clarkb | whatever the issue was I'm able to download those packages now | 15:06 |
clarkb | some sort of cdn blip with pypi I guess | 15:06 |
fungi | 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /pypi/simple/grpcio/ | 15:06 |
fungi | found in the apache logs: | 15:07 |
fungi | [2022-07-21 10:43:51.503494] [substitute:error] [pid 2890472:tid 139668007925504] [client 2604:e100:1:0:f816:3eff:fece:9caa:41398] AH01328: Line too long, URI /pypi/simple/grpcio/, | 15:08 |
clarkb | note I'm able to download the packages through our proxies too using the urls that are logged as being unhappy | 15:09 |
yoctozepto | I see you are already debugging the issue we have spotted in kolla CI too (InvalidChunkLength) | 15:12 |
fungi | yeah, i can pull up https://mirror.ca-ymq-1.vexxhost.opendev.org/pypi/simple/grpcio/ just fine | 15:12 |
fungi | it looks like it may have been a transient issue, i only see it logged on mirror.ca-ymq-1.vexxhost for several requests at 10:43:35 utc | 15:13 |
fungi | yoctozepto: are your timeframes similar? | 15:13 |
clarkb | one of the other jobs ran in ovh gra1 | 15:14 |
clarkb | whatever it was I suspect the cdn/pypi itself due to that | 15:14 |
fungi | and another in inmotion | 15:14 |
fungi | i'm checking all the apache logs to see if they correspond | 15:14 |
yoctozepto | the earliest is (UTC) 12:59, the latest is 15:00 | 15:14 |
yoctozepto | but the jobs likely failed before that | 15:14 |
yoctozepto | I'm reporting timestamps of Zuul comments | 15:15 |
fungi | on mirror.gra1.ovh i see a bunch of those between 10:53:37-13:46:47 utc | 15:15 |
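A sketch of the log sweep being done by hand here, as a single ad-hoc ansible run from the bastion; the "mirror" group and log file names match what is referenced elsewhere in this log, but exact paths may differ:

```sh
# Ad-hoc sweep of the proxy error logs across the mirror hosts for the
# mod_substitute error code quoted above.
sudo ansible mirror -m shell -a \
  "grep -h 'AH01328: Line too long' /var/log/apache2/proxy_8080_error.log /var/log/apache2/mirror_443_error.log | tail -n 5"
```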
clarkb | those can vary wildly from the actual errors and probably aren't very helpful | 15:15 |
clarkb | zuul reporting at 15:00 could've hit the error at 10:42 | 15:16 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add job for publishing ansible collections https://review.opendev.org/c/openstack/project-config/+/850664 | 15:16 |
fungi | the individual job completion times will probably be fairly soon after the errors, but the gerrit commit can be significantly delayed, yes | 15:16 |
clarkb | in any case when I check the urls directly now they seem to work and considering the global spread I strongly suspect pypi or its cdn. https://status.python.org/ doesn't show current issues though | 15:16 |
fungi | seen on mirror.mtl01.iweb 10:46:13-13:39:02 | 15:17 |
fungi | yeah, i suspect it cleared up around 1.5 hours ago, after being broken for ~2.5 hours | 15:18 |
yoctozepto | https://zuul.opendev.org/t/openstack/build/8b8b388d3e2d484b9c2215144b50d505 | 15:19 |
yoctozepto | Started at 2022-07-21 14:32:00 | 15:19 |
yoctozepto | Completed at 2022-07-21 14:43:38 | 15:19 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add job for publishing ansible collections https://review.opendev.org/c/openstack/project-config/+/850664 | 15:25 |
fungi | yoctozepto: which logfile should i be looking at to find the error? | 15:26 |
*** dviroel_ is now known as dviroel | 15:26 | |
*** rlandy is now known as rlandy|afk | 15:28 | |
kevko | fungi: for example this -> https://zuul.opendev.org/t/openstack/build/8b8b388d3e2d484b9c2215144b50d505/log/kolla/build/000_FAILED_openstack-base.log | 15:29 |
*** ysandeep is now known as ysandeep|out | 15:30 | |
fungi | kevko: thanks! so definitely the /pypi/simple/grpcio/ path again but at yet another mirror. i'll check the logs there | 15:33 |
fungi | no, nevermind, that was also mirror.mtl01.iweb | 15:34 |
clarkb | unfortunately no timestamps in that log? | 15:35 |
fungi | though zuul says that build started at 14:32:00 and ended at 14:43:38 so must be sometime in that 11 minute timespan | 15:36 |
clarkb | kevko: side note: you can use https against our mirrors now (we updated jobs to use it by default when it was added, but maybe your docker image stuff needs explicit instruction to do so) | 15:37 |
fungi | strangely, i don't see it in the access log even | 15:37 |
fungi | oh! that's why. i was looking at https | 15:38 |
fungi | there we go | 15:38 |
*** marios is now known as marios|out | 15:40 | |
fungi | /var/log/apache2/proxy_8080_error.log on mirror.mtl01.iweb mentions the problem between 14:40:39-15:21:44 | 15:40 |
clarkb | fungi: it is also curious that it seems to be that specific package? Like maybe something is up with its backing files | 15:46 |
clarkb | we could try a purge against those packages in case it is the cached data that is sad | 15:47 |
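One hedged way to attempt that purge, if a cached copy were the culprit: htcacheclean can delete specific URLs from a mod_cache_disk store. The cache root below is an assumption (check CacheRoot in the mirror vhost), and the URL has to match the key apache cached:

```sh
# Remove one cached URL from the disk cache; cache root is assumed, verify it
# against the vhost's CacheRoot before running.
sudo htcacheclean -v -p /var/cache/apache2/proxy \
  "https://mirror.ca-ymq-1.vexxhost.opendev.org/pypi/simple/grpcio/"
```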
yoctozepto | oh, https | 15:48 |
yoctozepto | we substitute http explicitly I guess | 15:48 |
fungi | clarkb: i'm unsure it's coming from the backing files, the error looks like: | 15:56 |
fungi | [2022-07-21 15:21:44.682230] [substitute:error] [pid 3820:tid 140633293432576] [client 198.72.124.35:42452] AH01328: Line too long, URI /pypi/simple/grpcio/, | 15:56 |
fungi | and it's logged in the proxy error log | 15:56 |
fungi | but maybe it caches that state? seems odd for it to cache an error condition. bad content sure | 15:56 |
clarkb | fungi: the internet says that mod substitute may be trying to substitute text in files that are too big | 15:57 |
clarkb | https://bz.apache.org/bugzilla/show_bug.cgi?id=56176 | 15:58 |
fungi | ahh, so we could have cached the bad content fastly served us, and then apache is erroring when trying to use the cached state | 15:58 |
clarkb | I guess that could explain why it is specific packages: their index or contents could be too large | 15:58 |
clarkb | ya maybe. Assuming the bad state is extremely large lines? | 15:58 |
fungi | so far it's only been the simple api index for grpcio as far as i've seen in logs | 15:58 |
clarkb | we substitute values on the indexes so that the urls for the file content point back at our mirrors iirc | 15:59 |
fungi | right | 15:59 |
clarkb | view-source:https://pypi.org/simple/grpcio/ all those urls get updated to our urls | 15:59 |
clarkb | and it is per line | 16:00 |
clarkb | there are some long lines but nothing over 1MB in there currently. I suspect that what we got back is html without line breaks for some reason | 16:00 |
clarkb | and then since grpcio has lots of releases that in aggregate is over 1MB? | 16:01 |
clarkb | we can increase the length limit. If it is serving the whole index without line breaks that would be about 1.4MB so maybe bumping the limit to 5MB as in that bug is reasonable? | 16:02 |
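A quick way to check the longest line actually being served (1m is mod_substitute's default SubstituteMaxLineLength), against both pypi.org and one of the proxies named earlier:

```sh
# Longest line in the index as served; compare the origin and a proxy.
curl -s https://pypi.org/simple/grpcio/ \
  | awk 'length > max { max = length } END { print "longest line:", max, "bytes" }'
curl -s https://mirror.ca-ymq-1.vexxhost.opendev.org/pypi/simple/grpcio/ \
  | awk 'length > max { max = length } END { print "longest line:", max, "bytes" }'
```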
clarkb | But I suspect any indexes of that form would indicate some bug in the pypi serving process assuming that is what happens | 16:02 |
clarkb | I've got a dentist appointment soon. But if this continues to persist I think we can a) check our cached values for evidence of no line break indexes to file an issue against pypi and/or b) increase the limit to say 5MB | 16:04 |
fungi | yes, i agree it seems like pypi probably temporarily served up something that was semi-broken and had extremely long lines | 16:05 |
fungi | fwiw, i don't see any new occurrences in the mirror.mtl01.iweb proxy_8080_error.log after 15:21:44 utc | 16:05 |
clarkb | fungi: if you grep for one of the shasums in that index or similar in our apache cache I wonder if you can find the cached indexes that way then see if any lack line breaks | 16:06 |
yoctozepto | any idea if it's worky now? (i.e., may we issue rechecks?) | 16:36 |
fungi | yoctozepto: i see no new evidence of that error for over an hour now, so it's probably fine | 16:37 |
yoctozepto | fungi: thanks, will retry then | 16:38 |
fungi | though that was in the proxy_8080_error.log on mirror.mtl01.iweb | 16:38 |
fungi | in mirror_443_error.log on mirror.mtl01.iweb i see a newer burst which happened at 15:57:09-15:57:17 | 16:39 |
fungi | i'll check back on the other mirrors | 16:39 |
fungi | 16:02:58 in the mirror_443_error.log on mirror.gra1.ovh | 16:40 |
fungi | that's the most recent i can find | 16:41 |
fungi | less than 40 minutes ago, so it may still be going on | 16:41 |
kevko | it is same :( | 17:01 |
kevko | https://zuul.opendev.org/t/openstack/build/2929144b89da4415b504bdce8d721c52/log/kolla/build/000_FAILED_openstack-base.log | 17:01 |
kevko | fungi: ^^ | 17:01 |
fungi | and yet another region | 17:02 |
fungi | yeah, same error showing up in proxy_8080_error.log mirror.iad3.inmotion as recently as 16:58:55 utc | 17:02 |
fungi | first occurrence in there was at 12:39:57 utc | 17:04 |
TheJulia | diablo_rojo: oh hai! | 17:05 |
TheJulia | err | 17:05 |
fungi | oh! though i see some similar errors for different urls earlier on mirror.iad3.inmotion in mirror_443_error.log | 17:05 |
TheJulia | wrong window! | 17:05 |
fungi | it was apparently also impacting pymongo and moto | 17:06 |
fungi | at 02:38:40-02:38:48 and 02:39:14-02:39:16 respectively | 17:07 |
fungi | i'll see about overriding https://httpd.apache.org/docs/2.4/mod/mod_substitute.html#substitutemaxlinelength in our configs as a workaround | 17:09 |
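Roughly what that override might look like; the real change is the review that follows, and the 5m value and Location path here are illustrative assumptions rather than the merged configuration:

```apache
# Illustrative sketch only: path and value are assumptions, see the actual
# review below for the merged change.
<Location /pypi/>
    # ... existing proxy and Substitute rules for the PyPI caching proxy elided ...
    SubstituteMaxLineLength 5m
</Location>
```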
opendevreview | Jeremy Stanley proposed opendev/system-config master: Override SubstituteMaxLineLength in PyPI proxies https://review.opendev.org/c/opendev/system-config/+/850677 | 17:15 |
fungi | clarkb: gthiemonge: yoctozepto: kevko: ^ | 17:16 |
fungi | one thing the three affected indices all have in common is that their sizes are near or over one megabyte. grpcio is the largest of the three with a current character count of 1411461 | 17:20 |
fungi | but moto is presently only 960403 so i'm wondering if the pypi/warehouse admins accidentally fubared the routing for the new json api and were returning it (which is understood to bloat the response size by somewhere around 1.5-2x) | 17:21 |
fungi | if so, once the workaround lands, we may see jobs breaking because of pip getting json when it expected html, or being too old to support the json api | 17:22 |
fungi | today's pip 22.2 release which turned on json api by default is a big hint for me that this could be related | 17:23 |
fungi | and pretty sure the json version is returned as one very long line | 17:24 |
Clark[m] | fungi: that lgtm but I'm still at the dentist if you want to self approve | 17:56 |
opendevreview | Merged opendev/system-config master: Override SubstituteMaxLineLength in PyPI proxies https://review.opendev.org/c/opendev/system-config/+/850677 | 18:39 |
kevko | fungi: let me rechec | 18:45 |
kevko | *recheck | 18:45 |
yoctozepto | it continues failing https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_095/850636/1/check/kolla-build-debian/095006d/kolla/build/000_FAILED_openstack-base.log | 19:04 |
* yoctozepto off | 19:04 | |
kevko | failing :( | 19:07 |
fungi | did that configuration deploy to the mirrors yet? | 19:09 |
kevko | how can we know? :P | 19:09 |
fungi | kevko: yoctozepto: deploy jobs reported successful at 18:51 utc, so you may have rechecked too soon, but let me also confirm the configs are actually on the servers now | 19:10 |
Clark[m] | https://zuul.opendev.org/t/openstack/build/0614814401ee4fc4bf6db393dcbc62a2 is the link. It's just like any other zuul job | 19:12 |
Clark[m] | Will show up in the status page and is searchable etc | 19:12 |
fungi | looks like the configs updated around 18:48, looking at the fs | 19:12 |
fungi | i'm not sure the apache processes are actually reloaded with the new config though | 19:13 |
fungi | i still see apache processes with timestamps from 6 hours ago | 19:14 |
fungi | the linked failure was hitting mirror.mtl01.iweb at or shortly after the 18:53:03 timestamp at the top of its log, and the config was updated on that server at 18:48:28 according to a stat of the file | 19:18 |
fungi | so in theory the config was already installed | 19:18 |
fungi | parent apache2 process has a start time in january, but the worker processes i see on the server are from 17:54 and 18:57 today | 19:19 |
fungi | looking at /var/log/ansible/service-mirror.yaml.log on bridge, i don't immediately see that we actually reload/restart the apache2 service when its config gets updated | 19:23 |
Clark[m] | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/handlers/main.yaml is the handler that should do it iirc | 19:24 |
Clark[m] | But Ansible handlers are always iffy | 19:24 |
Clark[m] | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/tasks/main.yaml#L140 that doesn't actually notify though, only the a2ensite does, which is a noop? | 19:25 |
Clark[m] | That may explain it | 19:25 |
fungi | yeah, i'll push up a patch | 19:26 |
Clark[m] | We may want to reload when config updates and restart when it is written fresh? Maybe that is why it is done that way to avoid restarting when already up? | 19:26 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Notify apache2 reload on updates to mirror vhost https://review.opendev.org/c/opendev/system-config/+/850686 | 19:29 |
fungi | Clark[m]: ^ like that? | 19:29 |
Clark[m] | Yes but that will restart Apache which may cause jobs to fail | 19:30 |
Clark[m] | I think we may want a separate notify to reload instead? But I don't know how to reconcile that against when we actually do need to restart | 19:31 |
fungi | ahh, yeah for the mailman playbook we have a reload handler | 19:31 |
fungi | in progress | 19:31 |
Clark[m] | But it may just work to send both notifies and then have systemd sort it out | 19:31 |
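A hedged sketch of the handler wiring being discussed; the task, template, and handler names are illustrative (the real role lives under playbooks/roles/mirror in system-config), but the shape is a notify on the template task plus a handler that reloads rather than restarts apache2:

```yaml
# Illustrative names only; not the actual system-config change.

# tasks/main.yaml
- name: Write the mirror vhost config
  template:
    src: mirror.vhost.j2
    dest: /etc/apache2/sites-available/mirror.conf
  notify: mirror reload apache2

# handlers/main.yaml
- name: mirror reload apache2
  service:
    name: apache2
    state: reloaded
```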
opendevreview | Jeremy Stanley proposed opendev/system-config master: Notify apache2 reload on updates to mirror vhost https://review.opendev.org/c/opendev/system-config/+/850686 | 19:33 |
fungi | i'm good with the simple approach. we can always optimize somehow if we find that it causes a problem | 19:33 |
kevko | fungi: nope, there is still something bad in CI ... when building the same image locally it's working | 19:33 |
fungi | kevko: yeah, it's not "in ci" but rather the proxies have bad data cached from pypi and are still tripping over it. right now the updated configuration to allow them to serve that probably-bad data is still not applied because the apache services were never told to reload the vhost config updates | 19:35 |
fungi | 850686 should address that in the future, but for now i need to script up something to reload apache2 on all the mirrors | 19:36 |
fungi | sudo ansible mirror -m shell -a 'systemctl reload apache2' | 19:37 |
fungi | that does what i think it does, right? | 19:37 |
fungi | guess i'll find out | 19:39 |
fungi | now i see new process timestamps on the apache workers on mirror.mtl01.iweb so the others are presumably the same | 19:40 |
Clark[m] | Ya I think something like that will work | 19:41 |
fungi | kevko: have another try. it will likely still fail but we'll hopefully get a different (more useful) error this time | 19:41 |
Clark[m] | And ya if pypi is serving not well formatted data that's not specific to us. It's just we trip on it at the proxy rather than the client | 19:41 |
clarkb | I won't fast approve 850686 since you've manually done the reload now (and landing that won't trigger reloads as the config isn't changing) | 19:54 |
fungi | i'm increasingly suspicious we're ending up with the json index rather than the html one | 19:54 |
clarkb | hopefully someone else can do a second review of it and make sure we aren't missing some important piece | 19:55 |
clarkb | fungi: does that imply pypi changed the simple api? | 19:55 |
clarkb | I really hope not as it will potentially make our caching proxy setup more difficult to maintain | 19:55 |
fungi | clarkb: no, the design is that warehouse is supposed to route the requests to the json api vs simple api depending on what content type gets requested | 19:55 |
fungi | but i feel like maybe they botched something | 19:56 |
clarkb | fungi: or maybe new pip is requesting json | 19:56 |
clarkb | I think that is still a big regression for caching | 19:56 |
clarkb | really what I'm trying to say is that any switch to json is going to be problematic for those of us that try to be good citizens | 19:57 |
fungi | yeah, if the jobs in question are ending up with pip 22.2 maybe, that could explain it | 19:58 |
clarkb | `curl -H 'Accept: application/json' https://pypi.org/simple/setuptools/` gets me html at least | 19:59 |
fungi | missing from the changelog but mentioned in the release announcement here: https://discuss.python.org/t/announcement-pip-22-2-release/17543 | 19:59 |
fungi | though that leaves me wondering if mod_proxy can differentiate those when caching | 19:59 |
clarkb | unfortunately that pep doesn't list "make caching easier" as a goal | 20:00 |
clarkb | fungi: yes it should | 20:00 |
fungi | because it's the same url just with different responses depending on the accept header | 20:00 |
clarkb | it is header and content type aware | 20:01 |
clarkb | curl -H 'Accept: application/vnd.pypi.simple.v1+json' https://pypi.org/simple/setuptools/ that is how you get the json | 20:01 |
clarkb | and ya I bet the issue is new pip is used and getting large json docs back | 20:01 |
clarkb | in which case your fix to increase the substitute line length is correct. Maybe we should ask them to add line breaks in the json doc | 20:01 |
clarkb | grpcio's json is ~1.6MB so 5MB is plenty there but unlike line broken html this will grow until the end of time | 20:03 |
fungi | `curl -H 'Accept: application/vnd.pypi.simple.v1+json' https://mirror.mtl01.iweb.opendev.org/pypi/simple/bindep/` does return json for me, so yeah this may be kolla's jobs are suddenly using pip 22.2 | 20:04 |
clarkb | ya and there are no line breaks | 20:05 |
fungi | and yes, the same for grpcio's index returns 1524169 characters of json | 20:05 |
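A side-by-side check of the two index formats being compared here: request each content type, count lines and bytes, and show the Vary header that mod_cache keys variants on (no claim here about what pypi.org actually returns; this just shows how to check):

```sh
# Compare the HTML and JSON simple-index responses for the same project.
for accept in 'text/html' 'application/vnd.pypi.simple.v1+json'; do
  printf '%s -> lines/words/bytes: ' "$accept"
  curl -s -H "Accept: $accept" https://pypi.org/simple/grpcio/ | wc
done
# mod_cache can only store separate variants if the origin advertises them.
curl -sI https://pypi.org/simple/grpcio/ | grep -i '^vary:'
```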
clarkb | so I think your change is a good one going forward but we'll have to bump that to 10m and then 20m and so on potentially for specific packages. We should maybe ask them to add some line breaks | 20:05 |
clarkb | or we can just unleash the bots at their cdn directly <_< | 20:06 |
clarkb | fungi: the root index is 10MB large and also includes no line breaks | 20:07 |
clarkb | so ya 5m may already be insufficient | 20:08 |
clarkb | :/ | 20:08 |
clarkb | *9.3MB | 20:08 |
clarkb | however we don't need to do substitution on the root index so maybe we don't have this problem | 20:08 |
clarkb | we apply the substitution to everything under /pypi which includes the root index | 20:09 |
clarkb | so ya | 20:09 |
clarkb | I think we should bump the limit to 20MB in the apache config. Ask upstream if they can add an occasionaly line break in the json. And fallback to direct access if those actions don't alleviate the problem | 20:11 |
fungi | i replied to the release announcement topic | 20:14 |
clarkb | thanks | 20:15 |
fungi | though any request to alter how the responses are delivered probably warrants an issue filed for warehouse | 20:16 |
clarkb | you can add line breaks between entries in json without altering the semantics or validity of json. But ya probably best to track there | 20:18 |
clarkb | fungi: should I push a change to bump the limit to 20mb or do you want to do that? And I guess we can stack it on top of the reload change and land both together to see if that works as expected | 20:20 |
fungi | https://github.com/pypi/warehouse/issues/11919 | 20:24 |
fungi | sure, i can push that | 20:25 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Increase PyPI substitute line length limit to 20m https://review.opendev.org/c/opendev/system-config/+/850688 | 20:30 |
fungi | clarkb: ^ | 20:30 |
clarkb | fungi: note the root index is 9.3 MB which is the largest I've found | 20:32 |
clarkb | and we substitute it too as far as I can tell | 20:32 |
fungi | oh, but i guess pip isn't actually hitting it | 20:33 |
clarkb | ya it may not be | 20:33 |
fungi | or else we'd see errors sooner (and errors in the apache log) | 20:33 |
clarkb | or we don't have any errors because it doesn't contain strings to substitute (I don't know if apache will just always fail or only if it needs to make substitutions) | 20:33 |
clarkb | considering we're not seeing errors with the index I think those two changes are less urgent and I don't need to approve them now? | 20:34 |
fungi | agreed | 20:34 |
fungi | clarkb: gthiemonge: yoctozepto: kevko: so in summary, it looks like the affected jobs probably used pip 22.2 (released earlier today), which defaults to requesting pypi's json simple api rather than its traditional html simple api, and the json responses are all on one line which for some projects exceeds the default limit apache mod_substitute will process. we think we have a working solution | 20:48 |
fungi | in place now | 20:48 |
fungi | separately, i've submitted a feature request for warehouse (the software behind the pypi.org site) to request they insert an occasional linebreak so as to be nicer to downstream proxies: https://github.com/pypi/warehouse/issues/11919 | 20:49 |
clarkb | thank you for running that down. A weird one for sure | 20:49 |
fungi | system-config-run-mirror-x86 failed on that last change | 21:02 |
fungi | i'll take a closer look after dinner | 21:02 |
*** dasm|ruck is now known as dasm|off | 21:04 | |
*** dviroel is now known as dviroel|out | 21:07 | |
TheJulia | so a few times today I've seen foundation/list emails get flagged as either spam or phishing attempts... Did something change? | 21:07 |
clarkb | TheJulia: not that I know of. Same server and same ips are hosting the lists | 21:09 |
clarkb | maybe your mail system is getting crankier about dkim/dmarc and someone is sending with signed messages | 21:09 |
clarkb | on the openstack-discuss list we pass the email through without modifying things that are typically signed so that the signatures continue to verify. But we left that up to each list to configure as it changes the behavior of the email slightly iirc | 21:10 |
TheJulia | it is gmail, and it had a neutral spf i.e. not explicitly flagged as permitted, that being lists.openstack.org issuing the helo statement to google's mx | 21:10 |
clarkb | I suppose one thing that did change is fungi moved the location of that list under openinfra.dev. My gmail copy seems to be complaining that it cannot verify that domain sent the email | 21:11 |
clarkb | whereas before the list was under openstack.org | 21:12 |
fungi | TheJulia: one possible trigger i've seen keep cropping up is that some list owners use e-mail addresses which automatically classify messages from various sources, so the moderation queue notifications (containing samples of all the spam those list addresses receive from non-subscribers and hold) get added to the classification pool for the listserv | 21:13 |
clarkb | heh if you open the little ? in the gmail interface they explicitly say this is a common thing for mailing lists | 21:13 |
clarkb | but ya they specifically want spf and dkim | 21:13 |
clarkb | and make a note that mailing lists often don't do this | 21:14 |
TheJulia | well, a nice remarkable increase which will impact trust/use of the mailing lists | 21:16 |
clarkb | right so, one way to improve that would be for ildikov to dkim sign email and then have the list pass it through as is. Another is to add spf records to openinfra.dev | 21:17 |
clarkb | though getting the spf records wrong may make the mailing list problem worse? | 21:18 |
TheJulia | openinfra.dev should have spf as a minimum bar | 21:18 |
clarkb | TheJulia: yes, I can bring that up with people who manage dns and mail for that domain (it isn't us) | 21:18 |
TheJulia | honestly... I've seen whole orgs just trash-can anything not positively listed with spf | 21:18 |
TheJulia | clarkb: thanks | 21:18 |
johnsom | Anyone else having slow "git review" times? It sits for a while, then prints the "Creating a git remote called", then sits for a longer while and eventually the patch posts. | 21:19 |
clarkb | johnsom: its doing setup if it is creating remotes for you | 21:20 |
clarkb | when it does that it checks a number of things to make sure it sets up a working repo. | 21:20 |
johnsom | Yeah, that is a "normal" message for my workflow | 21:20 |
clarkb | it should only do that the first time you run it in a repo and not every time. Unless you are deleting remotes or something | 21:20 |
johnsom | It usually just takes seconds to post a patch, but today it is over a minute it seems | 21:21 |
clarkb | basically if it is doing setup that is expected to take longer. If you are getting setup done each time you'll need to look into why the git remote state isn't persisting in your repo which causes it to happen every time | 21:21 |
johnsom | I have seen this with storyboard issues before, but this repo is on launchpad | 21:21 |
clarkb | git review talk to gerrit not launchpad or storyboard | 21:21 |
clarkb | but also that looks like it is configuring the repo for use which is expected to take longer. Other things that can take time are performing the rebase test potentially | 21:22 |
clarkb | but I would look into why it is creating a git remote for you first | 21:22 |
johnsom | There is some link in this process that calls out to storyboard. We have had that issue before. The part where it adds the comment that a patch has been submitted. | 21:22 |
johnsom | That message is 100% normal, it's a fresh clone. | 21:23 |
clarkb | I didn't realize that was inline with pushing though | 21:23 |
johnsom | But it is usually much faster. | 21:23 |
clarkb | johnsom: ok I guess I'm confused why you called it out. The implication is that it was happening every time you push and I was saying that if that happens every time you push then that is why it is slow :) | 21:24 |
clarkb | if you push a second time does it tell you it is configuring a remote? If not is it still slow? | 21:24 |
clarkb | (it will always be slower when configuring the remote) | 21:24 |
johnsom | Just background on the point at which git review stalls, after that message | 21:24 |
clarkb | git review does accept verbosity flags to help show you where it is stalling. That would be the next thing I would look at | 21:25 |
clarkb | As far as dmarc/dkim go we'll want to update the board mailing list to pass through most of the message untouched so that signed emails validate on the other end once received. We can do that for the board list (it is what we did for the openstack-discuss list and others); we just want to get that done before sending signed emails | 21:27 |
fungi | i have a heavy concern for all dmarc enforcement (both spf and dkim) insofar as they effectively partition users of traditional e-mail from those who have hitched their wagons to bulk freemail providers. i disagree with the assertion that "lists.openinfra.dev should have spf as a minimum bar" (whatever else openfra.dev does), though am open to adding it if there's sufficient interest from its | 21:27 |
fungi | constituency | 21:27 |
clarkb | fungi: fwiw I think the issue was openinfra.dev not lists.openinfra.dev | 21:27 |
clarkb | we have never had spf records for the list servers as far as I can tell | 21:28 |
fungi | well, lists.openinfra.dev is likely to break dmarc bits for anyone posting messages there | 21:28 |
clarkb | basically ildikov sent an email and whether or not it was sent directly to TheJulia or via the mail server google would've complained due to lack of info | 21:28 |
clarkb | fungi: yes we have to do the same pass through that was applied to openstack-discuss I bet | 21:29 |
johnsom | clarkb Ok, so it's doing the push --dry-run to review.opendev.org which resolves to 2604:e100:1:0:f816:3eff:fe52:22de which is not responding | 21:29 |
fungi | ahh, yes, openinfra.dev does not publish an spf record the way openstack.org does. that may have been overlooked by the folks who set up the domain | 21:30 |
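A quick DNS check mirroring that observation; SPF is published as a TXT record starting with "v=spf1", and any DMARC policy lives at _dmarc.<domain>:

```sh
# Compare the two domains' published policies.
dig +short TXT openstack.org | grep -i spf
dig +short TXT openinfra.dev | grep -i spf
dig +short TXT _dmarc.openinfra.dev
```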
johnsom | clarkb Ok, it's on my end. IPv6 isn't working. Comcast did "maintenance" last night here, so probably broke something. | 21:30 |
clarkb | johnsom: ok so likely it is waiting for ipv6 to fail then trying ipv4 and proceeding | 21:30 |
johnsom | Yep | 21:31 |
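For reference, a hedged sketch of the debugging steps that pinned this on broken IPv6: run git-review verbosely to see which step stalls, compare v6/v4 reachability to the Gerrit server, and optionally force the ssh push over IPv4 as a stopgap:

```sh
# Verbose run shows where the stall happens (e.g. the push --dry-run step).
git review -v
# Compare IPv6 vs IPv4 reachability to Gerrit.
ping -6 -c 3 review.opendev.org
ping -4 -c 3 review.opendev.org
# Temporary workaround: force the ssh transport onto IPv4.
GIT_SSH_COMMAND='ssh -4' git review -v
```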
TheJulia | fungi: yeah, that would do it. I've got a direct email from allison which also got flagged. :( | 21:31 |
clarkb | fungi: re your concern it is very interesting to compare my fastmail and gmail versions of the same email | 21:31 |
clarkb | fastmail is basically "the content of the message doesn't look like spam so whatever" and gmail is all "you didn't play by the rules so we'll give you a giant orange banner!" | 21:32 |
TheJulia | yeah | 21:33 |
clarkb | fwiw I also have concern because I've recently received emails from reputable companies that passed spf, dkim, and dmarc that were almost certainly spam/phishing (maybe via a compromised email server?) and I expect many put far too much weight on gmail giving the all clear without validating the contents of the email | 21:33 |
clarkb | I'm still waiting to hear back from said company on whether or not they got owned | 21:33 |
fungi | yes, all of dmarc is basically a coalition of bulk freemail providers to make the messages between them have a higher delivery confidence while reducing their own workloads | 21:33 |
fungi | it's nothing to do with actually reinforcing the legitimacy of deliveries, and entirely about entrenching market share for their respective businesses | 21:34 |
fungi | making it harder or nearly impossible for small/individual senders to comply with their increasingly complicated "standards" is all part of the design | 21:35 |
JayF | Email deliverability used to be a lot more friendly for senders. You used to be able to get feedback loops from major ESPs, where they'd let you know when a person reported you as spam (so you can remove the sender from your list and know that someone abused your service) | 21:35 |
TheJulia | JayF: Some days I miss those days | 21:42 |
fungi | i still live in those days, i just don't consider gmail to be e-mail, and prioritize communication with people who aren't trapped in that dimension | 21:43 |
JayF | I mean, I can tell you with certainty that all those ESPs dropped their feedback loop program literally a decade ago | 21:43 |
JayF | including AOL/Yahoo | 21:44 |
JayF | and microsoft never did one | 21:44 |
TheJulia | I gave up running my own mail server because of the likes of google sinking my emails into /dev/null | 21:44 |
fungi | if you really want e-mail, then you won't be using gmail. if you're using gmail, you have actively chosen to communicate only with people who also buy into that paradigm | 21:44 |
* JayF did 2.5 years at an "email marketing" firm for his first linux job | 21:44 | |
TheJulia | JayF: fwiw, I was referring to simpler times back when we first met, not that job... omg not that job | 21:45 |
TheJulia | simpler technology :) | 21:45 |
JayF | TheJulia: wait, did you know me when I worked at iContact? | 21:45 |
TheJulia | JayF: yes! | 21:45 |
JayF | TheJulia: I feel like those two timelines never converged in my head | 21:45 |
JayF | oh yeah, of course, because I left there when I left NC | 21:45 |
* fungi is still ostensibly in nc | 21:46 | |
* TheJulia gets out the "its a small world" music | 21:46 | |
JayF | TheJulia: you wanna feel old and young simultaneously? We're getting within like, what, 5ish years of having known each other for half our lives? | 21:46 |
TheJulia | oh my | 21:46 |
fungi | returning to the earlier topic, "ERROR! The requested handler 'reload apache2' was not found in either the main handlers list nor in the listening handlers list" | 21:49 |
fungi | did i miss something in 850688 to make that accessible to the task? | 21:50 |
clarkb | fungi: its called 'mailman reload apache2' in your handler update | 21:51 |
clarkb | just copy paste fail | 21:51 |
fungi | d'oh, yep! | 21:51 |
fungi | thanks, it's clearly getting late here, time for a sake | 21:52 |
fungi | i missed that it also failed on 850686 | 21:52 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Notify apache2 reload on updates to mirror vhost https://review.opendev.org/c/opendev/system-config/+/850686 | 21:53 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Increase PyPI substitute line length limit to 20m https://review.opendev.org/c/opendev/system-config/+/850688 | 21:53 |
* fungi sighs loudly at nothing in particular | 21:53 | |
clarkb | I feel like I said I would review something for someone but then didn't. But I reviewed the ca cert stack from ianw and some zuul changes for zuul folks. Anything important I'm forgetting? | 23:10 |
clarkb | fungi: ianw also if you have time for https://review.opendev.org/c/opendev/system-config/+/850580 that came out of our meeting this week. The idea we can check for new caches in our gerrit upgrade job | 23:10 |
fungi | thanks, hopefully i can take a look in a bit. i also meant to look at the ca cert stack | 23:12 |
clarkb | No real rush behind that. More just wanting to clear it off my todo list | 23:15 |
clarkb | oh the grafana stack I had it in my todo list up a bit | 23:16 |
fungi | getting things off of everyone's respective todo lists is still a priority for me | 23:21 |
ianw | clarkb: thanks ... it still has -1 because of the registry issues, but https://review.opendev.org/q/topic:console-version-tags should get zuul to remove the console log streaming files. it was a bit harder than i first thought but i think a useful addition | 23:21 |
clarkb | ya that is probably a better fix than trying to ensure we run a cleanup system everywhere we need it | 23:21 |
ianw | yeah, although it's quick for us to add, in general i think it's better than having to have zuul explain why you need to do that, and leaving it up to the admins to figure out | 23:23 |
clarkb | ok I've got a todo list for tomorrow. I'm going to do my best to dig into those reviews | 23:23 |
fungi | i harbor no illusions my todo list will shrink tomorrow, but i do intend to try and thumb my nose at the universe anyhoo | 23:35 |
clarkb | well this way I won't be forgetting what I need to do. Whether or not the list shrinks is another story :) | 23:38 |
ianw | symmetric_difference is a cool one | 23:47 |
fungi | i'll consider that for my next band name | 23:51 |
fungi | with the underscore, of course | 23:51 |
ianw | oh https://review.opendev.org/c/opendev/system-config/+/850123 is another minor one, that sets the timestamp of the stored production ansible log files to their start time, so when you look it roughly lines up with the start time of the zuul job | 23:51 |