@jim:acmegating.com | opendev is still running master with 2 schedulers. so far it's still looking good and i'm inclined to leave it running in that configuration into the workday tomorrow. | 01:07 |
---|---|---|
-@gerrit:opendev.org- Felix Edel proposed: | 07:37 | |
- [zuul/zuul] 814996: Make the ConfigLoader work independently of the Scheduler https://review.opendev.org/c/zuul/zuul/+/814996 | ||
- [zuul/zuul] 816361: Load system config and tenant layouts in zuul-web https://review.opendev.org/c/zuul/zuul/+/816361 | ||
- [zuul/zuul] 816362: Implement job freezing API in zuul-web https://review.opendev.org/c/zuul/zuul/+/816362 | ||
- [zuul/zuul] 816514: Implement managenet events directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/816514 | ||
- [zuul/zuul] 816783: Implement autohold endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/816783 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 816972: Use election for dispatching timer events https://review.opendev.org/c/zuul/zuul/+/816972 | 09:39 | |
-@gerrit:opendev.org- Simon Westphahl proposed: | 09:40 | |
- [zuul/zuul] 815787: Refresh pipelines in tests when settled https://review.opendev.org/c/zuul/zuul/+/815787 | ||
- [zuul/zuul] 815278: DNM: execute tests with two schedulers https://review.opendev.org/c/zuul/zuul/+/815278 | ||
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 816993: Remove pf-c-content CSS class from status page https://review.opendev.org/c/zuul/zuul/+/816993 | 11:53 | |
@felixedel:matrix.org | corvus: ^ A small UI improvement for the headings on the status page. | 11:54 |
@felixedel:matrix.org | corvus mhu The stack around https://review.opendev.org/c/zuul/zuul/+/814711/1 also includes some small UI fixes for the buildset result page. Would be cool if somebody could have a look at those :) | 11:58 |
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of Felix Edel: [zuul/zuul] 816807: Split up registerScheduler() and onLoad() methods https://review.opendev.org/c/zuul/zuul/+/816807 | 12:41 | |
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 816993: Remove pf-c-content CSS class from status page https://review.opendev.org/c/zuul/zuul/+/816993 | 13:29 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 768115: Web UI: allow a privileged user to request autohold https://review.opendev.org/c/zuul/zuul/+/768115 | 13:30 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 768199: Web UI: add Autoholds, Autohold page https://review.opendev.org/c/zuul/zuul/+/768199 | 13:31 | |
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of Felix Edel: | 13:42 | |
- [zuul/zuul] 816807: Split up registerScheduler() and onLoad() methods https://review.opendev.org/c/zuul/zuul/+/816807 | ||
- [zuul/zuul] 814996: Make the ConfigLoader work independently of the Scheduler https://review.opendev.org/c/zuul/zuul/+/814996 | ||
- [zuul/zuul] 816361: Load system config and tenant layouts in zuul-web https://review.opendev.org/c/zuul/zuul/+/816361 | ||
- [zuul/zuul] 816362: Implement job freezing API in zuul-web https://review.opendev.org/c/zuul/zuul/+/816362 | ||
- [zuul/zuul] 816514: Implement managenet events directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/816514 | ||
- [zuul/zuul] 816783: Implement autohold endpoints directly in zuul-web https://review.opendev.org/c/zuul/zuul/+/816783 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: | 13:42 | |
- [zuul/zuul] 817003: Store pipeline status for zuul-web in Zookeeper https://review.opendev.org/c/zuul/zuul/+/817003 | ||
- [zuul/zuul] 817004: Use pipeline status from Zookeeper in zuul-web https://review.opendev.org/c/zuul/zuul/+/817004 | ||
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 810699: Web UI: Show pipeline types as icons https://review.opendev.org/c/zuul/zuul/+/810699 | 13:57 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 781858: web UI: allow a privileged user to promote a change https://review.opendev.org/c/zuul/zuul/+/781858 | 14:01 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 802559: Web UI: Add "Create Autohold Request" form, improve API error messages https://review.opendev.org/c/zuul/zuul/+/802559 | 14:30 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 769943: Example Docker compose: keycloak integration https://review.opendev.org/c/zuul/zuul/+/769943 | 14:30 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 816208: WIP Use AuthProvider https://review.opendev.org/c/zuul/zuul/+/816208 | 14:31 | |
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 817035: Fix race condition when updating node requests https://review.opendev.org/c/zuul/zuul/+/817035 | 15:12 | |
@shrews:matrix.org | Is it possible to control the commit message format for a Github project gated by Zuul? The format seems to include: PR title, 1st commit message, PR description, reviewers. It would be nice if we could trim that down to NOT include the PR description or reviewers. | 15:22 |
@fungicide:matrix.org | when you say "the commit message format" do you mean on the merge commit created by zuul merging the change to the public branch in github? | 15:46 |
@shrews:matrix.org | fungi: yes | 15:46 |
@fungicide:matrix.org | thanks, to be clear i don't know the answer to your question, just making sure i understood what you were asking | 15:47 |
@clarkb:matrix.org | I think GitHub does the merging not zuul. Would be a GitHub setting if it exists | 16:07 |
@fungicide:matrix.org | though maybe another argument in favor of having zuul push merge commits | 16:19 |
@clarkb:matrix.org | Are the bigger zuul status queue card entries expected? | 16:46 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 817056: Refresh pipelines before checking for leaked node requests https://review.opendev.org/c/zuul/zuul/+/817056 | 17:04 | |
@jim:acmegating.com | Clark: you can fix the bigger card size with a +3 on https://review.opendev.org/816094 | 17:04 |
@jim:acmegating.com | Clark, fungi, tobiash, swest, felixedel: i believe the bug fixed by https://review.opendev.org/817056 is causing some jobs in opendev's zuul to wait indefinitely for nodes | 17:05 |
@jim:acmegating.com | the case i examined had zuul logging that it deleted a leaked node request, but it wasn't leaked; i think that's why | 17:06 |
@jim:acmegating.com | i'd like to merge that and then try a zero-downtime restart in opendev to roll out that fix | 17:07 |
@clarkb:matrix.org | +2 | 17:09 |
@clarkb:matrix.org | I've also approved the card sizing in the web ui change after verifying it looks more how I expect in the site preview | 17:09 |
@tobias.henkel:matrix.org | Yay, zero downtime restarts | 17:11 |
@jim:acmegating.com | it's going to be fun, no matter what happens! :) | 17:13 |
@westphahl:matrix.org | corvus: we found another case that can lead to jobs waiting indefinitely. Should be fixed by 817035 | 17:25 |
@jim:acmegating.com | swest: thanks! Clark, tobiash: ^ want to get that one in too? | 17:27 |
@clarkb:matrix.org | yup looking | 17:28 |
@clarkb:matrix.org | do we want to recheck it? not sure how worried I should be about the failed unittest job | 17:28 |
@jim:acmegating.com | i took a quick look and seriously doubt it's related. i think just going into gate is ok. | 17:29 |
@clarkb:matrix.org | Does the docstring on the method there need to be updated? it says the lock attribute will be updated which I guess is true for the local system but not zk? | 17:29 |
@clarkb:matrix.org | oh wait updateNodeRequest is updating it from zk into memory | 17:30 |
@goneri:matrix.org | Is there a way to manually ask nodepool to start a nodeset on a given provider? I'm adding some new images and I would like to validate that all the combinations are working. | 17:33 |
@jim:acmegating.com | Gonéri: you can give them a special label and request that in your test job | 17:34 |
@goneri:matrix.org | This is what I do, but it's also a lot of work. | 17:36 |
@tristanc_:matrix.org | Gonéri: you can also set a min-ready attribute to 1 | 17:36 |
@goneri:matrix.org | oh, good idea. I will try that. | 17:37 |
@goneri:matrix.org | Actually, with min-ready, I still need one label by provider. | 17:39 |
@clarkb:matrix.org | What I tend to do these days is tell nodepool to build and upload images then I can manually boot them easily | 17:49 |
@clarkb:matrix.org | When I do similar stuff for OpenDev you can assume reasonably consistent performance across all images so what you need to check is that all images boot and then you can pick a single image to do some representative testing which reduces the overhead | 17:50 |
@goneri:matrix.org | We've got a lot of inconsistencies (boot-from-volume, different flavors, AWS AMI ID, etc) between our providers and we often discover the problems weeks later. | 17:57 |
@clarkb:matrix.org | ah I see new image across all providers rather than new provider with a bunch of images | 18:00 |
@clarkb:matrix.org | Fedora 34 was recently problematic for us in that way as they broke their kernels for VMs. | 18:01 |
@clarkb:matrix.org | then fixed f35 first and left f34 broken for an extended period | 18:02 |
@clarkb:matrix.org | corvus: I think I'm seeing the status page flap. What is interesting to me is that we seem to have done a gate reset in zuul gate but I'm not sure there was ever a fourth change ahead? | 18:19 |
@clarkb:matrix.org | is it possible that we're doing inappropriate resets due to job retries? | 18:19 |
@jim:acmegating.com | yeah, i'm looking at the scheduler logs, and i think the schedulers are actually in disagreement on that | 18:21 |
@jim:acmegating.com | they're actually canceling jobs and restarting | 18:21 |
@jim:acmegating.com | i think the bug is with the "RETRY" job state | 18:28 |
@jim:acmegating.com | hasAnyJobFailed considers a 'retry' to be a failure | 18:28 |
@clarkb:matrix.org | ah and it should only do so if the retry limit has been reached | 18:28 |
@clarkb:matrix.org | FWIW this seems to be affecting other tenants' gate queues in the opendev zuul. I suspect the retries were due to the zk issues so not an ongoing problem? | 18:29 |
@jim:acmegating.com | yeah, i think this is another thing that can be fixed live with dequeue/enqueue | 18:30 |
@jim:acmegating.com | Clark: feel free to do that for openstack, but please leave the one in zuul/gate for a little longer while i continue to look | 18:30 |
@clarkb:matrix.org | corvus: can do. Just dequeue enqueue the changes with retry state jobs? | 18:30 |
@jim:acmegating.com | yep | 18:31 |
@clarkb:matrix.org | This is not completing very quickly. Just noting that if it isn't expected | 18:33 |
@clarkb:matrix.org | but it does eventually finish | 18:33 |
@jim:acmegating.com | yes it's very slow | 18:35 |
@jim:acmegating.com | the 2 schedulers have 2 different znode versions of the builds | 18:37 |
@jim:acmegating.com | and that's because they have different buildset znode versions | 18:40 |
@jim:acmegating.com | and different item versions | 18:43 |
@jim:acmegating.com | okay, checking over time, they're both changing; so i suspect it's less that the state is out of sync, and more that they're just fighting each other | 18:46 |
@clarkb:matrix.org | I'm realizing that most of the openstack check queue (and probably other queues) need to be refreshed too | 18:47 |
@clarkb:matrix.org | noticed it in gate because of the resets being easy to spot but it happens in the other queues too preventing check jobs from completing for a single change | 18:47 |
@clarkb:matrix.org | also this is a bit mind bending to try and make sense of. There is one change that shows up in one scheduler but not the other. Neither appears to have reported it to gerrit | 18:49 |
@clarkb:matrix.org | oh wait maybe they are at the end of the status page for one and not the other. More trippy :) | 18:50 |
@clarkb:matrix.org | ok I'm going to dequeue and reenqueue 54 check entries and hope that helps me make more sense of things | 18:54 |
@clarkb:matrix.org | they are all tenant openstack so shouldn't affect the zuul stuff you asked me to leave alone | 18:55 |
@jim:acmegating.com | okay, i think i see the issue; we don't check if the build path has changed when we deserialize builds; so we're refreshing existing objects from 2 different paths | 19:00 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 817067: Verify build path before refreshing from ZK https://review.opendev.org/c/zuul/zuul/+/817067 | 19:05 | |
@jim:acmegating.com | Clark, fungi, tobiash, swest: ^ i think that's the issue we're seeing with retry builds | 19:05 |
@jim:acmegating.com | i think it makes sense we would only see that with retry builds (where we create new builds), and also that we would only see that with a second scheduler | 19:06 |
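The idea behind "verify build path before refreshing" can be sketched roughly as follows. This is only an illustration of the reasoning in the messages above, not the code in 817067: the `Build` class, the `refresh_builds` helper, and the dictionaries standing in for ZooKeeper reads are all hypothetical.

```python
class Build:
    """Hypothetical stand-in for a build object cached in scheduler memory."""

    def __init__(self, path):
        self.path = path      # ZooKeeper path this object was loaded from
        self.result = None

    def refresh(self, data):
        # Update the in-memory object from serialized data.
        self.result = data.get("result")


def refresh_builds(cached, build_paths, zk_data):
    """Bring cached builds in line with what the buildset currently records.

    cached:      job name -> Build already held in memory
    build_paths: job name -> path the buildset currently records
    zk_data:     path -> serialized build data (stands in for ZooKeeper reads)
    """
    for job_name, path in build_paths.items():
        build = cached.get(job_name)
        if build is None or build.path != path:
            # The path changed (for example a retry created a new build),
            # so make a fresh object instead of overwriting the old one.
            build = Build(path)
            cached[job_name] = build
        build.refresh(zk_data[path])
    return cached
```

Without the path comparison, a cached object for the first attempt would be overwritten with data from the retry's path, which matches the "refreshing existing objects from 2 different paths" symptom described above.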
@jim:acmegating.com | Clark: i may be able to monkeypatch that in using the repl | 19:07 |
@spamaps:spamaps.ems.host | So I have definitely found infinite recursion in the gce provider config code | 19:07 |
@spamaps:spamaps.ems.host | I have a very simple reproducer that is a very normal config. | 19:07 |
@clarkb:matrix.org | corvus: I'm observing new behavior from the periodic pipeline showing many neutron 000000 entries after I enqueued a single one | 19:07 |
@spamaps:spamaps.ems.host | Working through how to solve it now, but wondering why others haven't hit it. | 19:07 |
@jim:acmegating.com | spamaps: you are probably the second person to use that driver | 19:08 |
@clarkb:matrix.org | corvus: if you can take a look at cleaning up the periodic queue because I'm confused that would be good (but I know you are looking at fixing the underlying issue) | 19:08 |
@jim:acmegating.com | Clark: we know periodic enqueues are weird with 0000 refs; can we just dequeue them and ignore that for now? | 19:09 |
@clarkb:matrix.org | corvus: yes. How do I dequeue the 00000 ref? use --ref 00000 ? | 19:09 |
@jim:acmegating.com | i think that should work | 19:10 |
@clarkb:matrix.org | ok I'll try it | 19:10 |
@spamaps:spamaps.ems.host | > <@jim:acmegating.com> spamaps: you are probably the second person to use that driver | 19:10 |
Well then there's my answer. :) | ||
@spamaps:spamaps.ems.host | There's already an attempt to avoid the recursion | 19:12 |
@spamaps:spamaps.ems.host | I think it may have been too narrow. | 19:12 |
@spamaps:spamaps.ems.host | Ultimately what happens is, a pool has a provider has a pool has a provider... | 19:13 |
@spamaps:spamaps.ems.host | ah so simple | 19:14 |
@spamaps:spamaps.ems.host | ignore_equality should have ['provider'] but is [] | 19:14 |
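A minimal sketch of why a pool that points back at its provider can recurse forever during an equality check, and how skipping the back-reference avoids it. The classes below are hypothetical stand-ins written for illustration; only the `ignore_equality` name and the `['provider']` value come from the messages above, and nodepool's real config classes may differ.

```python
class ConfigValue:
    # Attribute names to skip when comparing two config objects.
    ignore_equality = []

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return False
        for key, value in self.__dict__.items():
            if key in self.ignore_equality:
                continue
            if value != other.__dict__.get(key):
                return False
        return True


class Provider(ConfigValue):
    def __init__(self, name):
        self.name = name
        self.pools = {}


class NaivePool(ConfigValue):
    # Compares the provider back-reference too, so
    # provider -> pools -> pool -> provider -> ... never terminates.
    def __init__(self, name, provider):
        self.name = name
        self.provider = provider


class FixedPool(NaivePool):
    # Skip the back-reference when comparing.
    ignore_equality = ["provider"]


def build_provider(pool_cls):
    """Build one provider with one pool, as if loaded from a config file."""
    provider = Provider("gce")
    provider.pools["main"] = pool_cls("main", provider)
    return provider
```

Comparing two separately built configs is what triggers the problem; comparing an object with itself may pass because container comparisons short-circuit on identity. A check along these lines would exercise both cases:

```python
assert build_provider(FixedPool) == build_provider(FixedPool)

try:
    build_provider(NaivePool) == build_provider(NaivePool)
except RecursionError:
    print("naive comparison recursed")
```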
@clarkb:matrix.org | corvus: "Exception: Unable to find shared change queue for openstack/neutron:0000000000000000000000000000000000000000" fwiw. I also tried the seven digit 0000000 string | 19:15 |
@spamaps:spamaps.ems.host | Yay, that did it. | 19:18 |
@spamaps:spamaps.ems.host | Damn, do I have to get Spotify to sign the openstack CLA? | 19:19 |
@jim:acmegating.com | spamaps: no cla required for zuul | 19:19 |
@spamaps:spamaps.ems.host | \o/ | 19:19 |
@jim:acmegating.com | because of exactly that | 19:20 |
@jim:acmegating.com | Clark: i have monkeypatched the fix; that might abate the need for re-enqueuing retry stuff | 19:21 |
@jim:acmegating.com | Clark: it looks like the situation in the zuul gate queue is resolved | 19:21 |
@clarkb:matrix.org | corvus: ok I can ^C my script and then only enqueue what was dequeued? | 19:22 |
@jim:acmegating.com | Clark: yeah i think so | 19:22 |
@clarkb:matrix.org | alright doing that now | 19:22 |
@jim:acmegating.com | Clark: then if you could +3 https://review.opendev.org/817067 that'd be great since we're running that in prod :) | 19:22 |
@clarkb:matrix.org | corvus: do you have time to look at dequeuing the 000000 periodic changes? I'm not sure what I should try next there | 19:23 |
@jim:acmegating.com | Clark: can do | 19:24 |
@clarkb:matrix.org | thanks | 19:24 |
-@gerrit:opendev.org- Clint Byrum proposed: [zuul/nodepool] 817070: Fix infinite recursion in GCE provider https://review.opendev.org/c/zuul/nodepool/+/817070 | 19:37 | |
@spamaps:spamaps.ems.host | Fun.. now I get to figure out how to make my instances accessible by executor/nodepool when this is happening: | 19:51 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 817056: Refresh pipelines before checking for leaked node requests https://review.opendev.org/c/zuul/zuul/+/817056 | 20:12 | |
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 817035: Fix race condition when updating node requests https://review.opendev.org/c/zuul/zuul/+/817035 | 20:30 | |
@clarkb:matrix.org | Looks like GCE default firewall rules allow ingress port 22 which implies your rules are not default? | 20:36 |
@spamaps:spamaps.ems.host | Correct! I am a tiny fish in a very big Spotify GCE ocean. ;) | 20:52 |
@spamaps:spamaps.ems.host | And for whatever reason my network-local firewall rules don't seem to be letting me do what I want. ;) | 20:53 |
@spamaps:spamaps.ems.host | But more pressing is trying to ship a patched nodepool. | 20:53 |
@jim:acmegating.com | spamaps: you said you have a test case; can you attach that to 817070? | 20:54 |
@spamaps:spamaps.ems.host | > <@jim:acmegating.com> spamaps: you said you have a test case; can you attach that to 817070? | 20:55 |
Ah yeah it looks like that didn't make it as I rebased out my flailing, let me add it back in. :) | ||
@jim:acmegating.com | sweet thx | 20:55 |
@spamaps:spamaps.ems.host | The one I have is nearly identical to the fixture in the gce unit tests. | 20:58 |
@spamaps:spamaps.ems.host | So I suspect this code isn't exercised. | 20:58 |
@spamaps:spamaps.ems.host | If anything my example is smaller than the one in the unit tests. | 20:59 |
@jim:acmegating.com | was it config validation that failed? | 21:00 |
@spamaps:spamaps.ems.host | It was failing when it tried to determine if the config had changed. | 21:00 |
@spamaps:spamaps.ems.host | ```launcher_1 | 2021-11-08 18:53:53,062 ERROR nodepool.NodePool: Exception in main loop: | 21:01 |
launcher_1 | 2021-11-08 18:53:53,062 ERROR nodepool.NodePool: Traceback (most recent call last): | ||
launcher_1 | 2021-11-08 18:53:53,062 ERROR nodepool.NodePool: File "/usr/local/lib/python3.9/site-packages/nodepool/launcher.py", line 1095, in run | ||
launcher_1 | 2021-11-08 18:53:53,062 ERROR nodepool.NodePool: self.updateConfig() | ||
launcher_1 | 2021-11-08 18:53:53,062 ERROR nodepool.NodePool: File "/usr/local/lib/python3.9/site-packages/nodepool/launcher.py", line 957, in updateConfig | ||
launcher_1 | 2021-11-08 18:53:53,062 ERROR nodepool.NodePool: provider_manager.ProviderManager.reconfigure(self.config, config, | ||
launcher_1 | 2021-11-08 18:53:53,062 ERROR nodepool.NodePool: File "/usr/local/lib/python3.9/site-packages/nodepool/provider_manager.py", line 52, in reconfigure | ||
launcher_1 | 2021-11-08 18:53:53,062 ERROR nodepool.NodePool: if oldmanager and p != oldmanager.provider:``` | ||
@jim:acmegating.com | oh, then it's entirely possible that the other user is also hitting this bug, since the config never changes :) | 21:01 |
@jim:acmegating.com | spamaps: if the existing unit tests aren't exercising that, maybe a config validation test would | 21:01 |
@spamaps:spamaps.ems.host | It may not have even prevented things from working. Are you sure it's not just streaming in the logs right now? ;) | 21:01 |
@jim:acmegating.com | spamaps: i feel there's a >90% chance that's exactly what's happening :) | 21:02 |
@spamaps:spamaps.ems.host | It doesn't happen on an actual change, it just happens in the loop whenever we call updateConfig() | 21:03 |
@spamaps:spamaps.ems.host | and it gets eaten by the except line, so yeah, just logged and ignored. | 21:04 |
@spamaps:spamaps.ems.host | The first config works. | 21:04 |
@spamaps:spamaps.ems.host | Because other is None. | 21:04 |
@spamaps:spamaps.ems.host | > <@jim:acmegating.com> spamaps: if the existing unit tests aren't exercising that, maybe a config validation test would | 21:04 |
I wonder if it would make sense to just make sure updateConfig works twice on every fixture config. | ||
@jim:acmegating.com | yeah | 21:05 |
@spamaps:spamaps.ems.host | Actually we don't have to use updateConfig... | 21:06 |
@spamaps:spamaps.ems.host | but just do an equality check to oneself. | 21:06 |
@spamaps:spamaps.ems.host | or to the same one loaded twice. | 21:06 |
@spamaps:spamaps.ems.host | the latter would probably make more sense. | 21:06 |
@spamaps:spamaps.ems.host | I think I have a tiny tweak to the test_driver_gce that catches this bug. | 21:09 |
@spamaps:spamaps.ems.host | gah.. still can't run the unit tests native on Mac OS. | 21:16 |
@spamaps:spamaps.ems.host | Hrm, it's happening on Linux in docker too | 21:37 |
@spamaps:spamaps.ems.host | ```ImportError: cannot import name 'OP_NO_TICKET' from 'urllib3.util.ssl_' (/src/.tox/py39/lib/python3.9/site-packages/urllib3/util/ssl_.py)``` | 21:37 |
@spamaps:spamaps.ems.host | Anybody know how to get around this? | 21:37 |
@clarkb:matrix.org | we do test nodepool with focal + python3.9 and whatever openssl is on that platform | 21:38 |
@spamaps:spamaps.ems.host | I'm literally just mounting nodepool source into the nodepool container from Docker.io and trying to run tox on it. | 21:38 |
@clarkb:matrix.org | tox will install stuff to a new venv. The images on docker don't have the -dev(el) packages installed | 21:39 |
@clarkb:matrix.org | I wonder if it is just failing to link openssl properly? Though it's weird since that stack should all have wheels now | 21:39 |
@spamaps:spamaps.ems.host | libssl-dev is installed | 21:40 |
@clarkb:matrix.org | huh that is unexpected | 21:41 |
@spamaps:spamaps.ems.host | No I mean before I tox'd | 21:41 |
@clarkb:matrix.org | ya I know, the container builds are supposed to do all the -dev(el) linking in a throwaway image then wheels are copied out from that and installed on the final image without the -dev(el) packages present to keep image sizes down | 21:41 |
@clarkb:matrix.org | we probably have a wrong bindep rule, but also none of that explains the error | 21:42 |
@spamaps:spamaps.ems.host | Perhaps I'm not doing some step :-P | 21:45 |
@clarkb:matrix.org | spamaps: if I pull zuul/nodepool-launcher and run python and import ssl then >>> ssl.OP_NO_TICKET says <Options.OP_NO_TICKET: 16384> | 21:48 |
@spamaps:spamaps.ems.host | Clark: I just docker built from the Dockerfile in nodepool's repo and it didn't work there either. | 21:49 |
@spamaps:spamaps.ems.host | I didn't bindep.. forgot that existed.. so I'm trying that | 21:50 |
@spamaps:spamaps.ems.host | And I'm guessing no wheels are ever being used | 21:52 |
@spamaps:spamaps.ems.host | because every tox -epy39 takes 5-10 minutes. | 21:52 |
@spamaps:spamaps.ems.host | just for the installdeps | 21:52 |
@spamaps:spamaps.ems.host | ```--- import errors --- | 21:53 |
Failed to import test module: nodepool.tests.unit.test_driver_aws | ||
Traceback (most recent call last): | ||
File "/usr/local/lib/python3.9/unittest/loader.py", line 436, in _find_test_path | ||
module = self._get_module_from_name(name) | ||
File "/usr/local/lib/python3.9/unittest/loader.py", line 377, in _get_module_from_name | ||
__import__(name) | ||
File "/src/nodepool/tests/unit/test_driver_aws.py", line 24, in <module> | ||
import boto3 | ||
File "/src/.tox/py39/lib/python3.9/site-packages/boto3/__init__.py", line 16, in <module> | ||
from boto3.session import Session | ||
File "/src/.tox/py39/lib/python3.9/site-packages/boto3/session.py", line 17, in <module> | ||
import botocore.session | ||
File "/src/.tox/py39/lib/python3.9/site-packages/botocore/session.py", line 29, in <module> | ||
import botocore.credentials | ||
File "/src/.tox/py39/lib/python3.9/site-packages/botocore/credentials.py", line 34, in <module> | ||
from botocore.config import Config | ||
File "/src/.tox/py39/lib/python3.9/site-packages/botocore/config.py", line 16, in <module> | ||
from botocore.endpoint import DEFAULT_TIMEOUT, MAX_POOL_CONNECTIONS | ||
File "/src/.tox/py39/lib/python3.9/site-packages/botocore/endpoint.py", line 22, in <module> | ||
from botocore.awsrequest import create_request_object | ||
File "/src/.tox/py39/lib/python3.9/site-packages/botocore/awsrequest.py", line 24, in <module> | ||
import botocore.utils | ||
File "/src/.tox/py39/lib/python3.9/site-packages/botocore/utils.py", line 32, in <module> | ||
import botocore.httpsession | ||
File "/src/.tox/py39/lib/python3.9/site-packages/botocore/httpsession.py", line 10, in <module> | ||
from urllib3.util.ssl_ import ( | ||
ImportError: cannot import name 'OP_NO_TICKET' from 'urllib3.util.ssl_' (/src/.tox/py39/lib/python3.9/site-packages/urllib3/util/ssl_.py) | ||
================================================================================ | ||
The above traceback was encountered during test discovery which imports all the found test modules in the specified test_path. | ||
ERROR: InvocationError for command /src/.tox/py39/bin/stestr --test-path ./nodepool/tests/unit run --no-subunit-trace (exited with code 100) | ||
________________________________________________________________________ summary _________________________________________________________________________ | ||
ERROR: py39: commands failed | ||
root@74296c2b01e2:/src# .tox/py39/bin/python3 | ||
Python 3.9.6 (default, Aug 17 2021, 02:29:16) | ||
[GCC 10.2.1 20210110] on linux | ||
Type "help", "copyright", "credits" or "license" for more information. | ||
>>> import ssl | ||
>>> ssl.OP_NO_TICKET | ||
<Options.OP_NO_TICKET: 16384> | ||
>>>``` | ||
@clarkb:matrix.org | the docker image build should use bindep and install all the runtime deps for you. but it is a multistage build with a sacrificial image used just for building all the deps into wheels | 21:53 |
@spamaps:spamaps.ems.host | 🤷 | 21:53 |
@spamaps:spamaps.ems.host | Note that this is the urllib3 vendored ssl or something like that? | 21:54 |
@clarkb:matrix.org | it's https://github.com/urllib3/urllib3/blob/main/src/urllib3/util/ssl_.py#L98-L148 | 21:54 |
@spamaps:spamaps.ems.host | ```>>> import urllib3.util.ssl_ | 21:54 |
>>> urllib3.util.ssl_ | ||
<module 'urllib3.util.ssl_' from '/src/.tox/py39/lib/python3.9/site-packages/urllib3/util/ssl_.py'> | ||
>>> urllib3.util.ssl_.OP_NO_TICKET | ||
Traceback (most recent call last): | ||
File "<stdin>", line 1, in <module> | ||
AttributeError: module 'urllib3.util.ssl_' has no attribute 'OP_NO_TICKET' | ||
>>>``` | ||
@clarkb:matrix.org | oh it's importing OP_NO_TICKET from urllib3 | 21:55 |
@clarkb:matrix.org | and urllib3 is importing it from ssl? | 21:55 |
@spamaps:spamaps.ems.host | no idea | 21:56 |
@spamaps:spamaps.ems.host | I didn't miss this part of python | 21:56 |
@spamaps:spamaps.ems.host | and "this part" being "boto" | 21:56 |
@clarkb:matrix.org | I suspect that urllib3 is too old for boto or similar | 21:56 |
@clarkb:matrix.org | Yup https://github.com/urllib3/urllib3/blob/1.25.11/src/urllib3/util/ssl_.py is the version on the image and has no OP_NO_TICKET | 21:57 |
@clarkb:matrix.org | spamaps: try to remove the urllib3 cap in requirements.txt? | 21:57 |
@clarkb:matrix.org | boto must've just recently released because we updated some nodepool stuff last week iirc | 21:58 |
@clarkb:matrix.org | https://pypi.org/project/botocore/#history literally 2 hours ago | 21:58 |
@spamaps:spamaps.ems.host | Yeah why is nodepool capped I wonder. | 21:59 |
@spamaps:spamaps.ems.host | ```nodepool 4.3.1.dev11 requires urllib3<1.26,>=1.25.4, but you have urllib3 1.26.7 which is incompatible.``` | 21:59 |
@spamaps:spamaps.ems.host | Yeah I just manually upgraded it and got that. | 21:59 |
@clarkb:matrix.org | there are comments about it. It was because python requests couldn't work with 1.26 for a time. I think that is no longer an issue so we should bump the min requirement up to whatever botocore requires. /me tries to figure that out now | 22:00 |
@clarkb:matrix.org | https://github.com/boto/botocore/blob/1.23.0/setup.cfg#L8 is inaccurate | 22:00 |
@spamaps:spamaps.ems.host | One wonders why that wouldn't be breaking the gate. | 22:01 |
@clarkb:matrix.org | it needs >=1.26.0 | 22:01 |
@clarkb:matrix.org | spamaps: because the change happened 2 hours ago when botocore released 1.23.0 | 22:01 |
@clarkb:matrix.org | you're the first to notice in that time span | 22:01 |
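The conflict discussed above can be checked with a small version-range sketch. The two specifier strings are taken from the messages (nodepool's cap `urllib3<1.26,>=1.25.4`, and the `>=1.26.0` that botocore 1.23.0 is said to need in practice); the `parse` and `satisfies` helpers are hypothetical, hand-rolled for illustration, and only handle plain dotted numeric versions rather than full PEP 440.

```python
import operator

# Comparison operators, longest prefixes first so ">=" is tried before ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("<", operator.lt),
    (">", operator.gt),
]


def parse(version):
    """Turn '1.25.11' into (1, 25, 11), padding to three components."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies(version, spec):
    """Return True if version meets every comma-separated clause in spec."""
    for clause in spec.split(","):
        clause = clause.strip()
        for prefix, compare in _OPS:
            if clause.startswith(prefix):
                if not compare(parse(version), parse(clause[len(prefix):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


NODEPOOL_CAP = "<1.26,>=1.25.4"   # from the pip warning quoted above
BOTOCORE_NEEDS = ">=1.26.0"       # per the discussion, not botocore's setup.cfg

for candidate in ("1.25.11", "1.26.7"):
    print(candidate,
          satisfies(candidate, NODEPOOL_CAP),
          satisfies(candidate, BOTOCORE_NEEDS))
```

Neither version mentioned in the log satisfies both ranges, which is why the options on the table are raising the urllib3 requirement or pinning botocore back.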
@spamaps:spamaps.ems.host | Wait, botocore has dependency problems? That doesn't sound right.. ;) | 22:01 |
@spamaps:spamaps.ems.host | So we could also pin botocore back | 22:01 |
@spamaps:spamaps.ems.host | Until they fix their stuff too | 22:02 |
@clarkb:matrix.org | I think bumping urllib3 up is fine. I'll push that up unless you want to | 22:02 |
@clarkb:matrix.org | or ya I guess we pin boto and file a bug against them | 22:02 |
@clarkb:matrix.org | https://github.com/boto/botocore/issues/2562 already exists as an upstream bug | 22:03 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/nodepool] 817114: Fix boto deps https://review.opendev.org/c/zuul/nodepool/+/817114 | 22:07 | |
@clarkb:matrix.org | I expect that integration testing will check ^ generally works against a devstack cloud with openstacksdk. | 22:08 |
@spamaps:spamaps.ems.host | And now I'm stuck because I don't have a zk | 22:23 |
@spamaps:spamaps.ems.host | Am I crazy to think it shouldn't be this hard just to run one unit test? | 22:23 |
@clarkb:matrix.org | there is a tools/test-setup-docker.sh | 22:23 |
@clarkb:matrix.org | I don't think it's crazy, but I also think you're subverting the docker image to do something it was never intended for :P | 22:24 |
@clarkb:matrix.org | the docker image is meant to be deployed to production hence the intention of removing -devel packages after wheels are built for example | 22:24 |
@spamaps:spamaps.ems.host | No I don't mean that I just mean like, I am on a mac, that is a very common thing.. :-P | 22:25 |
@clarkb:matrix.org | ya but none of us are able to test or reproduce that. I appreciate people might have macs but none of us do and having mac ci resources isn't very pragmatic due to licensing | 22:25 |
@clarkb:matrix.org | you essentially have to create a farm of mac minis | 22:25 |
@spamaps:spamaps.ems.host | I don't mind dockering to run tests | 22:26 |
@spamaps:spamaps.ems.host | but this doesn't help me do that. | 22:26 |
@jim:acmegating.com | test-setup-docker.sh will run zk for tests | 22:27 |
@spamaps:spamaps.ems.host | Right so then what? I have to also docker run the tests. | 22:28 |
@spamaps:spamaps.ems.host | I'm not complaining, I just need a flow | 22:29 |
@spamaps:spamaps.ems.host | I can't seem to find one | 22:29 |
@jim:acmegating.com | test-setup-docker.sh will set up the dependencies. then on linux, i run the tests using tox. if you need to run them in docker for some reason, then i guess that's what you'd do. | 22:30 |
@spamaps:spamaps.ems.host | It is probably what I have to do.. we'll see in a minute.. now that I have boto pinned it might work. | 22:30 |
@spamaps:spamaps.ems.host | Ah ok the SSL error had me assuming it wouldn't run native on the mac | 22:34 |
@spamaps:spamaps.ems.host | Seems like it might work now | 22:34 |
@jim:acmegating.com | spamaps: just fyi, there are no guarantees that testing natively on mac for zuul or nodepool will work. people do it, but at least in zuul, there are workarounds required. i believe there are some tests which cannot run (i don't know if they are auto-skipped or not). | 22:37 |
@clarkb:matrix.org | gear is still not osx happy. nodepool doesn't rely on it anymore at least | 22:37 |
@clarkb:matrix.org | I suspect that is what zuul needs working around | 22:37 |
@jim:acmegating.com | there's other stuff | 22:37 |
@jim:acmegating.com | (i've seen notes in tests, i just don't remember off the top of my head what they are; not trying to be obtuse) | 22:40 |
@spamaps:spamaps.ems.host | Ok I got tests to work but my idea to just add a quick assertion doesn't work. | 22:46 |
-@gerrit:opendev.org- James E. Blair proposed: [zuul/zuul] 817122: Shard config errors https://review.opendev.org/c/zuul/zuul/+/817122 | 23:05 | |
@jim:acmegating.com | Clark, tobiash, swest, fungi: ^ that should take care of the next thing we're observing in production on opendev | 23:05 |
@clarkb:matrix.org | Looking and thanks | 23:05 |
@jim:acmegating.com | i just kicked off a local test run for that (but i smoke tested a few first before pushing it up) | 23:06 |
@clarkb:matrix.org | corvus: couple of questions in there | 23:15 |
@clarkb:matrix.org | the deserialization of attributes like _path is something I generally wonder about as we seem to do it via the passed arg rather than the data in zk typically | 23:16 |
@clarkb:matrix.org | I wonder if it might be easier conceptually to just always store that in zk though it will use more disk to do that | 23:16 |
@jim:acmegating.com | Clark: replied | 23:17 |
@jim:acmegating.com | one test error locally; will fix both things | 23:17 |
-@gerrit:opendev.org- James E. Blair proposed: [zuul/zuul] 817122: Shard config errors https://review.opendev.org/c/zuul/zuul/+/817122 | 23:20 | |
@jim:acmegating.com | Clark: ^ there we go | 23:20 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair: [zuul/zuul] 817067: Verify build path before refreshing from ZK https://review.opendev.org/c/zuul/zuul/+/817067 | 23:21 | |
@clarkb:matrix.org | +2 thanks | 23:21 |
@jim:acmegating.com | Clark: do you think we should get further review on that, or go ahead and approve it to get it into opendev prod? | 23:22 |
@clarkb:matrix.org | selfishly I'd like to get it into prod. I think this is a pattern we've gone through a few times converting unsharded to sharded objects. Maybe we should just go for it? | 23:22 |
@jim:acmegating.com | yeah, i think this one is worth going for it and just making sure to ping tobiash and swest for retro-review | 23:23 |
@jim:acmegating.com | tristanC: thanks! :) | 23:26 |
@tristanc_:matrix.org | corvus: i'm trying to follow the on-going effort, but I find that new zk based system to be quite tricky to internalize, so I'm sorry that I'm not too helpful on those reviews. | 23:28 |
@clarkb:matrix.org | It definitely requires a lot of effort to work through. I've generally had to block off an entire half a day when doing reviews of each stack so that I can read through them carefully and ensure I'm understanding it well enough | 23:30 |
@clarkb:matrix.org | I suspect it will get easier as things are in less of an in between space | 23:30 |
@spamaps:spamaps.ems.host | woo I got a test written and it even does some decorating | 23:32 |
-@gerrit:opendev.org- Clint Byrum proposed: [zuul/nodepool] 817070: Fix infinite recursion in GCE provider https://review.opendev.org/c/zuul/nodepool/+/817070 | 23:33 | |
@spamaps:spamaps.ems.host | ^^ Rebased on top of the boto bump and adds a unittest that reproduces the infinite recursion problem it is fixing. | 23:34 |
@spamaps:spamaps.ems.host | Does gerrit not link to the running zuul jobs anymore? | 23:37 |
@spamaps:spamaps.ems.host | Did it ever? I have been away for a while. ;) | 23:38 |
@clarkb:matrix.org | It did for like a day then we stopped doing it because it caused zuul api for the status to have a sad. Since then the zuul status apis have been improved and we could probably give it another go but gerrit has also completely rewritten its UI so we'd have to start over on the gerrit side | 23:40 |
@spamaps:spamaps.ems.host | I just mean a link.. | 23:42 |
@spamaps:spamaps.ems.host | not the actual status. | 23:42 |
@clarkb:matrix.org | ya it never did that except for a day and it got reverted | 23:44 |
@spamaps:spamaps.ems.host | Ah. | 23:44 |
@spamaps:spamaps.ems.host | I seem to recall it used to post a comment. | 23:45 |
@spamaps:spamaps.ems.host | I know my GH-based Zuuls all posted a comment at the beginning with a templated link to the status page. | 23:45 |
@spamaps:spamaps.ems.host | and in GH the checks API serves this purpose. | 23:45 |
@clarkb:matrix.org | There was a time when it posted a comment saying it had started jobs (I think it still does for the gate?) but I don't know that that ever included a link to running zuul jobs (because zuul only recently added the ability to do that) | 23:46 |
@spamaps:spamaps.ems.host | Yeah I faked it with search links. :) | 23:46 |
@spamaps:spamaps.ems.host | which were poorly documented | 23:46 |
@clarkb:matrix.org | and ya gerrit keeps deprecating and reinventing the system similar to github's checks api | 23:47 |
@clarkb:matrix.org | so we don't bother to use it yet in opendev. I'm hoping the version they've built for 3.4 will stick around long enough that maybe we can target that and make use of it once we upgrade to that version | 23:47 |
@spamaps:spamaps.ems.host | Anyway now to get back to actually trying to ship a patched nodepool into my little PoC setup. | 23:47 |
@spamaps:spamaps.ems.host | Looks like I can just git review -d my change and build a local docker image. Yay. | 23:48 |
@jim:acmegating.com | tristanC: yeah, it's a lot. maybe once we're done, we should do a video meeting to go over the design/concepts for zuul-maint (and use that to figure out what we should put in developer docs). | 23:50 |
@jim:acmegating.com | at a high level, it's mostly sticking to the spec, but we invented some ideas along the way to solve problems as they came up of course (like sharded zk objects) | 23:51 |
@tristanc_:matrix.org | corvus: that sounds like a good idea. I'm not sure what your definition of "done" is here, but I would be happy to learn more about (and help document) the implementation details :) | 23:56 |