*** armstrongs has quit IRC | 00:00 | |
*** mattw4 has quit IRC | 00:07 | |
*** saneax has quit IRC | 00:30 | |
*** igordc has quit IRC | 01:24 | |
*** dperique has joined #zuul | 02:11 | |
*** dperique has left #zuul | 02:12 | |
*** dperique has joined #zuul | 02:12 | |
*** dperique has left #zuul | 02:14 | |
*** bhavikdbavishi has joined #zuul | 02:40 | |
*** bhavikdbavishi1 has joined #zuul | 02:43 | |
*** bhavikdbavishi has quit IRC | 02:44 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 02:44 | |
*** bhavikdbavishi has quit IRC | 03:40 | |
*** bhavikdbavishi has joined #zuul | 03:44 | |
*** raukadah is now known as chandankumar | 03:58 | |
flaper87 | corvus: nice, thanks for suggesting `tools`. I'll be writing this script for sure. I hope to be able to do that in the next month or so | 04:18 |
*** saneax has joined #zuul | 04:42 | |
*** bolg has joined #zuul | 06:39 | |
*** hashar has joined #zuul | 07:09 | |
*** pcaruana has joined #zuul | 07:10 | |
*** themroc has joined #zuul | 07:13 | |
*** tosky has joined #zuul | 07:14 | |
*** saneax has quit IRC | 07:20 | |
*** jpena|off is now known as jpena | 07:46 | |
*** tosky has quit IRC | 07:58 | |
*** tosky has joined #zuul | 07:59 | |
*** hashar has quit IRC | 08:28 | |
*** saneax has joined #zuul | 08:32 | |
*** hashar has joined #zuul | 08:34 | |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: Sort waiting static nodes by creation time https://review.opendev.org/687271 | 08:43 |
*** themr0c has joined #zuul | 09:31 | |
*** themroc has quit IRC | 09:31 | |
*** hashar has quit IRC | 09:59 | |
*** avass is now known as Guest61857 | 10:00 | |
*** avass has joined #zuul | 10:00 | |
*** themr0c has quit IRC | 10:02 | |
*** themr0c has joined #zuul | 10:02 | |
*** themr0c has quit IRC | 10:09 | |
*** themroc has joined #zuul | 10:11 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Pagure - add the enqueue_ref unit test https://review.opendev.org/687351 | 10:13 |
*** jamesmcarthur has joined #zuul | 10:15 | |
*** jamesmcarthur has quit IRC | 10:19 | |
*** bhavikdbavishi has quit IRC | 10:24 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Pagure - add support for git.tag.creation event https://review.opendev.org/679938 | 11:14 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Pagure - Support for branch creation/deletion https://review.opendev.org/685116 | 11:15 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Pagure - add the enqueue_ref unit test https://review.opendev.org/687351 | 11:15 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Add optional support for circular dependencies https://review.opendev.org/685354 | 11:16 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Spec for allowing circular dependencies https://review.opendev.org/643309 | 11:23 |
*** jpena is now known as jpena|lunch | 11:29 | |
*** bhavikdbavishi has joined #zuul | 11:52 | |
*** bhavikdbavishi1 has joined #zuul | 11:55 | |
*** bhavikdbavishi has quit IRC | 11:56 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 11:56 | |
*** themroc has quit IRC | 11:57 | |
*** avass has quit IRC | 12:05 | |
tobiash | corvus: I had some thoughts on 677111 (direct push), what do you think? | 12:16 |
*** jpena|lunch is now known as jpena | 12:23 | |
*** avass has joined #zuul | 12:44 | |
*** jangutter_ has joined #zuul | 13:26 | |
*** jangutter has quit IRC | 13:30 | |
*** pcaruana has quit IRC | 13:31 | |
*** pcaruana has joined #zuul | 13:33 | |
*** ianychoi has quit IRC | 13:41 | |
*** saneax has quit IRC | 13:43 | |
*** rf0lc0 has joined #zuul | 13:51 | |
*** klindgren_ has joined #zuul | 13:52 | |
*** johanssone has quit IRC | 13:56 | |
*** rfolco has quit IRC | 13:56 | |
*** swest has quit IRC | 13:56 | |
*** SotK has quit IRC | 13:56 | |
*** gothicmindfood has quit IRC | 13:56 | |
*** klindgren has quit IRC | 13:56 | |
*** phildawson has joined #zuul | 13:56 | |
*** SotK has joined #zuul | 13:56 | |
*** swest has joined #zuul | 13:57 | |
*** openstackstatus has quit IRC | 13:58 | |
*** johanssone has joined #zuul | 13:58 | |
*** rf0lc0 is now known as rfolco | 14:02 | |
*** swest has quit IRC | 14:03 | |
*** fdegir has quit IRC | 14:06 | |
*** fdegir has joined #zuul | 14:07 | |
*** jamesmcarthur has joined #zuul | 14:19 | |
corvus | tobiash: replied | 14:23 |
tobiash | corvus: so you'd prefer to only make the push asynchronous instead of reporting in general? | 14:25 |
corvus | tobiash: yes, i think so -- i don't think we should scale out the scheduler with internal threads -- that has a limit... but that just made me think of something.... | 14:28 |
corvus | tobiash: we could leave the reporting implementation the way it is, and make the direct-push work depend on scale-out schedulers. | 14:29 |
corvus | tobiash: then, if a single scheduler is busy managing a push, other schedulers can still work on other pipelines | 14:29 |
corvus | (no matter how we implement it, a pipeline is basically frozen while its head is being reported) | 14:30 |
tobiash | good point | 14:30 |
tobiash | could we then decouple the cyclic dependency support from the direct push? Technically the two are independent of each other. | 14:31 |
corvus | tobiash: i'm uncomfortable having cyclic dependencies without 2-phase commit or some way to roll back | 14:32 |
tobiash | we could make the limitations clear in the docs, and it will always be an opt-in feature | 14:33 |
corvus | personally, i would say "don't use this, it is too dangerous, you can get into a state where you have to fix repos manually" and i don't like the idea of putting code out there that we tell people not to use. | 14:35 |
corvus | gerrit/github saying "no i can't merge this" is not theoretical, it happens | 14:35 |
tobiash | my problem is that we have a pressing need for cyclic deps and cannot wait for the scale-out scheduler. And I'm also uncomfortable rebasing this for 6+ months | 14:36 |
tobiash | and with github enterprise direct push is not possible at all yet | 14:36 |
corvus | tobiash: re ghe, why? | 14:37 |
tobiash | because of access restrictions with github apps; it was resolved on github.com two months ago | 14:37 |
corvus | tobiash: do you have an eta for ghe? | 14:38 |
tobiash | I hope for the next version but we don't get any eta for anything from gh :( | 14:39 |
*** michael-beaver has joined #zuul | 14:39 | |
mordred | forward-looking statements / revenue recognition / blah blah | 14:39 |
*** jangutter has joined #zuul | 14:40 | |
*** jangutter_ has quit IRC | 14:43 | |
corvus | tobiash: i'd like to keep cyclic connected to direct-push, but i don't think we need to wait for HA scheduler. i think your connection-report-task-queue idea may not be too hard, and i think it would be compatible with HA scheduler. | 14:44 |
corvus | tobiash: (or, at least, easy to transition to ha scheduler) | 14:45 |
tobiash | ok, sounds like a compromise, after scale out scheduler it should be easy to rip it out again | 14:45 |
corvus | and i think it's fine for us to have 'experimental' support for cyclic with github, since once ghe releases with that, we'll be able to complete work easily. | 14:47 |
tobiash | :) | 14:50 |
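The connection-report-task-queue idea referenced above boils down to serializing report work (such as a direct push) per connection, so a slow push only holds up its own pipeline head rather than the scheduler's main loop. A rough sketch of that shape, with hypothetical names (no such class exists in Zuul):

```python
import queue
import threading

class ConnectionReportQueue:
    """Hypothetical sketch: run report tasks (e.g. a direct push) for one
    connection on a worker thread so the scheduler loop is not blocked."""

    def __init__(self):
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, report_task, done_callback):
        # Called from the scheduler; queues a report and returns at once.
        self._queue.put((report_task, done_callback))

    def _run(self):
        while True:
            report_task, done_callback = self._queue.get()
            try:
                result = report_task()
            except Exception as e:
                result = e
            # Hand the outcome back to the scheduler (e.g. as an event)
            # instead of touching pipeline state from this thread.
            done_callback(result)
```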
pabelanger | I should note, the new triage roles on github.com are nice! We're in the process of dropping write permissions for humans, but apparently people don't like giving up root access once they have it :D | 14:55 |
tobiash | corvus: it would be great if you could put 671674 (macos support for gear) onto your review list. This would simplify zuul development for us as we could then run zuul tests natively on our dev machines. It's not urgent though. | 14:55 |
corvus | tobiash: can we make the selection automatic? | 14:58 |
corvus | tobiash: also, why _wait_for_connection if not using epoll? | 15:00 |
*** bolg has quit IRC | 15:03 | |
tobiash | corvus: yep, automatic selection should be no problem I guess | 15:04 |
tobiash | regarding _wait_for_connection we should ask bolg (he seems to be offline atm) | 15:05 |
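On the epoll question: Python's standard `selectors` module already makes that selection automatically (epoll on Linux, kqueue on macOS/BSD, with poll/select fallbacks), which is one plausible route for gear. A minimal sketch of the idea, not gear's actual code:

```python
import selectors
import socket

# DefaultSelector picks the best mechanism available on the platform,
# so the same code runs on Linux (epoll) and macOS (kqueue).
sel = selectors.DefaultSelector()

sock = socket.socket()
sock.bind(('127.0.0.1', 4730))  # gear's default port, used here for illustration
sock.listen()
sock.setblocking(False)
sel.register(sock, selectors.EVENT_READ)

# One iteration of an event loop; a real server would keep looping
# and dispatch on the ready file objects.
for key, events in sel.select(timeout=1.0):
    conn, addr = key.fileobj.accept()
    print('connection from', addr)
```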
openstackgerrit | Jan Gutter proposed zuul/nodepool master: Add port-cleanup-interval config option https://review.opendev.org/687024 | 15:07 |
*** bhavikdbavishi has quit IRC | 15:37 | |
pabelanger | mordred: so, I'd like to drop the dependency on setup.cfg for tox_siblings. In our case we have ansible roles that want to use tox and siblings, but we don't ship setuptools files in our repos. Looks like we just use it to get the project name for logging and the following check | 15:40 |
pabelanger | changed=False, msg="No name in setup.cfg, skipping siblings") | 15:40 |
pabelanger | maybe we can flip that to look for tox.ini file | 15:40 |
pabelanger | or project name from zuul variable | 15:41 |
mordred | the project name is important - it's how tox-siblings knows which of the projects in required-projects should be installed into the tox venv (it doesn't just install everything from required-projects, because of things like openstack/requirements repo) | 15:43 |
mordred | pabelanger: so I don't think we should change the default behavior - that said, I think adding support somehow for the things you're trying to do with your roles would be in scope | 15:43 |
mordred | however, roles wouldn't get "installed" into the tox venv - so maybe we should think through making sure we're solving your problem | 15:44 |
pabelanger | mordred: well, if we can get the name from something other than setup.cfg, that is really what I'm looking for. zuul.project.short_name comes to mind | 15:46 |
mordred | it doesn't match | 15:46 |
mordred | we need the pip requirement name | 15:46 |
pabelanger | ah | 15:46 |
mordred | and we need it for every repo in required-projects | 15:46 |
mordred | do you have an example I can look at? | 15:47 |
pabelanger | okay, yah, so the use case here is: no pip project, one that uses tox for dependency management, and we want cross-project testing of a dependency, molecule in this case | 15:47 |
pabelanger | sure | 15:47 |
pabelanger | https://github.com/ansible-security/ids_config/pull/5 | 15:47 |
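For context, the setup.cfg check quoted above ("No name in setup.cfg, skipping siblings") boils down to reading `metadata.name` and skipping any repo that lacks it, since that name is what gets matched against required-projects. A simplified sketch, not the actual tox-siblings role code:

```python
import configparser
import os

def sibling_name(repo_dir):
    """Return the pip requirement name declared in setup.cfg, or None.

    Repos without a name (e.g. bare Ansible role repos) cannot be
    matched against required-projects and are skipped.
    """
    path = os.path.join(repo_dir, 'setup.cfg')
    if not os.path.exists(path):
        return None
    cfg = configparser.ConfigParser()
    cfg.read(path)
    return cfg.get('metadata', 'name', fallback=None)

if sibling_name('.') is None:
    print('No name in setup.cfg, skipping siblings')
```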
jangutter | pabelanger: regarding dropping root access, Android made me actively _want_ to have a mode where root access has been revoked. | 15:48 |
pabelanger | jangutter: ++ | 15:48 |
jangutter | pabelanger: one of the few examples where I went from "I must be able to root the phone" to "If this phone is rootable, I don't trust it" | 15:48 |
jangutter | pabelanger: I just realised that I publicly stated that I've given up having control of my own device. Handing in my geek card now. | 15:49 |
mordred | pabelanger: ok - I'm in a meeting, but let me wrap my head around what you've got here | 15:50 |
pabelanger | mordred: np! thanks. I can work around it for now by adding the setuptools bits | 15:50 |
jangutter | Shrews: one unrelated failure in https://review.opendev.org/#/c/687024/ but is that more or less what you had in mind? | 15:51 |
*** openstackgerrit has quit IRC | 15:52 | |
mordred | pabelanger: I don't think you need tox-siblings here - your install script is the thing that's handling "installing" the sibling repos | 15:52 |
mordred | (also, I like the sed solution there :) ) | 15:53 |
pabelanger | mordred: yah, for roles that is right. But in test-requirements.txt, we need to pull in a newer molecule version, via depends-on | 15:53 |
fungi | jangutter: yeah, that works *as long as* you trust google not to abuse/sell/give root access to your device without your knowledge | 15:53 |
*** toabctl has quit IRC | 15:53 | |
fungi | all "root access not possible" means is that google claims there are no backdoors other than the ones it chooses to keep secret from you | 15:54 |
mordred | pabelanger: nod - so the only thing that should need a setup.cfg is molecule - where is the lack of setup.cfg in that repo breaking you? (we should be gracefully handling repos that don't have setup.cfg) | 15:54 |
fungi | jangutter: well, and also your hardware vendor | 15:55 |
jangutter | fungi: yep. and it's turtles all the way down... (speaking as someone working for a hardware vendor....) | 15:55 |
mordred | pabelanger: (so it's entirely possible there's a bug we need to fix) | 15:55 |
jangutter | fungi: but at the end of the day, do you really trust your physics stack? I mean neutrinos are obviously such a security hole. | 15:55 |
pabelanger | mordred: yes, I believe that is right. Currently setup.cfg is a hard requirement | 15:55 |
pabelanger | if we can skip that, I think things will work properly, assuming we know the right name | 15:56 |
Shrews | jangutter: a quick look says "yes". will have a closer look in a bit | 15:56 |
fungi | jangutter: i'd wager if google wants to be able to sell those phones in countries which demand access to their residents' mobile devices, then there are probably just holes they don't tell people about so they can continue to sell their products | 15:57 |
mordred | pabelanger: well - I think we don't need to know the name - is one of the older versions of the PR one without the setup.cfg? | 15:57 |
* Shrews fumes over OS X changing his default shell out from under him | 15:57 | |
jangutter | Shrews didn't read Ars Technica's writeup on Catalina. | 15:57 |
fungi | i guess if those devices become unavailable for purchase in china (or, for that matter, the usa) then it probably means it's actually not granting secret access for governments and law enforcement | 15:58 |
pabelanger | mordred: yes I believe https://dashboard.zuul.ansible.com/t/ansible/build/c388db008b4749679d2fa3573711f52a | 15:58 |
jangutter | fungi: fun fact, South Africa has a law mandating government access to intercept communications. We're waay ahead of the rest of the world with surveillance-as-a-service. | 15:58 |
mordred | pabelanger: cool, thanks | 15:59 |
fungi | jangutter: so does the usa, it's been on the books since ~1998, i had to manage law enforcement access to packet sniffers at an internet service provider for years | 15:59 |
fungi | basically they give you two days to set up access for law enforcement agencies after receipt of a warrant, but prefer you just set them up with perpetual access and they "promise" to only actually use it when they have a valid warrant issued | 16:00 |
*** bhavikdbavishi has joined #zuul | 16:01 | |
fungi | calea, the communications assistance for law enforcement act | 16:01 |
jangutter | fungi: ours requires monitoring at all times, aggregated to a centralised surveillance warehouse. access to the historical data is only given with a warrant, or if you know a guy. | 16:02 |
fungi | well, also here, warrants are available if you know a guy, but sure, may as well dispense with the formality ;) | 16:02 |
*** igordc has joined #zuul | 16:12 | |
*** openstackgerrit has joined #zuul | 16:20 | |
openstackgerrit | James E. Blair proposed zuul/zuul-registry master: Implement namespacing https://review.opendev.org/687613 | 16:20 |
corvus | tristanC: ^ let me know if that makes sense | 16:20 |
openstackgerrit | James E. Blair proposed zuul/zuul-registry master: Implement namespacing https://review.opendev.org/687613 | 16:21 |
tristanC | corvus: having tests would help understanding... could we add a test_main with the cherrypy test object? | 16:22 |
*** jamesmcarthur has quit IRC | 16:22 | |
corvus | tristanC: i plan on adding tests, but i think the most effective test of this would be functional | 16:22 |
corvus | i plan to extend the test playbook to set this up and exercise it with docker, podman, k8s, etc | 16:23 |
corvus | that's next on my list | 16:24 |
*** gothicmindfood has joined #zuul | 16:27 | |
*** jangutter has quit IRC | 16:29 | |
*** jamesmcarthur has joined #zuul | 16:30 | |
*** hashar has joined #zuul | 16:32 | |
openstackgerrit | James E. Blair proposed zuul/zuul-registry master: Implement namespacing https://review.opendev.org/687613 | 16:33 |
fungi | looks like nodepool-build-image has been consistently ending in post_failure today (or it's a coincidence which has hit three builds across two changes): http://zuul.opendev.org/t/zuul/builds?job_name=nodepool-build-image | 16:35 |
*** jamesmcarthur has quit IRC | 16:35 | |
fungi | "push: dial tcp 127.0.0.1:9000: connect: connection refused" | 16:36 |
fungi | is that using the new zuul-registry implementation yet? | 16:36 |
corvus | fungi: no, z-r is only used for opendev's intermediate registry | 16:37 |
fungi | okay, thanks. and yeah, this error does look familiar so i guess it's just the nondeterministic problem we've seen previously with the old design | 16:38 |
fungi | just wanted to be sure it wasn't something suddenly broken in z-r | 16:38 |
corvus | fungi: oh i was thinking this was new | 16:38 |
corvus | i don't understand the error yet | 16:38 |
fungi | well, i recall making a joke about the fact that it says "dial" earlier in the week or maybe last week | 16:39 |
* fungi digs | 16:39 | |
*** jamesmcarthur has joined #zuul | 16:39 | |
fungi | ahh, no, the other occurrence was a "no route to host" you were experiencing when testing locally | 16:40 |
fungi | not "connection refused" | 16:40 |
corvus | this may be fallout from the jwt change, but i don't see how yet | 16:40 |
fungi | but the db does have a similar failure on record from 5 days ago, http://zuul.opendev.org/t/zuul/build/54670bf42d5a41e0bfced6e6ee3a537c | 16:42 |
corvus | oh wow big bug. patch incoming :) | 16:42 |
openstackgerrit | James E. Blair proposed zuul/zuul-registry master: Fix authorization URL https://review.opendev.org/687622 | 16:48 |
fungi | thanks! reviewing now. jangutter initially mentioned one of those failures in #openstack-infra so i went digging to see whether it was happening often | 16:48 |
corvus | fungi, tristanC, mordred: ^ | 16:48 |
corvus | we also need a system-config change for opendev, i'll do that real quick | 16:48 |
corvus | er that patch is missing something, 1 sec | 16:49 |
openstackgerrit | James E. Blair proposed zuul/zuul-registry master: Fix authorization URL https://review.opendev.org/687622 | 16:49 |
fungi | aha, so that worked when you were testing locally i guess because it actually was listening on 127.0.0.1? | 16:49 |
corvus | fungi: yes and the tests are as well | 16:49 |
corvus | however, that is not the right url to use when obtaining a token from opendev's intermediate registry :) | 16:50 |
fungi | of course | 16:51 |
*** jpena is now known as jpena|off | 16:53 | |
*** panda is now known as panda|off | 16:53 | |
daniel2 | so switching to raw from qcow2 entirely fixed my issues, nodes are spinning up and marked as "ready" | 17:03 |
fungi | remind me, what was the previous behavior you saw when using qcow2? | 17:22 |
fungi | boot timeouts? | 17:22 |
SpamapS | Been there, done that, crashed that cloud. | 17:23 |
fungi | did you at least get a tee shirt? | 17:24 |
pabelanger | fungi: yah, nova defaults to converting qcow2 to raw on compute nodes | 17:26 |
pabelanger | we had same issue in infracloud | 17:26 |
pabelanger | but disabled that in nova.conf | 17:26 |
fungi | i remember | 17:27 |
*** bhavikdbavishi has quit IRC | 17:37 | |
openstackgerrit | Merged zuul/zuul-registry master: Fix authorization URL https://review.opendev.org/687622 | 17:50 |
openstackgerrit | James E. Blair proposed zuul/zuul-registry master: Implement namespacing https://review.opendev.org/687613 | 17:52 |
corvus | okay, we should be back to normal now | 17:52 |
corvus | feel free to recheck any changes that hit problems with container jobs (eg zuul-quick-start, build-image, etc) | 17:53 |
fungi | thanks!!! | 17:54 |
*** igordc has quit IRC | 17:59 | |
*** jamesmcarthur has quit IRC | 18:45 | |
*** pcaruana has quit IRC | 19:07 | |
*** mgagne has quit IRC | 19:11 | |
mordred | corvus: I got a 500 from registry: https://zuul.opendev.org/t/openstack/build/32f0bcffc0e84a41b22f912dd96c2690 | 19:29 |
mordred | corvus: my hunch is that it's a depends-on image layer from a job from before the new registry rollout ... | 19:30 |
mordred | but I have not investigated further yet, and 500 is a weird error for that | 19:30 |
corvus | mordred: that would make sense, i did not port over the data | 19:31 |
corvus | we should check the times first to verify your hunch, then also, i agree, see about maybe having that situation present itself differently :) | 19:31 |
mordred | corvus: ah - yes - missing layer | 19:32 |
mordred | corvus: swift is returning a 404 and that's causing an exception in cherrypy | 19:32 |
mordred | corvus: let me see if I can make a patch | 19:32 |
*** hashar has quit IRC | 19:34 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-registry master: Raise a 404 when we don't find a blob from swift https://review.opendev.org/687657 | 19:39 |
mordred | corvus: ^^ hasher.update(None) doesn't work too well | 19:39 |
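The shape of that fix is roughly a guard before hashing: if the backend hands back nothing for the requested blob, answer 404 instead of letting `hasher.update(None)` blow up into a 500. A simplified sketch with hypothetical names (the actual change is 687657):

```python
import cherrypy

class RegistryAPI:
    # Hypothetical handler sketch; self.storage is assumed to return
    # None when the object is missing from the backend (e.g. swift 404).
    def get_blob(self, repository, digest):
        data = self.storage.get_blob(repository, digest)
        if data is None:
            # Report a registry-level 404 rather than crashing with a 500.
            raise cherrypy.HTTPError(404, 'Blob not found')
        return data
```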
tobias-urdin | quick question, if i have a job with a parent and i want to override required-projects completely and not append to it, can i do that? | 19:40 |
corvus | tobias-urdin: no, usually that situation calls for reworking the inheritance (so that required-projects are added at a lower level) | 19:42 |
tobias-urdin | ok thanks! | 19:42 |
corvus | mordred: i agree with your technical fix, but i'm confused about how we ended up with a manifest referencing a blob which doesn't exist | 19:44 |
corvus | mordred: can you point me at some job output? | 19:45 |
mordred | corvus: yes | 19:51 |
mordred | corvus: https://zuul.opendev.org/t/openstack/build/32f0bcffc0e84a41b22f912dd96c2690 is the job output for the failed job | 19:52 |
mordred | corvus: and then the docker logs for the registry container have the http trace | 19:53 |
mordred | corvus: I'm also confused as to how we wound up in that situation | 19:54 |
mnaser | forgive the irony of this, but is "kick off a jenkins job and return the value" something we can add to zuul/zuul-jobs? | 19:54 |
mnaser | i'm only saying it because several voices raised the need (and afaik, i think that's how volvo talked about how they did it?) | 19:54 |
pabelanger | should be, I know software factory does it with ovirt, IIRC | 19:55 |
corvus | mnaser: yep, i don't see why not. it's worth noting that there are some serious pitfalls there if you want to try to do that to actually test a change, but there are plenty of other legitimate reasons to do so. | 19:55 |
*** hashar has joined #zuul | 19:55 | |
corvus | mnaser: having a role for that which explained clearly what it does and the limitations would be good, i think. also, if folks wanted to try to overcome those limitations (ie, transfer git repos to a jenkins workspace, etc), good to have a place to share. | 19:56 |
corvus | (to elaborate: it would be simple and correct to use that to trigger some other build or deployment system when a change merges. it would take a lot of work to do that to run a test. if someone thinks "i will just have my jenkins job fetch the change and test it" then they don't understand what zuul is doing) | 19:58 |
mordred | yeah | 19:59 |
corvus | it is a problem that can be overcome, but this is at the core of the impedance mismatch between jenkins and zuul, and, er, really the reason we wrote it in the first place. | 19:59 |
mordred | yeah | 20:00 |
mordred | corvus: although - I do think zuul has grown some abilities since last I pondered this concept that might make solving it easier than it was before ... but definitely a non-zero amount of work that is fraught with dragons and peril | 20:00 |
Shrews | and ponies? evil, evil ponies? | 20:01 |
mordred | so many evil ponies | 20:01 |
fungi | the only kind | 20:03 |
corvus | anyone have a good 'browse swift like a filesystem' cli tool? | 20:03 |
mordred | I do not | 20:04 |
corvus | i just want to cat some files | 20:04 |
* corvus writes a repl real quick | 20:05 | |
mordred | corvus: I will enjoy your repl | 20:05 |
fungi | i see https://github.com/redbo/cloudfuse but not sure if it's any good | 20:06 |
mordred | also https://github.com/ovh/svfs | 20:07 |
pabelanger | mnaser: I do agree with corvus, running jenkins jobs triggered from zuul can be done, but I wouldn't do it long term. But I can see it as part of a migration plan, POC. | 20:07 |
pabelanger | https://gerrit-staging.phx.ovirt.org/#/c/379/ was the example I was thinking of | 20:08 |
fungi | neat, ovirt's using zuul?!? (and gerrit?) | 20:09 |
fungi | i had no idea | 20:09 |
pabelanger | I believe they are experimenting with it, but tristanC and dmsimard should know more | 20:10 |
dmsimard | corvus: s3ql | 20:11 |
dmsimard | (for browsing swift) | 20:11 |
corvus | dmsimard, pabelanger, mordred: thanks! | 20:11 |
corvus | mordred: it's looking for this blob: sha256:49f4f4efd2abc8d2780773269f0d96c0c62f153a673e677d2b16fe3d881aa75d | 20:12 |
tristanC | fungi: here is oVirt config project: https://gerrit.ovirt.org/gitweb?p=ovirt-zuul-config.git;a=summary | 20:12 |
fungi | nifty! | 20:12 |
corvus | mordred: so the question is, when the b0ac18ded52e418b971393bc0568f7ff_latest manifest was uploaded, why wasn't that blob there? | 20:13 |
tristanC | mnaser: another missing piece to trigger a jenkins job is a zuul-jobs role to create a git reference that jenkins can clone from | 20:13 |
tristanC | e.g.: https://gerrit.ovirt.org/gitweb?p=jenkins.git;a=blob;f=zuul-playbooks/expose_source.yaml;h=973ff601e6e5abbbefa43808813f3c0e6946d538;hb=HEAD | 20:14 |
tristanC | mnaser: we are also looking into triggering a tekton pipeline from a zuul job... | 20:15 |
mordred | corvus: that is a fascinating question | 20:15 |
corvus | mordred: that should be this job: http://zuul.opendev.org/t/openstack/build/b0ac18ded52e418b971393bc0568f7ff | 20:17 |
pabelanger | tristanC: why not push the content to the jenkins node? | 20:18 |
tristanC | pabelanger: i think it's because the jobs are expecting to clone something | 20:18 |
mordred | node ownership gets weird too | 20:19 |
corvus | mnaser, tristanC: maybe you could work on porting that ovirt playbook into zuul-jobs? | 20:19 |
corvus | mordred: http://paste.openstack.org/show/782465/ | 20:20 |
mnaser | yeah, i can see if i have some time, i'm kinda trying to see if i have an actual use case that i can test this against | 20:20 |
mordred | corvus: that looks like it uploaded and returned a 201 | 20:20 |
corvus | ayup | 20:21 |
mordred | we're not setting expires headers are we? | 20:22 |
corvus | mordred: nope. we have a cron job to prune | 20:22 |
corvus | mordred: i wonder if we're seeing another manifestation of whatever caused the issues with the docker registry? | 20:26 |
corvus | a 0 byte file and a 404 share some things in common | 20:27 |
corvus | mordred: i'm out of immediate ideas -- i suggest we stop the prune cron for a while and add a little more debug logging, and wait for it to happen again | 20:29 |
mordred | corvus: ooh. that's an interesting thought | 20:30 |
mordred | corvus: like - what if we're writing this, and then something is going away | 20:31 |
mordred | corvus: yeah. I think that's a good idea | 20:31 |
fungi | what criteria does the prune use (or is it supposed to use) to decide it's safe to remove images? | 20:31 |
mordred | corvus: also - it might be wasteful resource-wise - but what if we fetch after put to verify we get content back? | 20:31 |
fungi | that does seem like it could be costly for large images | 20:32 |
mordred | corvus: (although, also - technically it is an eventually consistent system, so I believe the object is not guaranteed to exist immediately after PUT) | 20:32 |
corvus | mordred: can we get clarification on that? | 20:32 |
corvus | because that sounds very problematic | 20:32 |
mordred | fungi: maybe we can HEAD the object we PUT to at least get a record in the logs that swift thinks the thing is there and has a length? | 20:32 |
fungi | that would not be costly, sure | 20:32 |
corvus | mordred: the head sounds like a reasonable debug thing under the circumstances. | 20:33 |
*** pcaruana has joined #zuul | 20:33 | |
mordred | timburke, tdasilva: ^^ are we saying dumb swift things? | 20:33 |
fungi | and yeah, i've always heard that swift makes no guarantees of immediate availability from one node for objects written to another node, but i'm unsure of the details of that | 20:33 |
mordred | corvus: do we know if we've seen this on both ceph and swift? (assuming the zero-byte issue is the same as this 404 issue) | 20:36 |
corvus | mordred: only ever on rax swift | 20:37 |
corvus | mordred: by that, i mean we've never run this on anything else | 20:37 |
mordred | corvus: nod | 20:37 |
mnaser | i'm pretty sure swift has eventual consistency | 20:39 |
* timburke is reading... | 20:39 | |
mnaser | " For example, suppose a container server is under load and a new object is put in to the system. The object will be immediately available for reads as soon as the proxy server responds to the client with success. However, the container server did not update the object listing, and so the update would be queued for a later update. Container listings, therefore, may not immediately contain the object." | 20:40 |
*** pcaruana has quit IRC | 20:40 | |
mnaser | from swift docs, but, i'm sure you have a much better source of info :) | 20:40 |
corvus | that's fine -- we only use listings for async things like prune | 20:41 |
corvus | (but if a put=201 can be followed by a get=404, then we may need to rethink some things) | 20:42 |
timburke | you'd need some pretty pathological failures to have a 201 on PUT be followed by a 404 on GET -- having a bunch of nodes error-limited during the PUT but then having that wear off by the time you're doing the GET, or some weird write affinity setting and then the cluster's WAN link goes down, that sort of thing | 20:46 |
timburke | of course, i don't have much insight on how rax swift behaves these days... | 20:47 |
corvus | cool, i think that's good enough for us to assume that mordred's 'head after put' is a potentially useful debug tool as well as confirm that our design isn't crazy | 20:47 |
corvus | (if head after put returns 404, then indeed it does mean something has gone wrong) | 20:48 |
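The head-after-put debug idea, sketched against the raw swift HTTP API (the storage URL and token are placeholders; this is an illustration, not the zuul-registry patch):

```python
import requests

def put_then_head(storage_url, token, container, name, data):
    """Upload an object, then immediately HEAD it and log what swift
    reports, to catch a 201-on-PUT followed by a 404-on-GET."""
    headers = {'X-Auth-Token': token}
    url = '%s/%s/%s' % (storage_url, container, name)

    put = requests.put(url, headers=headers, data=data)
    print('PUT %s -> %s' % (name, put.status_code))

    head = requests.head(url, headers=headers)
    length = head.headers.get('Content-Length')
    print('HEAD %s -> %s, Content-Length=%s' % (name, head.status_code, length))
    return head.status_code == 200 and int(length or 0) == len(data)
```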
corvus | mordred: since that upload, we've enabled cherrypy request logging, so we're already getting a bit more info than before, though i have my doubts that would tell us more in this case -- we have enough logging to know that we believe we uploaded both the manifest blob and the tag which pointed to it. i think the head-after-put log would be useful, and turning off pruning just to eliminate that as a potential | 20:55 |
corvus | source are the best next steps. | 20:55 |
timburke | fwiw, the 404 in http://paste.openstack.org/show/782465/ seems expected -- given the is-stale check around https://github.com/openstack/openstacksdk/blob/0.36.0/openstack/object_store/v1/_proxy.py#L378 | 20:56 |
timburke | i guess i'm still wondering what the observed bad was -- but i've gotta run for a meeting | 20:57 |
corvus | timburke: oh sorry -- it's the next 404 after that, several days later, that's unexpected | 20:59 |
mordred | corvus: want me to make a patch for that? | 21:01 |
corvus | mordred: that'd be great, i've got a handful of half-written patches in my tree right now :/ | 21:01 |
mordred | kk | 21:02 |
corvus | timburke, mordred: here's all the relevant logging i can come up with: the first chunk is the upload of the blob and the tag which references it, and the second chunk is when we went to go fetch that blob. | 21:02 |
corvus | timburke, mordred: http://paste.openstack.org/show/782618/ | 21:03 |
corvus | mordred, fungi: the tricky thing about the prune command is that i can't find any output to confirm whether it did or did not delete anything. i *expect* it not to delete anything, and in that case i also expect it not to output anything. so i think the approach there should be for us to turn off prune for a bit (we're not scheduled to prune anything for a while anyway), update it to emit some logging | 21:04 |
corvus | without pruning, make sure we get that going where we want, then turn it back on for real. | 21:04 |
mordred | corvus: ++ | 21:04 |
fungi | yeah, that's reasonable | 21:05 |
fungi | i too would expect no output to mean nothing was done, but out of curiosity how is docker able to figure out that images written to the registry are no longer "needed"? | 21:06 |
fungi | does that depend on having actual containers running from them on the system in question? | 21:06 |
fungi | (which for the registry would only be the images used to run the registry service itself, right?) | 21:07 |
corvus | fungi: it's not docker doing the pruning i'm talking about, it's zuul-registry. zuul-registry prune does 2 things -- clean up aborted uploads, and remove images older than a certain amount of time (180 days for opendev) -- it is, after all, an *intermediate* registry. | 21:08 |
corvus | as long as we don't have many aborted uploads, our usage won't be affected by not pruning for quite some time | 21:09 |
fungi | ahh, well 687673 is turning off the `docker image prune -f` task | 21:09 |
fungi | thought that was still what was being discussed, sorry | 21:10 |
corvus | fungi: welp, that change is completely wrong then let me fix it :) | 21:10 |
fungi | ;) | 21:11 |
fungi | okay, the new patchset is making more sense in the context of this discussion, thanks! | 21:12 |
corvus | yay code review | 21:12 |
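Taken together, the prune change being discussed amounts to a dry-run mode: list what would be removed, log it, and only delete once the logging looks right. A hypothetical sketch under those assumptions (not zuul-registry's actual prune code):

```python
import logging
import time

log = logging.getLogger('registry.prune')

def prune(storage, max_age_days=180, dry_run=True):
    """Log, and optionally delete, objects older than max_age_days.

    storage.list_objects() / storage.delete_object() are assumed
    interfaces for this sketch.
    """
    cutoff = time.time() - max_age_days * 86400
    for obj in storage.list_objects():
        if obj.created >= cutoff:
            continue
        if dry_run:
            log.info('Would prune %s (created %s)', obj.name, obj.created)
        else:
            log.info('Pruning %s (created %s)', obj.name, obj.created)
            storage.delete_object(obj.name)
```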
*** hashar has quit IRC | 21:19 | |
*** armstrongs has joined #zuul | 21:25 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-registry master: HEAD object after PUT https://review.opendev.org/687681 | 21:25 |
*** rfolco has quit IRC | 21:28 | |
openstackgerrit | Merged zuul/zuul-registry master: Raise a 404 when we don't find a blob from swift https://review.opendev.org/687657 | 21:29 |
openstackgerrit | Monty Taylor proposed zuul/zuul-registry master: HEAD object after PUT https://review.opendev.org/687681 | 21:34 |
*** armstrongs has quit IRC | 21:35 | |
*** avass has quit IRC | 21:37 | |
*** jamesmcarthur has joined #zuul | 22:42 | |
openstackgerrit | James E. Blair proposed zuul/zuul-registry master: Run docker and podman push/pull tests https://review.opendev.org/687692 | 22:46 |
corvus | tristanC: i did not get as far as i hoped with tests today, but my plan is to essentially do what's in that change ^ again in the buildset configuration; hopefully tomorrow | 22:47 |
openstackgerrit | Merged zuul/nodepool master: Assign static 'building' nodes in cleanup handler https://review.opendev.org/687261 | 23:01 |
*** saneax has joined #zuul | 23:14 | |
*** openstackstatus has joined #zuul | 23:19 | |
*** ChanServ sets mode: +v openstackstatus | 23:19 | |
*** tosky has quit IRC | 23:30 | |
*** jamesmcarthur has quit IRC | 23:43 |