| jhesketh | dmsimard: No, there isn't a spec (afaik). Happy to work on one with you though if you think it's useful. Otherwise hopefully the direction those patches are going help explain the goals. | 00:46 |
|---|---|---|
| * jhesketh really needs to return to that :-s | 00:46 | |
| pabelanger | oh, interesting, in one of my tests, nodepool-builder lost access to zookeeper, it seems during a build. This is on ovh, so I suspect the node is slow; however, I wouldn't have expected nodepool-builder to delete the already completed DIB and generate it again: http://logs.openstack.org/04/622604/1/check/windmill-src-fedora-latest/88c57a7/logs/nb01/var/log/nodepool/builder-debug.log | 00:51 |
| *** rlandy has quit IRC | 00:59 | |
| pabelanger | Shrews: Looking at https://git.zuul-ci.org/cgit/nodepool/tree/nodepool/builder.py#n795 if DIB was successful, what zk info could we have lost, if we lose the connection to zk? | 01:05 |
| pabelanger | Shrews: basically, trying to understand why we set zk.FAILED there if the connection is lost, rather than waiting for it to recover | 01:06 |
| pabelanger | like we did at https://git.zuul-ci.org/cgit/nodepool/tree/nodepool/builder.py#n781 | 01:06 |
| *** j^2 has joined #zuul | 02:15 | |
| tristanC | Shrews: not sure what changed between 3.3.0 and 3.3.1, but "deleting" nodes are leaking, we now have 16 of those... here are the logs: http://paste.openstack.org/show/736679/ | 02:17 |
| clarkb | tristanC: ya, there is a fix for that, though I don't think it was an issue in 3.3.1? came after iirc | 02:18 |
| clarkb | basically nodepool created delete records for nodes that had exceptions booting (previously we leaked them without a node record), but the first fix didn't include pool info so we still couldn't delete them | 02:19 |
| tristanC | clarkb: i also thought the missing pool info was the issue because of the "OpenStackProvider: Cannot find provider pool for node" message | 02:21 |
| tristanC | but the NodeDeleter doesn't seem to use the pool info to delete the node | 02:21 |
| tristanC | then I looked into the "node_exists" attribute, but Shrews said those nodes should have an id, e.g.: https://git.zuul-ci.org/cgit/nodepool/tree/nodepool/launcher.py#n93 | 02:23 |
| dmsimard | tristanC: that's another fix that landed | 02:23 |
| dmsimard | Not sure if it's been released | 02:23 |
| dmsimard | https://review.openstack.org/#/c/622403/ | 02:26 |
| dmsimard | Saw this issue in our logs a while back | 02:27 |
| tristanC | dmsimard: iiuc, that's about empty zk nodes, which are also an issue, but a different one | 02:27 |
| tristanC | perhaps https://review.openstack.org/#/c/621301/ would help figure out what's the issue | 02:27 |
| tristanC | commit message doesn't really help though... | 02:27 |
| clarkb | tristanC I thought it needed the pool info as it was short circuiting otherwise | 02:28 |
| tristanC | clarkb: alright, thanks for the tip. I'll give the fix a try | 02:30 |
| pabelanger | tristanC: clarkb: https://review.openstack.org/621040/ is the change I think will fix leaked nodes for vexxhost in ansible-network... As for deleting nodes: because this is rdocloud, I think you need to have a cloud admin reset the state of the VM. I believe nodepool is saying delete by openstack isn't | 02:42 |
| pabelanger | tristanC: you can test, by trying to manually delete the node with openstack client | 02:42 |
| pabelanger | if that fails, then the state needs to be toggled on openstack side | 02:42 |
| tristanC | pabelanger: there are some Unauthorized exception too, not sure if they came with nodepool-3.3.1 though... here is an example: http://paste.openstack.org/show/736680/ | 02:47 |
| pabelanger | tristanC: if you trace the log, was that to limestone? If so, we did a password reset today and updated clouds.yaml with dmsimard | 02:49 |
| tristanC | pabelanger: the provider is not logged, and those are quite frequent though, about 3000 since 2018-12-01 | 02:51 |
| pabelanger | tristanC: are they still happening? | 02:51 |
| pabelanger | yah, we should log the provider to help debug | 02:51 |
| pabelanger | tristanC: it is likely limestone auth was down for a few days also | 02:52 |
| tristanC | pabelanger: last one was 2018-12-04 16:52:09,287 | 02:52 |
| tristanC | (utc) | 02:52 |
| pabelanger | tristanC: I think that is around the time dmsimard updated clouds.yaml on sf.io, you likely can confirm with the timestamp on the file | 02:52 |
| pabelanger | tristanC: I am not sure of the process on the sf.io side, or where the password is stored | 02:53 |
| tristanC | indeed ~nodepool/.config/openstack/clouds.yaml is Dec 4 16:52 | 02:53 |
| pabelanger | cool | 02:54 |
| *** bhavikdbavishi has joined #zuul | 02:56 | |
| *** bhavikdbavishi has quit IRC | 03:01 | |
| *** bhavikdbavishi has joined #zuul | 03:18 | |
| openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Set type for error'ed instances https://review.openstack.org/622101 | 04:04 |
| *** bjackman has joined #zuul | 04:05 | |
| *** mordred has quit IRC | 04:57 | |
| *** mordred has joined #zuul | 04:57 | |
| tobiash | tristanC: commented ^ | 05:09 |
| tristanC | tobiash: thanks, i agree that tests (and better commit message too) would help understand the new failures we are having with nodepool-3.3.1 | 05:18 |
| tristanC | tobiash: but for the moment, i'm just trying to mitigate node leaking with the last released version... | 05:19 |
| tristanC | fwiw, here is the list of backport that seems to help: https://softwarefactory-project.io/r/14521 | 05:21 |
| tobiash | tristanC: I'm sure there is already a server create failed scenario. That probably just misses the quota call afterwards | 05:23 |
| *** j^2 has quit IRC | 05:29 | |
| tobiash | tristanC: if you don't have time maybe I can help with the test later | 05:32 |
| tristanC | tobiash: https://review.openstack.org/622101 seems critical though, providers with a failing node are not able to launch new nodes because of the IndexError on the node type | 05:45 |
| *** dmellado has quit IRC | 05:46 | |
| *** gouthamr has quit IRC | 05:46 | |
| tobiash | One more argument for a test. I'll look later | 05:46 |
| tristanC | this was not an issue without https://review.openstack.org/621681, because the estimatedNodepoolQuotaUsed was skipped as the node didn't have a pool set. | 05:47 |
| openstackgerrit | Merged openstack-infra/nodepool master: Add cleanup routine to delete empty nodes https://review.openstack.org/622616 | 05:47 |
| *** dmellado has joined #zuul | 05:48 | |
| *** gouthamr has joined #zuul | 05:53 | |
| *** dmellado has quit IRC | 05:55 | |
| *** dmellado has joined #zuul | 05:57 | |
| *** dmellado has quit IRC | 06:02 | |
| *** dmellado has joined #zuul | 06:14 | |
| *** gouthamr has quit IRC | 06:15 | |
| *** gouthamr has joined #zuul | 06:18 | |
| *** njohnston has quit IRC | 06:27 | |
| *** njohnston_ has joined #zuul | 06:28 | |
| *** gouthamr has quit IRC | 06:29 | |
| *** gouthamr has joined #zuul | 06:32 | |
| *** njohnston_ has quit IRC | 06:52 | |
| *** njohnston has joined #zuul | 06:55 | |
| *** gouthamr has quit IRC | 06:58 | |
| *** gouthamr has joined #zuul | 07:07 | |
| *** bhavikdbavishi has quit IRC | 07:09 | |
| *** bhavikdbavishi1 has joined #zuul | 07:09 | |
| *** pcaruana has joined #zuul | 07:10 | |
| *** bhavikdbavishi1 is now known as bhavikdbavishi | 07:12 | |
| quiquell|off | tobiash: What a brain fart my review :-( | 07:16 |
| openstackgerrit | Quique Llorente proposed openstack-infra/zuul master: Add default value for relative_priority https://review.openstack.org/622175 | 07:17 |
| *** quiquell|off is now known as quiquell | 07:17 | |
| tobiash | quiquell: thanks | 07:17 |
| quiquell | tobiash: fixed | 07:18 |
| quiquell | thanks to you | 07:18 |
| *** gouthamr has quit IRC | 07:40 | |
| *** gouthamr has joined #zuul | 07:42 | |
| *** quiquell is now known as quiquell|brb | 07:48 | |
| *** gtema has joined #zuul | 08:01 | |
| *** themroc has joined #zuul | 08:15 | |
| *** quiquell|brb is now known as quiquell | 08:22 | |
| *** bhavikdbavishi has quit IRC | 08:33 | |
| *** jpena|off is now known as jpena | 08:39 | |
| openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Set type for error'ed instances https://review.openstack.org/622101 | 08:44 |
| tobiash | tristanC, corvus, Shrews: ^ | 08:44 |
| tobiash | there will be a second change that makes the quota calculation resilient to this so that already wedged nodepools don't have to manually delete broken znodes | 08:45 |
| *** bhavikdbavishi has joined #zuul | 09:06 | |
| *** bhavikdbavishi has quit IRC | 09:07 | |
| *** bhavikdbavishi has joined #zuul | 09:07 | |
| *** njohnston has quit IRC | 09:09 | |
| openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Set type for error'ed instances https://review.openstack.org/622101 | 09:30 |
| openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient https://review.openstack.org/622906 | 09:30 |
| openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient https://review.openstack.org/622906 | 09:32 |
| *** bhavikdbavishi has quit IRC | 09:40 | |
| *** njohnston has joined #zuul | 09:44 | |
| *** quiquell is now known as quiquell|brb | 10:21 | |
| openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: update status page layout based on screen size https://review.openstack.org/622010 | 10:22 |
| *** dkehn has quit IRC | 10:22 | |
| *** quiquell|brb is now known as quiquell | 10:52 | |
| bjackman | Any chance of a Workflow+1 on https://review.openstack.org/#/c/620838/ ? | 10:56 |
| *** bhavikdbavishi has joined #zuul | 10:58 | |
| quiquell | tobiash, corvus: Is the fetch-zuul-cloner role legacy stuff? | 11:09 |
| *** bhavikdbavishi has quit IRC | 11:09 | |
| *** bhavikdbavishi has joined #zuul | 11:09 | |
| *** bhavikdbavishi has quit IRC | 11:16 | |
| *** bhavikdbavishi has joined #zuul | 11:16 | |
| *** bhavikdbavishi has quit IRC | 11:20 | |
| tristanC | quiquell: zuul-cloner is no longer needed in zuulv3, the repos are pushed from the executor instead | 11:24 |
| quiquell | tristanC: So if we have something that depends on that, we have to fix it, is that right? | 11:26 |
| tristanC | quiquell: iiuc, the fetch-zuul-cloner let you not fix that, but it's recommended to use the new zuul.projects workspace yes | 11:28 |
| quiquell | tristanC: ack, thanks! | 11:29 |
| tobiash | quiquell: yes, that's legacy ;) | 11:29 |
| quiquell | sshnaidm: ^ looks like we cannot depend on the setup install fetch-zuul-cloner does | 11:29 |
| quiquell | tobiash: thanks, we have been using the install it does for python projects without knowing | 11:30 |
| tobiash | quiquell: you still can use it but it's only there for backwards compatibility and mainly used in openstack land | 11:30 |
| quiquell | tobiash, tristanC: broken requirements to using /home/zuul/src... | 11:30 |
| quiquell | cool cool thanks | 11:31 |
| sshnaidm | quiquell, the patch with requirement is in gates, so should be fine | 11:31 |
| quiquell | sshnaidm: Just thinking about removing it from the base of our repro, so we can catch those issues | 11:32 |
| quiquell | sshnaidm: Like the patch, do we also have to include all the projects there? | 11:33 |
| sshnaidm | quiquell, well, it's not really the issue while it's working | 11:33 |
| quiquell | sshnaidm: ack | 11:33 |
| *** gtema has quit IRC | 11:40 | |
| *** bhavikdbavishi has joined #zuul | 12:02 | |
| openstackgerrit | Merged openstack-infra/zuul master: Fix "reverse" Depends-On detection with new Gerrit URL schema https://review.openstack.org/620838 | 12:04 |
| openstackgerrit | Brendan proposed openstack-infra/zuul master: Fix urllib imports in Gerrit HTTP form auth code https://review.openstack.org/622942 | 12:05 |
| *** gtema has joined #zuul | 12:14 | |
| *** jpena is now known as jpena|lunch | 12:25 | |
| *** bjackman has quit IRC | 12:52 | |
| *** themroc has quit IRC | 13:21 | |
| *** themroc has joined #zuul | 13:23 | |
| *** jpena|lunch is now known as jpena | 13:33 | |
| *** dkehn has joined #zuul | 13:35 | |
| *** rlandy has joined #zuul | 13:36 | |
| mordred | tristanC: refactor stack lgtm - there's an oops in the commit message in https://review.openstack.org/#/c/621396 | 14:02 |
| *** sshnaidm has quit IRC | 14:06 | |
| *** njohnston has quit IRC | 14:08 | |
| *** njohnston_ has joined #zuul | 14:09 | |
| Shrews | pabelanger: re: image build, not sure what zk info would have been lost. i'd have to go back through the code again | 14:09 |
| pabelanger | Shrews: ack, thanks. For now, I've bumped the job timeout to account for the slow nodes, but might be a good optimization to look into. | 14:10 |
| *** quiquell is now known as quiquell|off | 14:11 | |
| Shrews | pabelanger: oh, i think it's because we no longer hold a lock for building that particular image (another builder could be actively building it now) | 14:11 |
| Shrews | pabelanger: not so efficient if you only have one builder with that image, but no way for us to know that atm | 14:12 |
| *** sshnaidm has joined #zuul | 14:14 | |
| pabelanger | Shrews: oh, right. that makes sense | 14:15 |
| tobias-urdin | is anybody running zuul with gerrit 2.16? planning an upgrade but can't see any statement of supported versions | 14:31 |
| pabelanger | tobias-urdin: I've seen 1 fix for gerrit 2.16 in zuul so far: https://review.openstack.org/619533/ | 14:41 |
| pabelanger | I figure that user is running 2.16, but I do not know their irc handle | 14:42 |
| tobias-urdin | pabelanger: thanks, i'll wait for a while and stop at 2.15 for now, 2.16 seems pretty fresh overall | 14:43 |
| pabelanger | tobias-urdin: I know openstack has plans to upgrade, but not sure of the timeline. I know corvus is also working on gerrit for opendev, so I assume that may use a newer version of gerrit. | 14:45 |
| pabelanger | tobias-urdin: we should see what version of gerrit zuul quickstart is using | 14:45 |
| pabelanger | https://git.zuul-ci.org/cgit/zuul/tree/doc/source/admin/examples/docker-compose.yaml#n6 | 14:46 |
| pabelanger | seems like latest version | 14:46 |
| tobias-urdin | cool | 14:47 |
| tobias-urdin | nobody remembers a coward :) | 14:48 |
| tobias-urdin | i'll give it a try | 14:48 |
| pabelanger | ++ | 14:49 |
| *** sshnaidm has quit IRC | 14:57 | |
| *** quiquell|off has quit IRC | 14:59 | |
| *** ParsectiX has joined #zuul | 15:16 | |
| *** sshnaidm has joined #zuul | 15:18 | |
| *** sshnaidm is now known as sshnaidm|afk | 15:36 | |
| *** jesusaur has quit IRC | 15:37 | |
| tobiash | tobias-urdin: there is a second change for 2.16 too: https://review.openstack.org/620838 | 15:41 |
| tobiash | tobias-urdin: bjackman uses 2.16 so you could ask him how well it works atm | 15:41 |
| *** jesusaur has joined #zuul | 15:42 | |
| tobias-urdin | just saw some plugins we are using haven't cut any stable-2.16 branches yet so i'll have to stay on 2.15 for a while :( | 15:51 |
| *** hashar has joined #zuul | 16:04 | |
| tobiash | fungi: I approved 554352 but it is in merge conflict | 16:09 |
| fungi | tobiash: unsurprising. that change has been waiting for ages. i'll rebase it and fix up whatever conflicts it has nowish | 16:11 |
| openstackgerrit | Jeremy Stanley proposed openstack-infra/zuul master: Add instructions for reporting vulnerabilities https://review.openstack.org/554352 | 16:12 |
| fungi | tobiash: ^ not bad. it was just the index, unsurprisingly | 16:12 |
| fungi | thanks! | 16:14 |
| tobiash | now we just need more contact persons... | 16:14 |
| openstackgerrit | Merged openstack-infra/nodepool master: Set type for error'ed instances https://review.openstack.org/622101 | 16:14 |
| openstackgerrit | Merged openstack-infra/nodepool master: Make estimatedNodepoolQuotaUsed more resilient https://review.openstack.org/622906 | 16:14 |
| *** j^2 has joined #zuul | 16:15 | |
| *** sshnaidm|afk has quit IRC | 16:15 | |
| *** pcaruana has quit IRC | 16:18 | |
| tobiash | tristanC: ^ | 16:19 |
| tobiash | with tests :) | 16:19 |
| tobiash | Shrews, corvus: were these the last fixes needed for 3.3.1 ^^ ? | 16:20 |
| tobiash | maybe it makes sense to do a release probably next week? | 16:21 |
| Shrews | tobiash: that's probably mostly your call since you've worked more with tristanC on his issues | 16:22 |
| Shrews | i'm not aware of anything outstanding atm | 16:22 |
| corvus | we'd probably have the same issue if we restarted more :) | 16:23 |
| corvus | so i think we should get all that in and restart openstack-infra, then when it looks good release | 16:24 |
| tobiash | ++ | 16:24 |
| *** sshnaidm|afk has joined #zuul | 16:30 | |
| pabelanger | would be great to see 3.3.1 release | 16:32 |
| *** themroc has quit IRC | 16:37 | |
| *** gtema has quit IRC | 16:49 | |
| *** bjackman has joined #zuul | 16:57 | |
| *** bhavikdbavishi has quit IRC | 17:02 | |
| *** bhavikdbavishi has joined #zuul | 17:02 | |
| *** hashar has quit IRC | 17:08 | |
| *** bjackman has quit IRC | 17:09 | |
| *** rlandy is now known as rlandy|brb | 17:10 | |
| tobiash | corvus, Shrews: as far as I can see all recent fixes for the last release have been merged | 17:25 |
| openstackgerrit | Merged openstack-infra/zuul master: Add instructions for reporting vulnerabilities https://review.openstack.org/554352 | 17:25 |
| *** rlandy|brb is now known as rlandy | 17:42 | |
| *** jpena is now known as jpena|off | 17:42 | |
| *** mrhillsman has quit IRC | 17:51 | |
| *** mrhillsman has joined #zuul | 17:52 | |
| mrhillsman | any thoughts on why web would only show Loading... on the Builds tab? | 18:25 |
| tobiash | mrhillsman: typically if the api requests failed | 18:26 |
| tobiash | mrhillsman: you should check requests in the browser debugging window | 18:26 |
| tobiash | mrhillsman: does zuul-web have the sql connection configured? | 18:27 |
| mrhillsman | let me confirm that | 18:27 |
| mrhillsman | did not know that was required | 18:27 |
| mrhillsman | but makes sense | 18:28 |
| tobiash | mrhillsman: zuul-web directly queries the database without asking the scheduler | 18:28 |
| mrhillsman | thx | 18:32 |
| mrhillsman | tobiash which daemon handles writing to the db | 18:41 |
| tobiash | mrhillsman: the scheduler, but only if the sql reporter is added to the pipeline | 18:41 |
| mrhillsman | yeah, it is there, and i can login manually | 18:42 |
| mrhillsman | and pymysql is there | 18:42 |
| mrhillsman | i did not have mysql-client though | 18:42 |
| mrhillsman | not sure if that mattered | 18:43 |
| Shrews | corvus: tobiash: looks like we successfully removed around 430 empty zk nodes with the latest update | 18:43 |
| tobiash | Shrews: cool :) | 18:43 |
| mrhillsman | oh, i got it | 18:43 |
| mrhillsman | thx tobiash | 18:43 |
| mrhillsman | i have found the error of my ways | 18:43 |
| tobiash | mrhillsman: you're welcome | 18:44 |
| *** manjeets_ is now known as manjeets | 18:47 | |
| *** ParsectiX has quit IRC | 18:49 | |
| openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Add an upgrade release note for schema change https://review.openstack.org/623046 | 18:53 |
| Shrews | corvus: tobiash: ^^^ | 18:53 |
| tobiash | +2 | 18:54 |
| *** bhavikdbavishi has quit IRC | 19:15 | |
| panda | jhesketh: ping re: https://review.openstack.org/#/q/topic:freeze_job may I ask how you expect this will be used? something like runner --job <job-name> that will run the playbooks locally? | 19:49 |
| Shrews | So, it seems we have a race in the test_handler_poll_session_expired nodepool unit test. I've seen it fail enough to make me begin to look into it. Not sure where the race is yet, though. | 21:06 |
| Shrews | but was able to reproduce it locally | 21:07 |
| tobiash | Shrews: cool, I also thought about that but don't have an idea either | 21:09 |
| openstackgerrit | Merged openstack-infra/nodepool master: Add an upgrade release note for schema change https://review.openstack.org/623046 | 21:17 |
| *** rlandy is now known as rlandy|bbl | 23:25 | |
| *** j^2 has quit IRC | 23:26 | |
| *** j^2 has joined #zuul | 23:31 | |
| *** dkehn has quit IRC | 23:33 | |
| *** dkehn has joined #zuul | 23:34 | |
| *** j^2 has quit IRC | 23:47 | |
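
The quota failure tristanC described at 05:45 (a provider with an error'ed node stops launching new nodes because of an IndexError on the node type), and tobiash's follow-up to make estimatedNodepoolQuotaUsed more resilient (622906), can be illustrated with a minimal Python sketch. This is not nodepool's code: the `Node` class, its fields, `estimated_quota_used`, and the `cores_per_label` mapping are hypothetical simplifications, assuming a node's type is stored as a list of labels that may be empty for an instance that failed to boot. As in the behaviour before 621681 that tristanC mentioned at 05:47, nodes without a pool are skipped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a nodepool node record."""
    provider: str
    pool: Optional[str]
    # Assumption for this sketch: a node's type is a list of labels, and an
    # instance that errored during boot may be left with an empty list.
    type: List[str] = field(default_factory=list)


def estimated_quota_used(nodes: List[Node], provider: str,
                         cores_per_label: Dict[str, int]) -> int:
    """Sum the cores used by one provider's nodes, skipping broken records.

    A naive version doing ``cores_per_label[node.type[0]]`` raises
    IndexError on a node with an empty type, which would block every
    later quota check (and therefore every launch) for the provider.
    """
    used = 0
    for node in nodes:
        if node.provider != provider or node.pool is None:
            # Belongs to another provider, or no pool recorded: skip it.
            continue
        if not node.type:
            # Error'ed instance with no type: skip instead of crashing.
            continue
        used += cores_per_label.get(node.type[0], 0)
    return used
```

For example, with `nodes = [Node("ovh", "main", ["fedora"]), Node("ovh", "main", [])]`, `estimated_quota_used(nodes, "ovh", {"fedora": 8})` returns 8: the node with an empty type is ignored rather than raising. Setting a type on error'ed instances (622101) addresses the root cause; a guard like this only keeps already-wedged nodes from blocking the quota calculation.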
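
The Unauthorized errors tristanC and pabelanger traced between 02:47 and 02:53 were explained by comparing the time of the last error with the modification time of clouds.yaml. A small Python sketch of that check follows, assuming the log timestamp format shown in the chat (`2018-12-04 16:52:09,287`) and that both times are UTC; the function names and the one-minute tolerance are arbitrary choices for illustration.

```python
import os
from datetime import datetime, timedelta, timezone


def parse_log_timestamp(text: str) -> datetime:
    """Parse a timestamp like '2018-12-04 16:52:09,287', assumed to be UTC."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")
    return naive.replace(tzinfo=timezone.utc)


def changed_near(path: str, when: datetime,
                 tolerance: timedelta = timedelta(minutes=1)) -> bool:
    """Return True if the file at path was last modified within tolerance of when."""
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return abs(mtime - when) <= tolerance
```

Usage, with the path from the chat: `changed_near(os.path.expanduser("~nodepool/.config/openstack/clouds.yaml"), parse_log_timestamp("2018-12-04 16:52:09,287"))`. A True result only shows the two events coincide; it suggests, but does not prove, that the credential update is what stopped the errors.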