*** ykarel__ is now known as ykarel | 10:37 | |
opendevreview | Aurelio Jargas proposed zuul/zuul-jobs master: Add ensure-poetry role https://review.opendev.org/c/zuul/zuul-jobs/+/922286 | 13:54 |
---|---|---|
opendevreview | Monty Taylor proposed zuul/zuul-jobs master: Hook poetry into ensure-python and build-python-release https://review.opendev.org/c/zuul/zuul-jobs/+/923094 | 15:00 |
opendevreview | Monty Taylor proposed zuul/zuul-jobs master: Add ensure-poetry role https://review.opendev.org/c/zuul/zuul-jobs/+/922286 | 15:02 |
opendevreview | Monty Taylor proposed zuul/zuul-jobs master: Hook poetry into ensure-python and build-python-release https://review.opendev.org/c/zuul/zuul-jobs/+/923094 | 15:02 |
jrosser | i have one particular job using ubuntu-jammy-32GB, and when things get busy it ends up with NODE_FAILURE. I saw this once before and now it's happening again today https://zuul.opendev.org/t/openstack/build/d72396ac4299433a9b2d63a31cc564c8 | 15:21 |
clarkb | jrosser: yes only one cloud is currently offering that node type iirc and if it can't boot the node you get a node failure | 15:22 |
clarkb | the nested virt labels have the same sort of limitation and why we encourage people to use them sparingly | 15:22 |
jrosser | would it not wait for a node? | 15:22 |
clarkb | jrosser: the way nodepool works in our configuration it will attempt to boot the node three times in a cloud before moving to another cloud. If all clouds fail to boot the node you get node failure. With this node type only one cloud can boot the label so only three attempts are made. Nodepool will only wait if the cloud is accurately providing quota information back to nodepool | 15:24 |
clarkb | indicating that there isn't enough quota remaining. However clouds can fail to boot nodes for a variety of reasons | 15:24 |
clarkb | no valid host found is a common one where clouds are near enough to capacity and quota isn't conservative enough | 15:24 |
jrosser | clarkb: might this be one of those cases where the potential number of nodes and the quota are mismatched? | 15:30 |
clarkb | jrosser: yes potentially. It could be the cloud simply has less capacity today than it did in the past. I would have to look at logs to get a better sense of this though. It's also possible the cloud is flaky at booting nodes (though I suspect that isn't the case) | 15:31 |
fungi | keep in mind we're running flat out at capacity in nodepool for normal node types at the moment due to fixes for a very large/complicated openstack security advisory, not sure if that might be related | 15:32 |
fungi | https://grafana.opendev.org/d/21a6e53ea4/zuul-status | 15:33 |
clarkb | fungi: I don't think the cloud providing the large nodes provides any of the regular flavors | 15:33 |
clarkb | it isn't likely to be related if that is true | 15:33 |
fungi | yeah, that's why i was unsure | 15:33 |
fungi | unless there's some interaction on the nodepool launcher side breaking requests to other providers, which seems unlikely as well | 15:34 |
jrosser | grafana suggests that vexxhost-ca-ymq-1 has a max nodes of 72 but in practice it never exceeds 50 | 15:37 |
fungi | that can also be in part due to mixing and matching of different memory sizes for nodes if we're doing 16g and 32g there (i don't recall) | 15:37 |
fungi | so there might be memory quota for 72x 16g nodes but not 72x 32g | 15:38 |
clarkb | ya the main thing to check would be if the node failures are a result of no valid host found errors or similar | 15:39 |
clarkb | if so we can probably dial back the max servers value | 15:39 |
fungi | in situations like that nodepool relies on the quota information the cloud supplies to determine available capacity for satisfying requests, and the max-servers is a sort of backstop | 15:39 |
clarkb | I'm not in a good spot for that right this moment. I have to finish some local patching for regresshion and then have code reviews I've promised | 15:39 |
clarkb | fungi: frickler: https://review.opendev.org/c/opendev/git-review/+/920845 should we go ahead and land that one? | 16:02 |
fungi | i think so, assuming frickler was good with my answer | 16:03 |
frickler | ah, I missed that answer, approved after unmangling what gerrit UI made out of the __init__ ;) | 16:12 |
fungi | yeah, it's not super straightforward and could certainly benefit from some refactoring for clarity | 16:13 |
opendevreview | Merged opendev/git-review master: Update the upper bound for Python and Gerrit tests https://review.opendev.org/c/opendev/git-review/+/920845 | 16:49 |
opendevreview | Merged zuul/zuul-jobs master: Add ensure-poetry role https://review.opendev.org/c/zuul/zuul-jobs/+/922286 | 17:39 |
fungi | i manually enqueued and promoted a couple of bug fix changes in the gate pipeline which are blocking openstack security vulnerability fixes, just a heads up | 19:20 |
fungi | testing fix changes specifically | 19:20 |
clarkb | I'm going to be afk for a bit (popping out for lunch and then probably a hair cut) | 20:01 |
clarkb | I don't quite have fungi hair but it is getting too long | 20:02 |
tonyb | fungi: If you could look at the held wiki node for various anti-SPAM extensions that'd be helpful | 20:03 |
tonyb | tony@thor:~$ grep wiki99 /etc/hosts | 20:03 |
tonyb | 104.239.143.6 wiki99.opendev.org | 20:03 |
tonyb | Any testing hints would be great so I don't need to lean so heavily on you | 20:04 |
fungi | is the vhost serving as wiki.opendev.org or just wiki99? | 20:06 |
fungi | looks like it's redirecting to wiki99 | 20:09 |
tonyb | For testing it's wiki99 | 20:09 |
fungi | k | 20:10 |
fungi | looks like my login is working at least | 20:10 |
tonyb | Nice | 20:10 |
tonyb | I assume that's OpenID | 20:10 |
fungi | yeah | 20:11 |
fungi | looks like https://wiki99.opendev.org/w/index.php?title=Special:RecentChanges&limit=500&hidepatrolled=1&days=30 is probably working, but i'll need to log in as a separate unverified user and test making some edits, which will need to wait until the openstack vulnerability stuff quiets down a bit more | 20:13 |
fungi | i'll also test mass deletion when i do that, which is the other half of the workflow for dealing with spammy/vandal edits | 20:13 |
tonyb | Sounds good. So SPAM looks like newuser is created (which also means creating an ubuntu-one account?) The new user posts a lot of stuff (possibly to their :talk page) | 20:16 |
tonyb | You see that content in the link above and then mass delete everything they did and ban/block the account? | 20:16 |
tonyb | But Also somewhere in there ... in the case the newuser is legit you'd instead add them to the autopatrolled group to filter them from further inspection? | 20:18 |
tonyb | If I'm more or less on the right track you can add my user (or I can create a new specific one) to help manage the spam on the wiki | 20:19 |
fungi | that is precisely the workflow, yes | 20:22 |
fungi | there is also a grey area where their first edits might not be convincing so i leave them patrolled and then the next time they make edits i look back at their edit history and see they're a returning user | 20:23 |
fungi | rather i leave the new user unverified but not blocked and i mark their initial edits as patrolled but then check their subsequent edits the next time they make some | 20:24 |
fungi | so the goal is that the query i linked above normally returns no results (or occasionally references to deleted pages which were spam) | 20:24 |
tonyb | Okay I think I understand how that'd work | 20:25 |
fungi | it's easy to visually skip over the already deleted pages because the links in the recent changes report show up red since they go to pages that don't exist | 20:25 |
tonyb | I see a lot of users that I'd expect to be whitelisted (which is what I thought adding the account to autopatrolled would do) in the list you linked to | 20:26 |
tonyb | is that normal or could it be a byproduct of something not migrating across | 20:27 |
tonyb | Well the same link for wiki.openstack looks approximately correct | 20:28 |
fungi | https://wiki99.opendev.org/w/index.php?title=Special%3AListUsers&username=&group=autopatrol&limit=50 is the verified users | 20:28 |
fungi | the group name is autopatrol (as in their edits won't be manually patrolled by a moderator) | 20:29 |
fungi | looks like there are a little over a thousand users in there | 20:30 |
fungi | in that group i mean | 20:30 |
tonyb | and my account TonyBreeds is in that list, but my edits are also in the moderation list you linked | 20:31 |
fungi | oh, i think you need to be in a more privileged group to see the filtering correctly | 20:32 |
tonyb | Ah okay. | 20:32 |
tonyb | That's helpful | 20:33 |
tonyb | I feel like I've hijacked your time which I didn't mean to do. | 20:33 |
fungi | i've added you to administrator and bureaucrat groups on wiki99 if you want to refresh the quert | 20:33 |
fungi | query | 20:33 |
fungi | hopefully the page should be empty now | 20:33 |
tonyb | and indeed it is! | 20:35 |
tonyb | and I have a "Show patrolled edits" filter that wasn't there before | 20:35 |
fungi | yeah, i think that's limited to bureaucrats or admins which is why it wasn't filtering them out for you before | 20:39 |
fungi | it's also what i see normally on the old server if my login has expired | 20:39 |
tonyb | got it | 20:39 |
opendevreview | Tony Breeds proposed opendev/system-config master: [DNM] Run ansible-devel under python-3.11 https://review.opendev.org/c/opendev/system-config/+/922704 | 21:55 |
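
The launch behaviour clarkb describes at 15:24 (three boot attempts in a cloud before moving to the next, and NODE_FAILURE once every cloud offering the label has failed) can be sketched roughly as below. This is a toy model and not nodepool's actual code: the cloud names, the `request_node` function and the `try_boot` callback are all hypothetical, and the quota-aware waiting clarkb mentions is deliberately left out.

```python
from typing import Callable, Dict, Set, Tuple

# Per the discussion, the opendev configuration tries three boots per cloud.
ATTEMPTS_PER_CLOUD = 3


def request_node(
    label: str,
    clouds: Dict[str, Set[str]],
    try_boot: Callable[[str, str], bool],
) -> Tuple[str, int]:
    """Return (result, total boot attempts) for a single node request.

    clouds maps a (hypothetical) cloud name to the set of labels it offers.
    try_boot(cloud, label) stands in for a real boot and returns True on success.
    """
    attempts = 0
    for cloud, labels in clouds.items():
        if label not in labels:
            continue  # this cloud cannot boot the requested label at all
        for _ in range(ATTEMPTS_PER_CLOUD):
            attempts += 1
            if try_boot(cloud, label):
                return ("READY", attempts)
    # Every cloud that offers the label has exhausted its attempts.
    return ("NODE_FAILURE", attempts)


# Hypothetical layout: only one cloud offers the large label.
CLOUDS = {
    "cloud-a": {"ubuntu-jammy-32GB"},
    "cloud-b": {"ubuntu-jammy"},
    "cloud-c": {"ubuntu-jammy"},
}


def always_fails(cloud: str, label: str) -> bool:
    # Simulates e.g. "no valid host found" on every boot.
    return False


print(request_node("ubuntu-jammy-32GB", CLOUDS, always_fails))  # ('NODE_FAILURE', 3)
print(request_node("ubuntu-jammy", CLOUDS, always_fails))       # ('NODE_FAILURE', 6)
```

With a single provider for the label, the request gives up after three failed boots, which matches the NODE_FAILURE jrosser reported; a label offered by more clouds gets more chances before failing.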
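
fungi's point at 15:37-15:39 about mixed memory sizes comes down to simple arithmetic: a memory quota sized for 72 16 GB nodes only fits half as many 32 GB nodes, so max-servers acts as a backstop and not as the real limit. The numbers below are illustrative assumptions, not the provider's actual quota, and `effective_node_limit` is a hypothetical helper.

```python
def effective_node_limit(ram_quota_gb: int, node_ram_gb: int, max_servers: int) -> int:
    """Nodes that fit under both the memory quota and the max-servers backstop."""
    return min(max_servers, ram_quota_gb // node_ram_gb)


# Assumed quota: exactly enough memory for 72 nodes of 16 GB each.
RAM_QUOTA_GB = 72 * 16
MAX_SERVERS = 72

print(effective_node_limit(RAM_QUOTA_GB, 16, MAX_SERVERS))  # 72
print(effective_node_limit(RAM_QUOTA_GB, 32, MAX_SERVERS))  # 36
```

A mix of 16 GB and 32 GB nodes would land somewhere between those two figures, which is one possible reason a provider never reaches its configured maximum.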