Sunday, 2022-06-05

<noonedeadpunk> fungi: yeah, also commuting. But you can check this as an example
<noonedeadpunk> ofc I can assume these are some of our jobs that fail in a way that no logs are posted... but well...  16:24
<noonedeadpunk> No way to check that...  16:24
<fungi> yeah, at first look it seems that the log upload task is taking more than 30 minutes to copy them to swift, and is being killed  16:25
<fungi> but i've not had more time to dig deeper  16:25
<noonedeadpunk> aha, ok, yes, now I scrolled up enough :)  16:25
<fungi> it could be that something has caused the logs to get much bigger for those jobs in the past couple of days, for example  16:26
<noonedeadpunk> can you create a hold for the openstack-ansible-deploy-aio_lxc-ubuntu-focal job? it looks like it's failing the most  16:27
<noonedeadpunk> as we haven't changed anything directly in log collection  16:28
<noonedeadpunk> but for this job we have several extra containers, which can result in more files. But that shouldn't be anything really critical...  16:28
<fungi> on my way out to dinner with some other folks, but can try when i get back if nobody beats me to it  16:29
<noonedeadpunk> I guess we should also limit the logs that are collected in case of success...  16:29
<noonedeadpunk> sure, sorry :)  16:29
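[The idea floated above of limiting log collection on successful runs could be sketched roughly as below. This is a hypothetical illustration, not openstack-ansible's actual collection script; the `collect_logs` function and the "success"/"failure" result argument are assumptions for the sketch.]

```shell
# Hypothetical sketch: gather the full log set only when the job failed,
# keeping successful runs small so the swift upload stays fast.
collect_logs() {
    # $1 is an assumed job-result string, e.g. "success" or "failure"
    result="$1"
    if [ "$result" = "success" ]; then
        # on success, keep only a minimal summary of logs
        echo "minimal"
    else
        # on failure, collect everything for debugging
        echo "full"
    fi
}
```

In a real Zuul post-run playbook the equivalent gate would key off the job's success status rather than a shell argument, but the shape of the decision is the same.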
<noonedeadpunk> what's interesting is that in the zuul console I caught `"msg": "Failed to connect to the host via ssh: zuul@ Permission denied (publickey).",` for the [get df disk usage] task. But likely that's because of the upload failure...  19:12
<fungi> noonedeadpunk: yeah, i noticed that earlier and mentioned it to clarkb... normally we try to do a df on the remote nodes in cases where a failure might be related to the rootfs filling up, but something about that raw task has broken recently. it should be unrelated; the earlier swift upload task timeout is the origin of the post_failure result  22:14
<fungi> i've set the requested autohold  22:46
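[For context, an autohold like the one requested above is typically set with the `zuul-client autohold` command. The job name comes from the conversation; the tenant, project, and reason values below are assumptions for illustration.]

```shell
# Hedged sketch of setting an autohold so the next failing build's
# nodes are kept for debugging (tenant/project/reason are assumed).
zuul-client autohold \
  --tenant openstack \
  --project openstack/openstack-ansible \
  --job openstack-ansible-deploy-aio_lxc-ubuntu-focal \
  --reason "debug post_failure log upload timeout" \
  --count 1
```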
