*** jamesmcarthur has joined #openstack-infra | 00:00 | |
*** Lucas_Gray has quit IRC | 00:09 | |
*** Lucas_Gray has joined #openstack-infra | 00:12 | |
*** jamesmcarthur_ has joined #openstack-infra | 00:18 | |
*** hamalq_ has quit IRC | 00:20 | |
*** jamesmca_ has joined #openstack-infra | 00:21 | |
*** jamesmcarthur has quit IRC | 00:22 | |
*** jamesmcarthur_ has quit IRC | 00:23 | |
*** jamesmca_ has quit IRC | 00:24 | |
*** tetsuro has joined #openstack-infra | 00:25 | |
*** yamamoto has joined #openstack-infra | 00:36 | |
*** ricolin has joined #openstack-infra | 00:37 | |
*** jamesmcarthur has joined #openstack-infra | 00:39 | |
*** yamamoto has quit IRC | 00:41 | |
*** jamesmcarthur has quit IRC | 00:42 | |
*** jamesden_ has joined #openstack-infra | 00:42 | |
*** ricolin has quit IRC | 00:42 | |
*** jamesden_ has quit IRC | 00:42 | |
*** jamesden_ has joined #openstack-infra | 00:43 | |
*** jamesdenton has quit IRC | 00:43 | |
*** jamesmcarthur has joined #openstack-infra | 00:44 | |
*** armax has quit IRC | 00:50 | |
*** markvoelker has joined #openstack-infra | 00:54 | |
*** ociuhandu has joined #openstack-infra | 00:56 | |
*** armax has joined #openstack-infra | 00:58 | |
*** ociuhandu has quit IRC | 00:59 | |
*** markvoelker has quit IRC | 00:59 | |
*** ociuhandu has joined #openstack-infra | 00:59 | |
*** ociuhandu has quit IRC | 01:03 | |
*** markvoelker has joined #openstack-infra | 01:09 | |
*** yamamoto has joined #openstack-infra | 01:12 | |
*** markvoelker has quit IRC | 01:14 | |
*** yamamoto has quit IRC | 01:38 | |
*** ricolin has joined #openstack-infra | 01:46 | |
*** jamesmcarthur has quit IRC | 01:58 | |
*** jamesmcarthur has joined #openstack-infra | 01:59 | |
*** rlandy|ruck|bbl is now known as rlandy|ruck | 02:03 | |
*** Lucas_Gray has quit IRC | 02:04 | |
*** jamesmcarthur has quit IRC | 02:04 | |
*** rfolco has quit IRC | 02:09 | |
*** rlandy|ruck has quit IRC | 02:21 | |
*** vishalmanchanda has joined #openstack-infra | 02:29 | |
*** jamesmcarthur has joined #openstack-infra | 02:32 | |
*** yamamoto has joined #openstack-infra | 02:42 | |
*** jamesmcarthur has quit IRC | 02:42 | |
*** yamamoto has quit IRC | 02:43 | |
*** yamamoto has joined #openstack-infra | 02:43 | |
*** hongbin has joined #openstack-infra | 02:59 | |
*** ricolin has quit IRC | 03:02 | |
*** yolanda has quit IRC | 03:02 | |
*** yolanda has joined #openstack-infra | 03:03 | |
*** smarcet has quit IRC | 03:07 | |
*** jamesmcarthur has joined #openstack-infra | 03:17 | |
*** jamesmcarthur has quit IRC | 03:22 | |
*** psachin has joined #openstack-infra | 03:28 | |
*** hongbin has quit IRC | 03:33 | |
*** jamesmcarthur has joined #openstack-infra | 03:50 | |
*** armax has quit IRC | 04:05 | |
*** ykarel|away is now known as ykarel | 04:27 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-infra | 04:33 | |
*** ysandeep|away is now known as ysandeep | 04:40 | |
*** jamesmcarthur has quit IRC | 04:52 | |
*** jamesmcarthur has joined #openstack-infra | 04:52 | |
*** jtomasek has joined #openstack-infra | 04:56 | |
*** jtomasek has quit IRC | 04:56 | |
*** jtomasek has joined #openstack-infra | 04:57 | |
*** jamesmcarthur has quit IRC | 04:58 | |
*** matt_kosut has joined #openstack-infra | 05:00 | |
AJaeger | config-core, could you review https://review.opendev.org/737987 and https://review.opendev.org/737995 , please? - further retirement changes | 05:05 |
*** jamesmcarthur has joined #openstack-infra | 05:26 | |
*** jamesmcarthur has quit IRC | 05:33 | |
*** udesale has joined #openstack-infra | 05:40 | |
*** lmiccini has joined #openstack-infra | 05:45 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/738150 | 06:10 |
*** danpawlik is now known as dpawlik|EoD | 06:15 | |
*** ralonsoh has joined #openstack-infra | 06:17 | |
*** dklyle has quit IRC | 06:20 | |
*** rpittau|afk is now known as rpittau | 06:29 | |
openstackgerrit | Merged openstack/project-config master: Retire networking-onos, openstack-ux, solum-infra-guest-agent: Step 1 https://review.opendev.org/737987 | 06:50 |
*** slaweq has joined #openstack-infra | 06:57 | |
*** ysandeep is now known as ysandeep|afk | 07:04 | |
*** marcosilva has joined #openstack-infra | 07:17 | |
*** jcapitao has joined #openstack-infra | 07:18 | |
*** hashar has joined #openstack-infra | 07:20 | |
*** ysandeep|afk is now known as ysandeep | 07:23 | |
*** bhagyashris|afk is now known as bhagyashris | 07:27 | |
*** amoralej|off is now known as amoralej | 07:31 | |
*** jpena|off is now known as jpena | 07:31 | |
*** jtomasek has quit IRC | 07:32 | |
*** dtantsur|afk is now known as dtantsur | 07:33 | |
*** jtomasek has joined #openstack-infra | 07:35 | |
*** tosky has joined #openstack-infra | 07:35 | |
*** ociuhandu has joined #openstack-infra | 07:37 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Remove legacy-tempest-dsvm-networking-onos https://review.opendev.org/737995 | 07:38 |
*** marcosilva has quit IRC | 07:50 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Run openafs promote job only if gate job run https://review.opendev.org/738155 | 07:50 |
*** jtomasek has quit IRC | 08:10 | |
*** jtomasek has joined #openstack-infra | 08:14 | |
*** hashar_ has joined #openstack-infra | 08:21 | |
*** hashar has quit IRC | 08:22 | |
*** hashar_ is now known as hashar | 08:29 | |
*** pkopec has quit IRC | 08:31 | |
*** derekh has joined #openstack-infra | 08:43 | |
*** ykarel is now known as ykarel|lunch | 08:48 | |
*** markvoelker has joined #openstack-infra | 08:49 | |
openstackgerrit | Carlos Goncalves proposed openstack/project-config master: Add nested-virt-centos-8 label https://review.opendev.org/738161 | 08:50 |
*** jistr has quit IRC | 08:53 | |
*** markvoelker has quit IRC | 08:54 | |
*** jistr has joined #openstack-infra | 08:54 | |
*** gfidente has joined #openstack-infra | 09:02 | |
*** kaiokmo has joined #openstack-infra | 09:06 | |
*** ysandeep is now known as ysandeep|lunch | 09:06 | |
*** udesale has quit IRC | 09:07 | |
openstackgerrit | Shivanand Tendulker proposed openstack/project-config master: Removes py35 and py27 jobs for proliantutils https://review.opendev.org/738168 | 09:09 |
*** ramishra has quit IRC | 09:09 | |
*** xek has joined #openstack-infra | 09:10 | |
openstackgerrit | Carlos Goncalves proposed openstack/project-config master: Add nested-virt-centos-8 label https://review.opendev.org/738161 | 09:13 |
openstackgerrit | Shivanand Tendulker proposed openstack/project-config master: Removes py35 and py27 jobs for proliantutils https://review.opendev.org/738168 | 09:14 |
*** udesale has joined #openstack-infra | 09:15 | |
*** pkopec has joined #openstack-infra | 09:17 | |
*** ysandeep|lunch is now known as ysandeep | 09:22 | |
*** eolivare has joined #openstack-infra | 09:26 | |
*** Lucas_Gray has joined #openstack-infra | 09:41 | |
*** ramishra has joined #openstack-infra | 09:52 | |
*** ykarel|lunch is now known as ykarel | 09:55 | |
*** priteau has joined #openstack-infra | 10:03 | |
*** rpittau is now known as rpittau|bbl | 10:04 | |
*** tetsuro has quit IRC | 10:08 | |
*** pkopec has quit IRC | 10:09 | |
*** jcapitao has quit IRC | 10:21 | |
*** jcapitao has joined #openstack-infra | 10:23 | |
*** jcapitao is now known as jcapitao_lunch | 10:34 | |
*** slaweq has quit IRC | 10:40 | |
*** ccamacho has quit IRC | 10:42 | |
*** slaweq has joined #openstack-infra | 10:42 | |
*** markvoelker has joined #openstack-infra | 10:50 | |
*** markvoelker has quit IRC | 10:54 | |
*** Lucas_Gray has quit IRC | 11:14 | |
openstackgerrit | Thierry Carrez proposed zuul/zuul-jobs master: upload-git-mirror: use retries to avoid races https://review.opendev.org/738187 | 11:21 |
*** jaicaa has quit IRC | 11:23 | |
zbr | what is happening with "Web Listing Disabled" on log servers? | 11:24 |
*** Lucas_Gray has joined #openstack-infra | 11:26 | |
*** jpena is now known as jpena|lunch | 11:33 | |
*** ryohayakawa has quit IRC | 11:35 | |
*** tinwood has quit IRC | 11:37 | |
*** kopecmartin is now known as kopecmartin|pto | 11:37 | |
*** tinwood has joined #openstack-infra | 11:38 | |
*** jaicaa has joined #openstack-infra | 11:49 | |
frickler | dirk: cmurphy: would one of you be interested in fixing opensuse for stable/stein in devstack? see https://review.opendev.org/735640 , the other option would be to just drop that job until someone cares or has time | 12:01 |
dirk | frickler: iirc AJaeger was looking at it | 12:04 |
dirk | there has been a short conversation about it | 12:04 |
dirk | frickler: I'll poke people internally so that you'll get a colleague looking at it | 12:04 |
AJaeger | dirk: I was looking and failed ;( | 12:07 |
*** jcapitao_lunch is now known as jcapitao | 12:07 | |
AJaeger | dirk: so, we were able to fix train but stein is a different beast | 12:08 |
dirk | AJaeger: ok, I'll ask internally further, thanks | 12:08 |
AJaeger | thanks, dirk! | 12:10 |
*** rpittau|bbl is now known as rpittau | 12:12 | |
*** rlandy has joined #openstack-infra | 12:13 | |
*** rlandy is now known as rlandy|ruck | 12:13 | |
*** ociuhandu has quit IRC | 12:14 | |
*** rfolco has joined #openstack-infra | 12:18 | |
*** udesale has quit IRC | 12:29 | |
*** derekh has quit IRC | 12:32 | |
*** ociuhandu has joined #openstack-infra | 12:34 | |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 12:35 | |
*** smarcet has joined #openstack-infra | 12:37 | |
fungi | zbr: you'll have to be more specific, though that usually is an indication that no index was uploaded for the url you're visiting (possibly nothing at all). have an example build which links to a listing error? | 12:43 |
fungi | zbr: also possible you're following an old link and the logs have already expired and been deleted? | 12:44 |
*** jpena|lunch is now known as jpena | 12:44 | |
zbr | fungi: i did a recheck, my guess is that the current retention is too small. sometimes we need logs available for a long time before we make a decision | 12:45 |
zbr | also, i observed that updating the commit message on my controversial wrap/unwrap change reset votes, so we cannot rely on gerrit to track support for that change. | 12:46 |
fungi | yep, however we generate something like 2-3tb of compressed logs every month | 12:46 |
zbr | any idea what we can use to track it? | 12:46 |
fungi | last i looked anyway | 12:46 |
zbr | probably we need different rules based on project or based on size of logs per job | 12:46 |
zbr | why scrap logs from jobs that produce few logs because of ones that are heavy on them? | 12:47 |
fungi | i'm not sure i want to be in the position of deciding what project is more important than what other project and who deserves to be able to monopolize our ci log storage | 12:47 |
zbr | probably a rule based on size would be unbiased | 12:47 |
zbr | any log > X is scrapped at 30 days, any log > Y is scrapped at 3 months. | 12:48 |
fungi | that might be doable, expiration is decided at upload time, though it's also much easier to communicate a single retention period | 12:48 |
zbr | a rotating rule would make more sense to me, start to scrap old stuff, instead of guessing at upload time | 12:50 |
zbr | i am not sure we can always make an informed decision about removal date when we upload, yep we should have a default. | 12:51 |
fungi | well, the expiration is a swift feature. when we had a log server we used to have a running process go through all the old logs to decide what to get rid of, and turns out the amount of logs we're keeping is so large that if you run something like that continuously you still can't keep up with the upload rate | 12:51 |
zbr | can we compute total log size before uploading them? | 12:51 |
*** markvoelker has joined #openstack-infra | 12:51 | |
zbr | if so, we could make the expiration bit longer for small logs. | 12:51 |
zbr | this could play well with less active projects too | 12:52 |
fungi | that's why i was saying it might be possible since in theory we know the aggregate log quantity for any build | 12:52 |
zbr | and avoid extra "rechecks" | 12:52 |
zbr | lets see what others think about it | 12:52 |
fungi | though based on previous numbers, tripleo would wind up with a one-week expiration for most of its job logs ;) | 12:52 |
zbr | fungi: (me hiding) | 12:53 |
fungi | we expire lots of smaller logs to make room for those | 12:53 |
fungi | currently | 12:53 |
fungi | but yeah, i don't know what the real upshot would be, or whether we could just increase our retention period across the board, depending on what our overall utilization and donated object storage quotas look like | 12:54 |
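A minimal sketch of what size-based expiration at upload time could look like; the thresholds and container name are made up, it assumes the build's logs are staged locally before upload, and it relies on Swift's X-Delete-After header (which the swift CLI can set with --header):

    # hypothetical thresholds: bigger builds expire sooner
    LOG_DIR=logs/
    SIZE_MB=$(du -sm "$LOG_DIR" | cut -f1)
    if [ "$SIZE_MB" -gt 1024 ]; then
        EXPIRY=$((7 * 24 * 3600))      # very large builds: one week
    elif [ "$SIZE_MB" -gt 100 ]; then
        EXPIRY=$((30 * 24 * 3600))     # large builds: 30 days
    else
        EXPIRY=$((90 * 24 * 3600))     # small builds: roughly three months
    fi
    # the expiry is fixed per object at upload time, as fungi notes above
    swift upload example_logs_container "$LOG_DIR" --header "X-Delete-After: $EXPIRY"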
*** markvoelker has quit IRC | 12:56 | |
*** jamesden_ is now known as jamesdenton | 12:59 | |
*** amoralej is now known as amoralej|lunch | 13:04 | |
*** derekh has joined #openstack-infra | 13:04 | |
*** gfidente has quit IRC | 13:06 | |
*** ykarel is now known as ykarel|afk | 13:14 | |
*** smarcet has quit IRC | 13:16 | |
*** gfidente has joined #openstack-infra | 13:16 | |
*** dtantsur is now known as dtantsur|afk | 13:21 | |
mwhahaha | hey i'm trying to look into the RETRY_LIMIT crashing issue but I don't seem to be able to figure out when it might have started. Is there a way to get more history out of zuul or some other tool? | 13:21 |
mwhahaha | I don't seem to be able to find them in logstash (probably because there are no logs) | 13:22 |
mwhahaha | I have a feeling it's a bug in centos8 because it's occurring on different jobs/branches but I don't know when it started | 13:24 |
EmilienM | infra-root: we're having a gate issue and we have a mitigation to reduce the failures with a revert: https://review.opendev.org/#/c/738025/ - would it be possible to force push that patch please? | 13:28 |
*** smarcet has joined #openstack-infra | 13:29 | |
rlandy|ruck|mtg | mwhahaha: we had a discussion about the RETRY_LIMIT on wednesday - if it's the same issue | 13:30 |
mwhahaha | yea it's that one | 13:30 |
mwhahaha | i think it's happening when we run container image prepare which relies on multiprocessing because it seems to be happening ~30-40 mins into the job consistently | 13:30 |
AJaeger | EmilienM: you mean: promote to head of queue? | 13:30 |
rlandy|ruck|mtg | we got as far as finding out that the failure hits in tripleo-ci/toci_gate_test.sh | 13:31 |
rlandy|ruck|mtg | and mostly leaves no logs | 13:31 |
rlandy|ruck|mtg | any test running toci can basically hit it | 13:31 |
mwhahaha | rlandy|ruck|mtg: I don't think it's the shell script tho, it's more likely what we're running inside it | 13:31 |
mwhahaha | that's just our entry point into quickstart | 13:31 |
rlandy|ruck|mtg | we didn't trace it back to any particular provider | 13:31 |
mwhahaha | yea i think it's a centos8 bug | 13:32 |
mwhahaha | because i think it started about the time we got 8.2 | 13:32 |
mwhahaha | https://zuul.opendev.org/t/openstack/builds?result=RETRY_LIMIT&project=openstack%2Ftripleo-heat-templates | 13:32 |
mwhahaha | points to something durring the job | 13:32 |
EmilienM | AJaeger: no, force merge | 13:32 |
mwhahaha | rlandy|ruck|mtg: the timing indicates it's during the standalone deploy itself because we start it ~30 mins into a job | 13:33 |
rlandy|ruck|mtg | AJaeger: here is the related bug ... https://bugs.launchpad.net/tripleo/+bug/1885279 | 13:33 |
openstack | Launchpad bug 1885279 in tripleo "TestVolumeBootPattern.test_volume_boot_pattern tests on master are failing on updating to cirros-0.5.1 image" [Critical,In progress] - Assigned to Ronelle Landy (rlandy) | 13:33 |
frickler | rlandy|ruck|mtg: you need > 64MB for cirros 0.5.1. devstack uses 128MB | 13:34 |
*** mordred has quit IRC | 13:34 | |
rlandy|ruck|mtg | chandankumar: ^^ | 13:35 |
rlandy|ruck|mtg | frickler: that may be - but we need more qualification here and we'd like to revert to do that | 13:35 |
chandankumar | rlandy|ruck|mtg, let me check the default size | 13:36 |
AJaeger | EmilienM: I don't have those permissions, just asking. Why do you think a force-merge is needed? | 13:36 |
rlandy|ruck|mtg | AJaeger: it's taking time to get the patch through the gate | 13:36 |
rlandy|ruck|mtg | in the mean time, other jobs are failing | 13:36 |
EmilienM | AJaeger: our CI stats isn't good. We're dealing with multiple issues at this time and we think this one is one of them | 13:37 |
AJaeger | Normally, we just promote them to head of gate to speed up... | 13:37 |
*** dave-mccowan has quit IRC | 13:37 | |
EmilienM | yeah things haven't been normal for us this week :-/ | 13:37 |
AJaeger | bbl | 13:37 |
rlandy|ruck|mtg | I guess we'll take what we can get - top of the queue then - pls | 13:38 |
*** dave-mccowan has joined #openstack-infra | 13:38 | |
chandankumar | rlandy|ruck|mtg, https://opendev.org/osf/python-tempestconf/src/branch/master/config_tempest/constants.py#L30 | 13:38 |
chandankumar | it is 64 | 13:38 |
chandankumar | we need to increase that | 13:38 |
rlandy|ruck|mtg | chandankumar: let's discuss back on our channels | 13:39 |
chandankumar | yes | 13:39 |
corvus | EmilienM, AJaeger, infra-root: hi, i can promote 738025 | 13:40 |
EmilienM | thanks corvus | 13:41 |
corvus | EmilienM: if it does improve things, then of course changes behind it in the gate will automatically receive the benefit of that improvement; you can make other changes in check benefit from it before it lands by adding a Depends-On | 13:43 |
EmilienM | right | 13:44 |
corvus | it's at the top now | 13:44 |
EmilienM | I saw, thanks a lot | 13:44 |
rlandy|ruck|mtg | corvus: thank you | 13:45 |
corvus | no problem :) hth | 13:45 |
*** udesale has joined #openstack-infra | 13:47 | |
*** amoralej|lunch is now known as amoralej | 13:47 | |
*** jamesmcarthur has joined #openstack-infra | 13:49 | |
rlandy|ruck|mtg | mwhahaha: https://bugs.launchpad.net/tripleo/+bug/1885286 - so we have a place to track the investigation of RETRY_LIMIT errors | 13:53 |
openstack | Launchpad bug 1885286 in tripleo "Increase in RETRY_LIMIT errors in zuul.openstack.org is preventing jobs from passing check/gate" [Critical,Triaged] - Assigned to Ronelle Landy (rlandy) | 13:53 |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 13:54 | |
mwhahaha | rlandy|ruck: yea i think it's happening during container-image-prepare based on the timings ~30-40 mins | 13:54 |
*** yamamoto has quit IRC | 13:54 | |
rlandy|ruck | mwhahaha: so then the failure is later than ... | 13:55 |
mwhahaha | yea | 13:55 |
rlandy|ruck | on Wed we were looking at failures around 10-15 mins | 13:55 |
mwhahaha | https://zuul.opendev.org/t/openstack/builds?result=RETRY_LIMIT&project=openstack%2Ftripleo-heat-templates | 13:55 |
rlandy|ruck | and no logs | 13:55 |
mwhahaha | check the failures for tripleo jobs | 13:55 |
mwhahaha | i think it's a multiprocessing bug in python in centos8 | 13:55 |
mwhahaha | we've seen stack traces in ansible with it too previously | 13:55 |
mwhahaha | i'll try and dig up my logs later, i asked about it in #ansible-devel like 2 weeks ago | 13:56 |
*** yamamoto has joined #openstack-infra | 13:56 | |
rlandy|ruck | mwhahaha: k, thanks | 13:56 |
mwhahaha | rlandy|ruck: http://paste.openstack.org/show/794407/ was the ansible crash i saw | 14:03 |
rlandy|ruck | at least we have some trace to go on now | 14:04 |
mwhahaha | may not be related but i saw it shortly after we got 2.9.9 | 14:04 |
mwhahaha | but since the failure is in python itself, i'm wondering if there's another issue | 14:05 |
*** dklyle has joined #openstack-infra | 14:08 | |
fungi | something seems to be breaking in such a way that at least ssh from the executor ceases working... whether that's a kernel panic, network interface getting unconfigured, sshd hanging... can't really tell | 14:09 |
fungi | you could open a bunch of log streams for jobs you think are more likely to hit that condition, and see where they stop | 14:10 |
fungi | (if you do it with finger protocol you could probably record them to separate local files pretty easily, and not have to depend on browser websockets) | 14:10 |
mwhahaha | it doesn't happen enough :/ | 14:11 |
rlandy|ruck | mwhahaha: on that paste the date logged is Jun 05 09:44:15 | 14:11 |
mwhahaha | yea that's not from one of these | 14:11 |
rlandy|ruck | twenty days ago? | 14:11 |
mwhahaha | that's just something i noticed that was happening | 14:11 |
fungi | well, the retry_limit doesn't happen that often, because the job has to hit a similar condition three builds in a row... i imagine isolated instances of this which aren't hit on a second or third rebuild may be much more common | 14:12 |
mwhahaha | where python was segfaulting in the multiprocessing bits in ansible. since container image prepare uses multiprocessing it might be a similar root cause | 14:12 |
rlandy|ruck | it's happening often enough now to impact the rate jobs get through gates | 14:12 |
openstackgerrit | Shivanand Tendulker proposed openstack/project-config master: Removes py35, tox and cover jobs for proliantutils https://review.opendev.org/738168 | 14:22 |
*** armax has joined #openstack-infra | 14:24 | |
*** ykarel|afk is now known as ykarel | 14:37 | |
*** priteau has quit IRC | 14:44 | |
*** priteau has joined #openstack-infra | 14:47 | |
*** markvoelker has joined #openstack-infra | 14:52 | |
*** priteau has quit IRC | 14:52 | |
*** markvoelker has quit IRC | 14:57 | |
clarkb | mwhahaha: rlandy|ruck: may also want to add logging of the individual steps as they happen in that script | 14:57 |
mwhahaha | it's not that script and we do | 14:57 |
*** lmiccini has quit IRC | 14:57 | |
mwhahaha | but since we don't get any logs we have no idea what's happening | 14:57 |
mwhahaha | that script just invokes other things that do log, but no logs are captured | 14:57 |
clarkb | mwhahaha: I know it's not the script but it's something the script runs, isn't it? and getting that emitted to the console log would be useful rather than trying to infer based on time to failure | 14:57 |
clarkb | right I'm saying write the logs to the console and then you'll get them | 14:58 |
mwhahaha | we don't seem to be recording the console anywhere | 14:58 |
clarkb | it's available while the job runs | 14:58 |
mwhahaha | which is not helpful | 14:58 |
clarkb | permanent storage requires that the host be available at the end of the job for archival | 14:58 |
clarkb | why isn't that helpful? you can start a number of them, open the logs (via browser or finger), wait for one to fail, save logs, debug from there | 14:59 |
mwhahaha | zuul doesn't have a call back to write out the console and always ship that off the executor? | 14:59 |
clarkb | mwhahaha: the console log is from the test node not the executor | 14:59 |
clarkb | if the test node is gone there is no more console log to copy | 14:59 |
openstackgerrit | Merged openstack/project-config master: Removes py35, tox and cover jobs for proliantutils https://review.opendev.org/738168 | 14:59 |
mwhahaha | maybe i'm missing how zuul is invoking ansible on that, but shouldn't there be a way to ship the output off the node w/o needing the node such that it can be captured even if the node dies | 15:00 |
clarkb | no because those logs are on the disk of the node | 15:00 |
* mwhahaha shrugs | 15:00 | |
clarkb | we could potentially set up a hold and keep nodepool from deleting the instance (though I'm not sure that would trigger on a retry failure? that may be a hold bug), then reboot --hard the instance via the nova api and hope it comes back | 15:01 |
mwhahaha | it's not a single job that hits this, it's like any of them. so trying to open up something that can capture all the console output all the time and then figure out which one RETRY_LIMITs isn't as simple as you make it seem | 15:01 |
weshay_pto | there has to be some amount of tracking jobs that hit RETRY_LIMIT right? | 15:02 |
*** jcapitao has quit IRC | 15:02 | |
clarkb | we track it | 15:02 |
clarkb | the problem is in accessing the logs after the fact | 15:02 |
clarkb | mwhahaha: you could run a bunch of fingers with their output tee'd to files | 15:02 |
clarkb | I'm not saying its ideal, but this particular class of failure is difficult to deal with | 15:04 |
fungi | mwhahaha: also you don't need to wait for a retry_limit result, as i keep saying, the retry_limit happens when a particular job fails in a similar way three builds in a row, so the odds there are failures of these builds happening only once or twice in a row is likely much higher, statistically | 15:04 |
mwhahaha | i think the issue is identifying that | 15:04 |
mwhahaha | while it's running | 15:04 |
mwhahaha | anyway i'll look at it later | 15:04 |
clarkb | re holding a node I'm 99% certain we won't hold if the job is retried. Whether or not the 3rd pass failing would trigger the hold I'm not sure | 15:06 |
fungi | and yeah, i also suggested using finger protocol and redirecting it to a file... you could grab a snapshot of the zuul status.json, parse out the list of any running builds which are likely to hit that issue, spawn individual netcats in the background to each of the finger addresses for them and redirect those to local files... later grep those dump files for an indication the job did not succeed | 15:06 |
fungi | (perhaps lacks the success message) and that narrows the pool significantly | 15:06 |
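A rough sketch of that approach, untested: it assumes the running build UUIDs have already been pulled from a status.json snapshot into a file, and that the Zuul finger gateway answers on the standard finger port of the zuul host:

    for uuid in $(cat running-tripleo-builds.txt); do
        # finger protocol: send the build UUID, read the streamed console back
        printf '%s\r\n' "$uuid" | nc zuul.opendev.org 79 > "console-$uuid.log" &
    done
    wait
    # afterwards, grep the dump files for builds that never reached the job's
    # normal end-of-run output, per fungi's suggestion above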
clarkb | hold_list = ["FAILURE", "RETRY_LIMIT", "POST_FAILURE", "TIMED_OUT"] | 15:07 |
clarkb | we would hold the third failure | 15:07 |
clarkb | so that is another option, though relies on a reboot producing a working instance after the fact | 15:08 |
clarkb | I'm happy to add a hold if we can give a rough set of criteria for it | 15:08 |
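A sketch of what such a hold and the follow-up recovery could look like; the job, project, count, and reason here are illustrative rather than the exact commands used:

    # on the scheduler: hold the next matching failure for inspection
    zuul autohold --tenant openstack \
        --project openstack/tripleo-heat-templates \
        --job tripleo-ci-centos-8-standalone \
        --count 1 --reason "debug RETRY_LIMIT network drop"
    # once a node is held: try a hard reboot and pull the console via the cloud API
    openstack server reboot --hard <instance-uuid>
    openstack console log show <instance-uuid>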
fungi | i doubt the hold would help, because the set of jobs it's impacting is fairly large, and the frequency with which one of those builds hits retry_failure is likely statistical noise compared to other failure modes | 15:09 |
clarkb | I guess even if reboot fails we can ask the cloud for the instance console and that may give clues | 15:10 |
clarkb | fungi: ya, it may require several attempts to get one. I wonder, can we tell hold we only want RETRY_LIMIT jobs? | 15:10 |
clarkb | looks like no | 15:11 |
fungi | we do have code we can switch on to grab nova console from build failures right? or was that nodepool launch failures? | 15:11 |
clarkb | fungi: that is nodepool launch failures | 15:11 |
clarkb | but if we get a hold we can manually run it | 15:11 |
fungi | ahh, right, executors lack the credentials for that anyway | 15:11 |
mwhahaha | so i just got a crashed one | 15:15 |
mwhahaha | http://paste.openstack.org/show/795271/ | 15:16 |
mwhahaha | it just disappears | 15:16 |
mwhahaha | no console output | 15:16 |
fungi | that was fast | 15:16 |
mwhahaha | so it's like it crashed | 15:16 |
fungi | yep, that's all i was finding in the executor debug logs too | 15:16 |
clarkb | mwhahaha: is standalone deploy a task that runs ansible? | 15:17 |
mwhahaha | it's an ansible task that invokes shell | 15:17 |
mwhahaha | to run python/ansible stuff | 15:17 |
clarkb | is that nested ansible or zuul's top level ansible? | 15:17 |
fungi | and that crash wasn't in the same script i was seeing before | 15:17 |
mwhahaha | doesn't really matter because the node went poof | 15:17 |
mwhahaha | nested | 15:18 |
clarkb | mwhahaha: well what would be potentially useful is figuring out where in that 11 minutes the script breaks | 15:18 |
mwhahaha | zuul -> toci sh -> quickstart -> ansible -> shell -> python -> ansible | 15:18 |
clarkb | hrm nested means we aren't able to get streaming shell out of ansible (at least not easily) | 15:18 |
mwhahaha | i know where it likely breaks based on timing (as previously mentioned) | 15:18 |
mwhahaha | which is a python process that uses multiprocess to do container fetching/processing | 15:18 |
mwhahaha | hence i think there's a bug in either python or the kernel | 15:18 |
mwhahaha | but w/o any other info on the node it's going to be impossible to track down at the moment | 15:19 |
mwhahaha | same deal with this one http://paste.openstack.org/show/795272/ | 15:19 |
fungi | previously i was seeing it happen while toci_gate_test.sh was running, so it could be anything common between what that does and what the tripleo.operator.tripleo_deploy : Standalone deploy task does | 15:19 |
* mwhahaha knows what it does | 15:20 | |
mwhahaha | what i need is the vm console output | 15:20 |
mwhahaha | to see if it's kernel panicing | 15:20 |
fungi | oh, and that one happened during tripleo.operator.tripleo_undercloud_install : undercloud install | 15:20 |
clarkb | mwhahaha: yes and as mentioend above I've suggested how we might get that | 15:20 |
mwhahaha | https://zuul.opendev.org/t/openstack/stream/bf54d4fdf5c040d590372a7cbfbd3c53?logfile=console.log will likely crash | 15:21 |
mwhahaha | it's on 2. attempt at the moment | 15:21 |
mwhahaha | 737774,2 tripleo-ci-centos-8-standalone (2. attempt) | 15:21 |
clarkb | mwhahaha: k, autohold is in place for that one, if the 3rd attempt fails we'll get it. Separately we can try grabbing the console log for it ahead of time while it is running the job | 15:23 |
clarkb | mwhahaha: any idea how far away from failing it would be now? | 15:23 |
mwhahaha | it fails ~30 mins in | 15:23 |
mwhahaha | i don't know how long it's running let me look at the console | 15:23 |
mwhahaha | maybe 10 mins | 15:23 |
mwhahaha | oh no probably like 5-10 from now | 15:24 |
mwhahaha | it just started the deploy | 15:24 |
clarkb | a0c808c9-481e-44bc-8e64-9cfe8b90e1f2 is the instance uid in inap | 15:24 |
fungi | centos-8-inap-mtl01-0017417464 | 15:24 |
*** priteau has joined #openstack-infra | 15:26 | |
clarkb | I've managed to console log show it, nothing exciting yet | 15:26 |
fungi | i can also paste the web console url, i know that would normally be sensitive but this is a throwaway vm | 15:27 |
fungi | the vnc token should be unique, right? | 15:27 |
clarkb | fungi: I have no idea (and assuming that about openstack seems potentially dangerous) | 15:28 |
clarkb | fungi: but I guess if you open that locally you'll get the running console log and won't have to time it like my console log show | 15:28 |
clarkb | so maybe just open it locally and see if we catch anything? | 15:28 |
fungi | i do have it open locally, but am also polling console log show out to a local file just in case | 15:28 |
fungi | so far it's just iptables drop logs though | 15:29 |
clarkb | and selinux bookkeeping | 15:29 |
fungi | yep | 15:29 |
fungi | device br-ctlplane entered promiscuous mode | 15:30 |
*** priteau has quit IRC | 15:30 | |
fungi | in case anyone wondered | 15:30 |
*** priteau has joined #openstack-infra | 15:30 | |
fungi | ooh, "loop: module loaded" | 15:31 |
fungi | yeah, very much not exciting so far | 15:31 |
mwhahaha | like watching paint dry | 15:31 |
fungi | i really wish openstack console log show had something like --follow but that's probably tough to implement | 15:32 |
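A crude stand-in for a --follow option, roughly what the polling described above amounts to (assumes credentials for the provider are sourced; the uuid is the instance noted later in this discussion and is only an example):

    SERVER=a0c808c9-481e-44bc-8e64-9cfe8b90e1f2
    # keep timestamped snapshots so the last output survives if the instance
    # is deleted between polls
    while sleep 30; do
        openstack console log show "$SERVER" > "console-$(date +%s).log" || break
    done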
mwhahaha | it might succeed at this point, let me see if there's another one | 15:35 |
*** mordred has joined #openstack-infra | 15:35 | |
mwhahaha | crashed | 15:35 |
clarkb | the console log doesn't show any panic | 15:36 |
mwhahaha | hrm | 15:36 |
fungi | yeah, it's still logging iptables drops too | 15:36 |
mwhahaha | i wonder if it's an ovs bug | 15:36 |
mwhahaha | so if it's still up but isn't reachable via zuul that's weird? | 15:37 |
*** jtomasek has quit IRC | 15:37 | |
clarkb | I can confirm it doesn't seem to ping or be sshable from there | 15:38 |
fungi | yeah, network seems to be dead, dead, deadski | 15:38 |
clarkb | we've already determined this isn't cloud specific so unlikely that we're colliding a specific network range | 15:38 |
fungi | the node's ipv4 addy is/was 104.130.253.140 | 15:39 |
AJaeger | clarkb, fungi, can either of you add me to the ACLs of openstack-ux and solum-infra-guestagent so that I can retire these two repos, or do you want to abandon changes and approve the retirement change, please? | 15:39 |
fungi | and the instance just got deleted | 15:39 |
*** ricolin has joined #openstack-infra | 15:39 | |
fungi | want me to stick the recorded console log somewhere? | 15:39 |
clarkb | fungi: 198.72.124.67 is what I had according to the job log | 15:40 |
fungi | oh, yep nevermind, i grabbed that address out of the wrong console window | 15:40 |
clarkb | 104.130.253.140 looks like a rax IP but this was an inap node | 15:40 |
fungi | where i was troubleshooting something unrelated | 15:40 |
mwhahaha | 2020-06-26 15:37:59.536990 | primary | "msg": "Failed to connect to the host via ssh: ssh: connect to host 198.72.124.67 port 22: Connection timed out", | 15:41 |
fungi | anyway, i have the console log from the correct instance | 15:41 |
clarkb | fungi: probably doesn't hurt to share just in case there is some clue possibly in the iptables logging | 15:41 |
clarkb | if ping was working I'd suspect something crashed sshd | 15:41 |
clarkb | but ping doesn't seem to work either so more likely the network stack under sshd is having trouble | 15:41 |
clarkb | lack of kernel panic in the log implies it isn't a catastrophic failure | 15:42 |
fungi | grumble, it's slightly too long for paste.o.o | 15:42 |
fungi | i'll trim the first few lines from boot | 15:42 |
clarkb | also if the third pass of that job fails we should get a node hold and we can try a reboot and see if any of the logs on the host give us clues | 15:42 |
*** yamamoto has quit IRC | 15:44 | |
fungi | okay, i split the log in half between http://paste.openstack.org/show/795276 and http://paste.openstack.org/show/795277 | 15:44 |
fungi | mwhahaha: ^ | 15:45 |
mwhahaha | yea nothing out of the ordinary | 15:45 |
fungi | i wonder if we should stick something in systemd to klog the system time so we have something to generate log line offsets against | 15:47 |
openstackgerrit | Thierry Carrez proposed openstack/project-config master: [DNM] Define maintain-github-mirror job https://review.opendev.org/738228 | 15:47 |
clarkb | fungi: you might be able to find some other reference point like ssh user log in compared to zuul logs? | 15:47 |
fungi | oh, i know, but thinking for the future it might be nice not to have to | 15:48 |
fungi | in this case there wasn't anything worth calibrating anyway, but if we'd snagged a kernel panic we'd be able to tell how long after a particular job log line that happened | 15:49 |
fungi | which in some cases could help narrow things down to a smaller set of operations | 15:49 |
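fungi's idea, sketched as a simple loop rather than a proper systemd unit (assumes root on the test node; anything written to /dev/kmsg shows up in the nova console log):

    # stamp the kernel ring buffer with the wall clock every five minutes so
    # console log lines can be correlated against job log timestamps
    while sleep 300; do
        echo "wallclock: $(date --utc +%FT%TZ)" > /dev/kmsg
    done &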
fungi | AJaeger: openstack-ux-core and i guess... solum-core? | 15:51 |
clarkb | mwhahaha: (this is me just thinking crazy ideas) Do you know if you have these failures on rackspace? Rackspace gives us two interfaces a public and a private interface. Most other clouds give us a single interface where we have public only, or private ipv4 that is NAT'd with a fip. Now for the crazy idea. A lot of jobs use the "private_ip" which is actually the public_ip in clouds without a private | 15:51 |
clarkb | ip to do things like set up multinode networking. On rax that would be on a completely separate interface so anything that may break that would be isolated from breaking zuul's connectivity via the public interface. However on basically all other clouds breaking that interface would also break Zuul's connectivity | 15:51 |
mwhahaha | no idea | 15:52 |
mwhahaha | can you query zuul for the RETRY_LIMIT stuff? | 15:52 |
mwhahaha | it's not in logstash | 15:52 |
fungi | AJaeger: i've added you to those, let me know if that wasn't what you needed | 15:52 |
mwhahaha | w/o the logs i don't know where these are running | 15:53 |
clarkb | mwhahaha: ya I think we can ask zuul for that. It's not in logstash because there were no log files to index :/ | 15:53 |
AJaeger | fungi: thanks, let me check | 15:54 |
clarkb | hrm zuul build records don't have nodeset info | 15:54 |
clarkb | I guess we have to look at zuul logs | 15:55 |
fungi | clarkb: yeah, i'm seeing what i can parse out of the scheduler debug log first | 15:55 |
clarkb | fungi: thanks | 15:55 |
fungi | yeah, we'll have to glue scheduler and executor logs together | 15:59 |
fungi | the scheduler doesn't log node info, and the executor doesn't know when the result is retry_limit | 15:59 |
clarkb | fungi: we could just ask the executor for ssh connection problems | 15:59 |
*** xek has quit IRC | 15:59 | |
*** rpittau is now known as rpittau|afk | 15:59 | |
clarkb | and assume that is close enough | 15:59 |
fungi | yep, that's what i'm doing now | 16:00 |
fungi | RESULT_UNREACHABLE is what the executor has, which will actually be a lot more hits anyway | 16:00 |
AJaeger | fungi: I removed myself again from solum-core. Now waiting for slaweq to +A the networking-onos change and then those three repos can finish retiring | 16:02 |
*** ykarel is now known as ykarel|away | 16:04 | |
fungi | i've worked out a shell one-liner to pull the node names for each RESULT_UNREACHABLE failure, running this against all our executors now | 16:07 |
fungi | oh, right, this won't work on ze01 because containery, but i'll just snag the other 11 | 16:12 |
fungi | 1514 result_unreachable builds across ze02-12 in today's debug log | 16:12 |
*** ricolin has quit IRC | 16:13 | |
*** vishalmanchanda has quit IRC | 16:15 | |
*** psachin has quit IRC | 16:18 | |
*** yamamoto has joined #openstack-infra | 16:20 | |
fungi | the distribution looks like it may favor inap a lot more than the proportional quotas would account for: http://paste.openstack.org/show/795279 | 16:20 |
fungi | a little under a third of the unreachable failures occurred there, when they account for a lot less than a third of our aggregate quota | 16:21 |
fungi | the node label distribution indicates we see more on ubuntu than centos too: http://paste.openstack.org/show/795280 | 16:22 |
*** yamamoto has quit IRC | 16:27 | |
*** Lucas_Gray has quit IRC | 16:28 | |
clarkb | fungi: we probably want to filter for centos to isolate the tripleo case since it seems consistent | 16:31 |
fungi | yeah, i can also try to filter for tripleo jobs, i suppose | 16:32 |
*** mordred has quit IRC | 16:33 | |
*** gyee has joined #openstack-infra | 16:35 | |
*** ociuhandu_ has joined #openstack-infra | 16:36 | |
*** mordred has joined #openstack-infra | 16:38 | |
*** hamalq has joined #openstack-infra | 16:38 | |
*** ociuhandu has quit IRC | 16:38 | |
*** ociuhandu_ has quit IRC | 16:40 | |
*** hamalq_ has joined #openstack-infra | 16:40 | |
*** amoralej is now known as amoralej|off | 16:40 | |
fungi | grep $(grep $(grep '\[e: .* result RESULT_UNREACHABLE ' /var/log/zuul/executor-debug.log | sed 's/.*\[e: \([0-9a-f]\+\)\].*/-e \1.*Beginning.job.*tripleo/') /var/log/zuul/executor-debug.log | sed 's/.*\[e: \([0-9a-f]\+\)\].*/-e \1.*Provider:/') /var/log/zuul/executor-debug.log | sed 's/.*\\\\nProvider: \(.*\)\\\\nLabel: \(.*\)\\\\nInterface .*/\1 \2/' > nodes | 16:40 |
fungi | in case you wondered | 16:40 |
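The same pipeline unrolled for readability (an untested rewrite of the one-liner above; it keeps unreachable builds whose job name contains "tripleo" and writes "provider label" pairs to ./nodes):

    LOG=/var/log/zuul/executor-debug.log
    # 1. event ids of builds that ended with RESULT_UNREACHABLE
    grep '\[e: .* result RESULT_UNREACHABLE ' "$LOG" |
        sed 's/.*\[e: \([0-9a-f]\+\)\].*/\1/' | sort -u > unreachable-ids
    # 2. of those, keep the ids whose "Beginning job" line mentions tripleo
    grep -F -f unreachable-ids "$LOG" | grep 'Beginning.job.*tripleo' |
        sed 's/.*\[e: \([0-9a-f]\+\)\].*/\1/' | sort -u > tripleo-ids
    # 3. extract the Provider/Label details logged for those ids
    grep -F -f tripleo-ids "$LOG" | grep -F 'Provider:' |
        sed 's/.*\\\\nProvider: \(.*\)\\\\nLabel: \(.*\)\\\\nInterface .*/\1 \2/' > nodes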
*** smarcet has quit IRC | 16:40 | |
*** jpena is now known as jpena|off | 16:42 | |
fungi | 861 tripleo jobs with unreachable results in the logs today so far | 16:43 |
fungi | breakdown by provider-region: http://paste.openstack.org/show/795283 | 16:44 |
*** hamalq has quit IRC | 16:44 | |
fungi | and by node label: http://paste.openstack.org/show/795284 | 16:44 |
fungi | this was a crude match for any result_unreachable builds with "tripleo" in the job name | 16:45 |
weshay_pto | fungi, afaict.. there was a large event on 6/14 where this peaked and has been an issue since.. not as many hits as 6/14 though.. | 16:45 |
weshay_pto | you seeing anything similar? | 16:46 |
weshay_pto | w/ when this started | 16:46 |
fungi | i only analyzed today's debug log | 16:46 |
AJaeger | regarding these numbers: How does that compare to all runs? I mean: Do we run 3 times as many CentOS8 jobs as bionic for tripleo - and therefore the 3 times higher failure count is not significant? | 16:46 |
fungi | AJaeger: yeah, that's likely the case. i don't think these ratios are telling us much on the node label side. on the provider-region side it suggests that inap is getting a disproportionately larger number of these, i think | 16:47 |
*** priteau has quit IRC | 16:48 | |
fungi | interesting though that i'm getting some airship-kna1 node hits in here for tripleo jobs. that may mean i'm not filtering the way i thought. investigating | 16:49 |
mwhahaha | weshay_pto: we updated openvswitch on 6/16, perhaps that's the issue? | 16:50 |
mwhahaha | openvswitch-2.12.0-1.1.el8.x86_64.rpm  2020-06-16 07:57  2.0M | 16:50 |
fungi | ahh, nevermind, airship-kna1 also hosts a small percentage of normal node labels | 16:50 |
fungi | so that's expected | 16:50 |
weshay_pto | mwhahaha, could be part of the issue for sure.. given it's openvswitch.. but it would not explain the spike on 6/14 | 16:51 |
mwhahaha | i don't know if that's related | 16:51 |
rlandy|ruck | could we revert that upgrade? | 16:52 |
mwhahaha | given that the networking goes poof, how we configure the interfaces on the nodes, and the lack of like a kernel panic, it seems to be openvswitch | 16:52 |
mwhahaha | we don't see retry_failure on centos7 jobs right? | 16:53 |
mwhahaha | that didn't get updated | 16:53 |
weshay_pto | 6/14 spike looks like the mirror outtage 2020-06-14 06:42:29.732260 | primary | Cannot download 'https://mirror.kna1.airship-citycloud.opendev.org/centos/8/AppStream/x86_64/os/': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried. | 16:53 |
*** markvoelker has joined #openstack-infra | 16:53 | |
weshay_pto | so we can ignore that spike | 16:53 |
mwhahaha | yea that was the day of the mirror outages i think | 16:53 |
mwhahaha | i really think it's openvswitch | 16:53 |
fungi | i'm putting together some trends for RESULT_UNREACHABLE on "tripleo" named jobs per day over the past month | 16:54 |
mwhahaha | centos 8.2 came out on 6-11 | 16:55 |
weshay_pto | ya.. mwhahaha looking at each day.. we start to see network go down after openvswitch update | 16:56 |
mwhahaha | do we know when the first infra image switched to it? | 16:56 |
weshay_pto | 6/18 is the first day.. I see network going down | 16:56 |
weshay_pto | 1 hit on 6/17.. so not sure how long mirrors take to update.. | 16:57 |
fungi | normally they update every 2 hours | 16:57 |
*** markvoelker has quit IRC | 16:58 | |
fungi | but we pull from a mirror to make our mirror, so can be delayed by however long the mirror we're pulling from takes to reflect updates too | 16:58 |
*** derekh has quit IRC | 17:00 | |
mwhahaha | hrm the openvswitch release was a build w/o dpdk | 17:00 |
mwhahaha | so maybe not | 17:00 |
fungi | parsing 30 days of compressed executor logs is taking a bit of time, but i should have something soonish | 17:06 |
mwhahaha | my only other thought is that there is a known issue with iptables on the 8.2 kernel and we end up configuring the network/iptables about the time it fails | 17:09 |
*** smarcet has joined #openstack-infra | 17:09 | |
mwhahaha | though i would have thought there to be a stack trace on the console if that was the case | 17:10 |
*** ociuhandu has joined #openstack-infra | 17:11 | |
*** smarcet has quit IRC | 17:17 | |
*** kaiokmo has quit IRC | 17:17 | |
*** ociuhandu has quit IRC | 17:18 | |
openstackgerrit | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/738150 | 17:23 |
*** jamesmcarthur has quit IRC | 17:23 | |
*** jamesmcarthur has joined #openstack-infra | 17:23 | |
*** jamesmcarthur has quit IRC | 17:27 | |
fungi | ~14k unreachable results in the past month across 11 of our 12 executors | 17:38 |
*** udesale has quit IRC | 17:39 | |
fungi | mwhahaha: weshay_pto: here's what the hourly breakdown looks like for the past month: http://paste.openstack.org/show/795290 | 17:42 |
fungi | and here's the daily breakdown: http://paste.openstack.org/show/795291 | 17:42 |
fungi | note these are not scaled by the number of jobs run, these are simply counts of result_unreachable builds for jobs with "tripleo" in their names | 17:43 |
*** gfidente has quit IRC | 17:44 | |
fungi | er, fixed daily aggregates paste, the previous one had a bit of cruft at the beginning: http://paste.openstack.org/show/795292 | 17:45 |
*** jamesmcarthur has joined #openstack-infra | 17:50 | |
*** mtreinish has quit IRC | 18:04 | |
*** mtreinish has joined #openstack-infra | 18:05 | |
*** jamesmcarthur has quit IRC | 18:08 | |
clarkb | fungi: re airship kna we did that to help ensure things were running normally there | 18:13 |
clarkb | fungi: something we learned doing the tripleo clouds: having the resources dedicated to a specific purpose makes it harder to understand what is going on there when things break | 18:14 |
rlandy|ruck | from the pastes above, it looks like we have better days and worse days | 18:15 |
mwhahaha | probably related to the number of patches, though those numbers seem really high | 18:16 |
clarkb | fungi: also re ze01 the container should log to the same location as the non container runs | 18:16 |
rlandy|ruck | 819 2020-06-23 | 18:16 |
rlandy|ruck | 800 2020-06-24 | 18:16 |
rlandy|ruck | 846 2020-06-25 | 18:16 |
fungi | i concur, those could also indicate days where you simply had higher change activity | 18:16 |
rlandy|ruck | ^^ consistently bad though | 18:16 |
fungi | as i said, that's not scaled by the overall build count for those jobs | 18:16 |
rlandy|ruck | mwhahaha: do we have a next step here? something we can try on our end? | 18:20 |
mwhahaha | it's kinda hard because the layers of logging here | 18:21 |
mwhahaha | we really need to either reproduce it or get a node that it failed on | 18:21 |
* fungi checks to see if clarkb's hold caught anything | 18:22 |
clarkb | fungi: I was just checking and I don't think it did but double check as I'm trying to eat a sandwich too :) | 18:22 |
fungi | nope, it's still set | 18:23 |
mwhahaha | sudo make me a sandwich | 18:23 |
*** ralonsoh has quit IRC | 18:25 | |
clarkb | I think we can set up some holds on jobs likely to hit the issue then see if we catch any. Another approach could be to try and reproduce it outside of nodepool and zuul's control with a VM (or three) in inap | 18:32 |
*** markvoelker has joined #openstack-infra | 18:49 | |
*** markvoelker has quit IRC | 18:54 | |
*** smarcet has joined #openstack-infra | 19:04 | |
*** lbragstad_ has joined #openstack-infra | 19:04 | |
*** lbragstad has quit IRC | 19:06 | |
*** eolivare has quit IRC | 19:10 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Finish retirement of openstack-ux,solum-infra-guestagent https://review.opendev.org/737992 | 19:16 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Finish retirement of networking-onos https://review.opendev.org/738263 | 19:16 |
AJaeger | config-core, 737992 is ready to merge - onos is waiting for final approval. I thus split these, please review 992. | 19:16 |
clarkb | +2 | 19:17 |
AJaeger | thanks, clarkb | 19:17 |
EmilienM | rlandy|ruck: https://review.opendev.org/#/c/738025/ failed again :/ | 19:22 |
EmilienM | infra-core: I would really request a force merge if it's possible for you | 19:22 |
rlandy|ruck | EmilienM: don't worry about it - it may fail on the retry_limit | 19:24 |
clarkb | EmilienM: does it fix a gate bug? it isn't clear from the commit message why that would be a priority | 19:24 |
clarkb | rlandy|ruck: it did fail on retry limit but also something else | 19:25 |
rlandy|ruck | clarkb: no - it failed because we updated the cirros image for tempest but not the space requirements. our fault | 19:25 |
rlandy|ruck | it's a revert - it does fail gates though | 19:26 |
rlandy|ruck | but it could fail again on retry_limit so I'll just try the regular route of getting patches in | 19:26 |
clarkb | right but usually when we force merge something it's something that will fix gate failures. There is no indication in the commit message that it does this (note I don't expect it to fix the retry limits, but an indication that it fixes a testing bug, hence the bypass of testing, would be nice) | 19:26 |
rlandy|ruck | clarkb: yeah - updating the commit message with the bug details | 19:27 |
rlandy|ruck | ok - patch updated - but let's let it run through the regular channels | 19:32 |
clarkb | ok, that helps. Let us know if force merge is appropriate after it tries the normal route | 19:33 |
rlandy|ruck | clarkb: thanks | 19:33 |
*** smarcet has quit IRC | 19:47 | |
*** smarcet has joined #openstack-infra | 19:56 | |
*** smarcet has quit IRC | 20:01 | |
*** slaweq has quit IRC | 20:02 | |
*** smarcet has joined #openstack-infra | 20:05 | |
*** slaweq has joined #openstack-infra | 20:06 | |
*** slaweq has quit IRC | 20:10 | |
*** yamamoto has joined #openstack-infra | 20:25 | |
*** yamamoto has quit IRC | 20:30 | |
*** markvoelker has joined #openstack-infra | 20:35 | |
*** markvoelker has quit IRC | 20:39 | |
*** smarcet has quit IRC | 20:40 | |
*** hashar has quit IRC | 21:02 | |
*** armax has quit IRC | 21:10 | |
*** armax has joined #openstack-infra | 21:26 | |
*** paladox has quit IRC | 21:33 | |
*** paladox has joined #openstack-infra | 21:37 | |
*** lbragstad_ has quit IRC | 21:38 | |
*** markvoelker has joined #openstack-infra | 22:35 | |
*** markvoelker has quit IRC | 22:40 | |
*** tosky has quit IRC | 23:00 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Run openafs promote job only if gate job run https://review.opendev.org/738155 | 23:51 |