*** tosky has quit IRC | 00:01 | |
openstackgerrit | Merged zuul/zuul-jobs master: Clarify tox_environment accepts dictionary not list https://review.opendev.org/c/zuul/zuul-jobs/+/769433 | 00:05 |
openstackgerrit | Merged zuul/zuul-jobs master: Document Python siblings handling for tox role https://review.opendev.org/c/zuul/zuul-jobs/+/768823 | 00:05 |
*** zbr5 has joined #opendev | 00:22 | |
*** zbr has quit IRC | 00:22 | |
*** zbr5 is now known as zbr | 00:22 | |
*** mlavalle has quit IRC | 00:47 | |
openstackgerrit | Merged zuul/zuul-jobs master: Add configuration to make logs public https://review.opendev.org/c/zuul/zuul-jobs/+/764483 | 01:02 |
*** kevinz has joined #opendev | 01:03 | |
*** icey has quit IRC | 01:15 | |
*** icey has joined #opendev | 01:16 | |
openstackgerrit | Merged zuul/zuul-jobs master: Allow to retrieve releasenotes requirements from a dedicated place https://review.opendev.org/c/zuul/zuul-jobs/+/769292 | 01:36 |
kevinz | ianw: Morning! | 01:37 |
ianw | kevinz: hi, happy new year | 01:37 |
kevinz | ianw: Thanks! Happy new year! I saw there is an issue accessing Linaro US, right? But checking with ping, I see both IPv4 and IPv6 work | 01:38 |
ianw | i only got back from pto today, and wasn't aware of anything. fungi: ^ are there current issues? | 01:39 |
kevinz | ianw: OK, np | 01:39 |
kevinz | https://mirror.regionone.linaro-us.opendev.org/debian/dists/buster-backports/InRelease, this does not work, but ping is fine | 01:39 |
ianw | hrm, will check in a little | 01:40 |
*** tkajinam has quit IRC | 01:41 | |
*** tkajinam has joined #opendev | 01:42 | |
ianw | ok, host is up, nothing interesting in dmesg | 01:57 |
ianw | kevinz: do you perhaps mean https://mirror.iad.rax.opendev.org/debian/dists/buster-backports/Release ? | 02:02 |
kevinz | ianw: aha, after checking the IRC log, this issue for linaro-us was fixed after rebooting the instance | 02:07 |
kevinz | http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-01-12.log.html | 02:07 |
kevinz | Thanks | 02:07 |
kevinz | ianw: thanks for helping | 02:07 |
*** tkajinam has quit IRC | 02:09 | |
*** tkajinam has joined #opendev | 02:10 | |
ianw | kevinz: did you manage to find anything on why these nodes shut down suddenly? | 02:10 |
ianw | I don't think we have an InRelease file, just Release because we don't sign our repos | 02:11 |
ianw | but I guess per that link, apt *looks* for InRelease, and if the mirror is down will give that error | 02:11 |
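A quick way to reproduce what apt sees, assuming nothing beyond curl and using the URL kevinz quoted above; apt fetches InRelease first and falls back to Release, and per ianw our mirrors only publish Release:

    # check whether the mirror answers at all and which index files it serves
    curl -sI https://mirror.regionone.linaro-us.opendev.org/debian/dists/buster-backports/InRelease
    curl -sI https://mirror.regionone.linaro-us.opendev.org/debian/dists/buster-backports/Release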
kevinz | req-3a0fe54f-e97f-4ca0-b24f-6bcdabc9be27  Start  Jan. 12, 2021, 10:21 a.m.  0881516836d94a8f890a031f84c985ef  - | 02:13 |
kevinz | req-1e4b24f1-13f3-4c1f-bf19-e4f1e0c8b053  Stop  Jan. 11, 2021, 2:52 p.m.  -  - | 02:13 |
kevinz | req-2e2cc170-c6b4-491a-804c-5af5efd604d0  Start  Dec. 19, 2020, 12:44 a.m.  0881516836d94a8f890a031f84c985ef  - | 02:13 |
kevinz | req-5cf099bb-011a-4e64-902d-40ab2e8795a5  Stop  Dec. 18, 2020, 9:25 p.m.  -  - | 02:13 |
kevinz | req-556cbdab-8639-42ab-b624-30b6b4ade719  Start  Nov. 8, 2020, 10:06 p.m.  0881516836d94a8f890a031f84c985ef  - | 02:13 |
kevinz | req-b04bbf39-2897-4e62-a30d-99d4722c3c70  Stop  Nov. 5, 2020, 7:18 a.m.  - | 02:13 |
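The rows above are a flattened instance action list; a minimal sketch of pulling the same data with the OpenStack CLI, assuming admin credentials and a placeholder server UUID:

    # list the start/stop actions nova has recorded for the mirror instance
    openstack server event list <mirror-instance-uuid>
    # show the details (including the request id) for one of the stop events
    openstack server event show <mirror-instance-uuid> req-1e4b24f1-13f3-4c1f-bf19-e4f1e0c8b053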
ianw | i had a remote netconsole running and didn't get any sort of message out of the host | 02:14 |
kevinz | ianw: http://paste.openstack.org/show/801575/, looks like it has been shut down every month | 02:14 |
ianw | it was like it was just killed | 02:14 |
kevinz | ianw: ran out of resources and got killed? | 02:15 |
kevinz | by host | 02:15 |
ianw | maybe? I'd expect some sort of logs in nova ... | 02:15 |
kevinz | I will check the nova-log for this req number | 02:16 |
ianw | i setup a console on 2020-11-09 | 02:16 |
ianw | doh, i dropped the "-a" from the tee command so i stupidly overwrote the record of when it stopped | 02:20 |
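For the record, the fix is just the append flag on tee; a sketch of a netconsole receiver that keeps history, with the port, listener, and log path being assumptions:

    # -u for UDP, -l to listen; tee -a appends instead of truncating on restart
    nc -u -l 6666 | tee -a /var/log/netconsole-linaro.log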
ianw | Sep 14 09:09:58 <ianw> Alex_Gaynor: thanks for pointing that out. it seems we have a problem with the mirror node in that region. | 02:22 |
ianw | req-4c549e46-760b-4353-b92d-2503e13a96c5  Start  Sept. 13, 2020, 10:39 p.m.  0881516836d94a8f890a031f84c985ef  - | 02:23 |
ianw | probably matches; are those times UTC? | 02:23 |
kevinz | ianw: yes, it is UTC timezone | 02:35 |
kevinz | ianw: checking the log from nova-compute, I just get this: http://paste.openstack.org/show/801576/. Will look for more in nova-api and conductor | 02:45 |
ianw | that definitely seems like nova noticed the vm had already shutdown, then updated the db | 02:48 |
ianw | kevinz: i'd be looking for corresponding oom/kill type messages in syslog for qemu-kvm around the same time ... | 02:50 |
kevinz | ianw: will check | 02:50 |
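A sketch of the syslog check ianw suggests, assuming a Debian/Ubuntu compute node and using the Jan 11 stop time from the action list above as the window:

    # look for the OOM killer reaping a qemu process
    grep -iE 'out of memory|oom-kill|killed process' /var/log/syslog | grep -i qemu
    # or search the kernel messages for the same window via the journal
    journalctl -k --since "2021-01-11 13:00" --until "2021-01-11 16:00" | grep -iE 'oom|killed process'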
*** hamalq has quit IRC | 02:56 | |
ianw | kevinz: do the compute nodes have a little swap space? | 02:57 |
*** ysandeep|away is now known as ysandeep | 03:24 | |
kevinz | ianw: http://paste.openstack.org/show/801579/, yes, 4578M in total | 03:28 |
kevinz | And I see some qemu failures on Jan 11. | 03:28 |
ianw | hrm, so 96gb ram, 4gb swap (approx) right? | 03:31 |
ianw | although there's a lot of free ram now, the swap does seem used, which suggests to me it might have been under memory pressure at some other time | 03:32 |
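For reference, a hedged sketch of the usual commands behind a memory check like the paste above:

    free -h          # total/used RAM and swap
    swapon --show    # swap devices and how much of each is in use
    vmstat 5 5       # non-zero si/so columns indicate active swapping right now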
*** sboyron has quit IRC | 03:41 | |
kevinz | ianw: yes, looks like memory pressure. I will disable scheduling to this node for a while, to see if things get better | 03:58 |
kevinz | ianw: I see quite a lot of instances are scheduled and running on this node; they keep getting scheduled here. Looks like the nova scheduler is not making good decisions... | 04:27 |
kevinz | I have disabled scheduling to this node and I will check what is wrong with the nova-scheduler | 04:28 |
ianw | kevinz: thanks; something like that would explain the very random times it seems to stop i guess. we can go month(s) with nothing but then a few failures in a week it feels like | 04:30 |
kevinz | ianw: yes, definitely. Let's see what will happen recently. | 04:32 |
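A sketch of taking the compute node out of scheduling with the OpenStack CLI, with the hostname and reason as placeholders:

    # stop the scheduler from placing new instances on the suspect node
    openstack compute service set --disable --disable-reason "investigating guest shutdowns" <compute-host> nova-compute
    # re-enable once it looks healthy again
    openstack compute service set --enable <compute-host> nova-compute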
fungi | ianw: kevinz: the mirror and builder instances were both found in a shutdown state again; we managed to boot them, though we ended up needing to delete the afs cache on the mirror as it was seemingly corrupted to the point where afsd would just hang indefinitely | 04:45 |
ianw | fungi: yeah, that seems to be a common issue when it is shutdown unsafely | 04:46 |
fungi | looking at grafana we're still behind on node requests (though on track to catch up by the time daily periodics kick off), and tripleo still has a 10-hour gate backlog | 04:52 |
fungi | so maybe we should postpone the scheduler restart | 04:52 |
ianw | i'm heading out in ~ 30 mins, so won't be able to watch this evening | 04:58 |
ianw | if tomorrow we get reviews on the zuul summary plugin, it might be worth restarting scheduler and gerrit at the same time | 04:58 |
fungi | great point | 05:23 |
*** ykarel has joined #opendev | 05:41 | |
*** marios has joined #opendev | 06:28 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Move snaps ACL to x https://review.opendev.org/c/openstack/project-config/+/770538 | 07:00 |
openstackgerrit | Merged openstack/project-config master: Create microstack-specs project https://review.opendev.org/c/openstack/project-config/+/770460 | 07:11 |
openstackgerrit | Merged zuul/zuul-jobs master: Enable installing nimble siblings https://review.opendev.org/c/zuul/zuul-jobs/+/765672 | 07:13 |
*** ralonsoh has joined #opendev | 07:19 | |
*** eolivare has joined #opendev | 07:35 | |
*** openstackgerrit has quit IRC | 07:47 | |
*** jpena|off is now known as jpena | 07:51 | |
*** JayF has quit IRC | 07:52 | |
*** openstackgerrit has joined #opendev | 07:53 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 07:53 |
*** fressi has joined #opendev | 07:58 | |
*** slaweq has joined #opendev | 07:59 | |
*** diablo_rojo__ has quit IRC | 08:01 | |
*** hashar has joined #opendev | 08:03 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 08:03 |
*** slaweq has quit IRC | 08:04 | |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: Move git-review zuul config in-tree https://review.opendev.org/c/openstack/project-config/+/763808 | 08:05 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 08:06 |
*** slaweq has joined #opendev | 08:10 | |
*** andrewbonney has joined #opendev | 08:13 | |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: Move git-review zuul config in-tree https://review.opendev.org/c/openstack/project-config/+/763808 | 08:21 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 08:23 |
*** rpittau|afk is now known as rpittau | 08:25 | |
*** sboyron has joined #opendev | 08:27 | |
*** tosky has joined #opendev | 08:39 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 09:18 |
*** hrw has left #opendev | 09:22 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 09:23 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Drop support for py27 https://review.opendev.org/c/opendev/git-review/+/770556 | 09:44 |
jrosser | i am seeing a number of "Could not connect to mirror.regionone.limestone.opendev.org:443 (216.245.200.130). - connect (113: No route to host)" errors | 09:49 |
lourot | ^ same for us, e.g. in https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/770297 | 10:05 |
frickler | yet another mirror gone offline, this is getting creepy. /me tries to take a look | 10:09 |
frickler | infra-root: ^^ console log shows a lot of CPU/rcu related issues. trying a restart via the api | 10:12 |
openstackgerrit | Merged openstack/project-config master: Move git-review zuul config in-tree https://review.opendev.org/c/openstack/project-config/+/763808 | 10:24 |
*** hemanth_n has joined #opendev | 10:28 | |
*** hashar has quit IRC | 10:39 | |
*** dtantsur|afk is now known as dtantsur | 10:41 | |
*** hemanth_n has quit IRC | 11:01 | |
*** ysandeep is now known as ysandeep|afk | 11:04 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 11:19 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 11:45 |
sshnaidm|ruck | infra-root: is the problem with retries in limestone known? | 12:08 |
*** DSpider has joined #opendev | 12:10 | |
zbr | sshnaidm|ruck: one mirror went down two hours ago. | 12:22 |
frickler | sshnaidm|ruck: zbr: trying to restart the server via the api hasn't worked. doing a stop/start cycle next | 12:49 |
*** jpena is now known as jpena|lunch | 12:49 | |
frickler | GPF while trying to start the AFS client ... guess we'll need to rebuild the mirror or talk to limestone about a possibly broken hypervisor. disabling that region for now | 12:53 |
frickler | hmm ... actually the node did finish booting but failed with afs. did "rm -rf /var/cache/openafs/*" and another reboot, maybe that'll be enough for now | 12:55 |
frickler | o.k., that seems to have worked for now, maybe the GPF was in fact related to afs cache corruption | 13:03 |
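For reference, the recovery sequence frickler describes, assuming the Debian/Ubuntu openafs-client unit name:

    # make sure the cache is not in use before wiping it, then reboot cleanly
    systemctl stop openafs-client
    rm -rf /var/cache/openafs/*
    reboot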
frickler | #status log stopped and restarted mirror.regionone.limestone.opendev.org after it had become unresponsive. needed afs cache cleanup, too. | 13:04 |
openstackstatus | frickler: finished logging | 13:04 |
frickler | jrosser: lourot: sshnaidm|ruck: zbr: ^^ please let us know if you encounter any further issues, should be safe to recheck now. | 13:05 |
sshnaidm|ruck | frickler, thanks a lot! | 13:05 |
*** brinzhang has quit IRC | 13:11 | |
*** brinzhang has joined #opendev | 13:11 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 13:16 |
*** ysandeep|afk is now known as ysandeep | 13:25 | |
*** whoami-rajat__ has joined #opendev | 13:46 | |
*** jpena|lunch is now known as jpena | 13:51 | |
*** d34dh0r53 has quit IRC | 14:10 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Drop support for py27 https://review.opendev.org/c/opendev/git-review/+/770556 | 14:11 |
*** d34dh0r53 has joined #opendev | 14:13 | |
*** hashar has joined #opendev | 14:47 | |
*** auristor has quit IRC | 14:47 | |
*** fressi has quit IRC | 14:48 | |
*** auristor has joined #opendev | 14:50 | |
fungi | frickler: did afsd also not start completely on that mirror after restarting? | 15:08 |
fungi | ahh, you said yes | 15:09 |
mnaser | diablo_rojo_phon: when you have a second, if you could rebase https://review.opendev.org/c/openstack/project-config/+/767057 | 15:10 |
kopecmartin | hi all, I'd like to update the refstack server (https://refstack.openstack.org) with the latest changes in the refstack repo (https://opendev.org/osf/refstack/), could anyone point me in the right direction on how? Thank you | 15:21 |
clarkb | kopecmartin: currently it is deployed using opendev/system-config and puppet-refstack iirc. But I proposed a change a while back to instead build docker images for it and deploy those from opendev/system-config. the problems there were that a number of changes needed to be made to refstack itself for that to be viable, and I ran out of steam on it | 15:34 |
clarkb | kopecmartin: I think we should pick that back up again if we are going to try and make updates | 15:35 |
fungi | also the current refstack server is running ubuntu 14.04 lts | 15:35 |
fungi | and current master branch says it needs python 3.6 or newer, while that version of ubuntu only has python 3.4 | 15:35 |
kopecmartin | oh, I see .. I'm happy to help .. also we have reformed the refstack group a little, so there are enough core reviewers now if anything needs to be changed on the refstack side | 15:39 |
kopecmartin | in regards to the server OS update, i'm happy to help if you point me in a direction | 15:40 |
clarkb | kopecmartin: please feel free to take over that change in system-config, it should show up if you search for me and refstack in system-config | 15:40 |
clarkb | kopecmartin: that would happen as part of the redeployment with docker. So get the docker stuff working in CI then an infra-root will work with you to do the host migration and all that | 15:40 |
kopecmartin | clarkb: thanks, I'm gonna have a look | 15:42 |
*** hashar is now known as hasharAway | 15:43 | |
openstackgerrit | Merged opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 15:48 |
ttx | kopecmartin: thanks for picking that up! I was looking into updating instructions when I realized they were already updated but just missing a current deployment | 15:53 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Assure git-review works with py37 and py38 https://review.opendev.org/c/opendev/git-review/+/770641 | 15:53 |
openstackgerrit | Jeremy Stanley proposed opendev/engagement master: Initial commit https://review.opendev.org/c/opendev/engagement/+/729293 | 16:00 |
*** ykarel is now known as ykarel|away | 16:09 | |
*** ysandeep is now known as ysandeep|out | 16:12 | |
*** ykarel|away has quit IRC | 16:16 | |
*** slaweq has quit IRC | 16:34 | |
*** slaweq has joined #opendev | 16:36 | |
*** chrome0 has quit IRC | 16:40 | |
*** tosky has quit IRC | 16:41 | |
*** tosky has joined #opendev | 16:42 | |
*** chrome0 has joined #opendev | 16:45 | |
*** eolivare has quit IRC | 16:52 | |
*** jpena is now known as jpena|off | 17:05 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Adjust the example Etherpad API delete command https://review.opendev.org/c/opendev/system-config/+/770648 | 17:06 |
clarkb | #status log Manually deleted an etherpad at the request of dmsimard. | 17:12 |
openstackstatus | clarkb: finished logging | 17:12 |
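Related to the change above, a hedged sketch of an Etherpad API delete call, with the API key path and pad name as placeholders:

    # deletePad is part of Etherpad's HTTP API v1; the key lives wherever the deployment keeps APIKEY.txt
    curl "https://etherpad.opendev.org/api/1/deletePad?apikey=$(cat /path/to/APIKEY.txt)&padID=<pad-name>"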
fungi | interesting, looks like some of our afs vos release runs began to consistently fail at 04:23 today | 17:14 |
fungi | i'll check the fileservers | 17:14 |
fungi | both are up and there are no recent restarts | 17:14 |
clarkb | fungi: how is disk utilization? | 17:14 |
clarkb | (I think we've been good on that side of things but could potentially explain it?) | 17:15 |
fungi | the /vicepa fs has 392G available on afs01.dfw and 1.1T available on afs02.dfw | 17:15 |
fungi | dmesg indicates io errors talking to a cinder volume on afs02.dfw starting at 03:32:42 | 17:17 |
clarkb | fungi: are there stale locks? thats about the only other thing I can think of that would cause something like that | 17:17 |
clarkb | oh interseting I guess that could do it too | 17:17 |
fungi | [Wed Jan 13 03:32:42 2021] INFO: task jbd2/dm-0-8:484 blocked for more than 120 seconds. | 17:17 |
fungi | [Wed Jan 13 03:35:58 2021] blk_update_request: I/O error, dev xvdk, sector 12328 | 17:17 |
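A sketch of pulling those kernel messages with readable timestamps so they can be lined up with nova and cinder events (xvdk is the affected attachment here):

    # -T prints human-readable timestamps instead of seconds since boot
    dmesg -T | grep -E 'blocked for more than|I/O error|xvdk'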
fungi | i'll reboot the server | 17:18 |
clarkb | ok | 17:18 |
fungi | make sure it reconnects to the volumes correctly | 17:18 |
*** marios is now known as marios|out | 17:18 | |
fungi | okay, server's back up. i'll try to stop/fix any lingering hung vos releases or locks | 17:23 |
clarkb | thanks! | 17:24 |
fungi | may not be any cleanup required, looks like the errors stopped as soon as afs02.dfw was restarted | 17:29 |
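Had cleanup been needed, the usual sequence is roughly the following, with the volume name as an example and assuming localauth on the fileserver:

    # find volumes left locked by interrupted releases
    vos listvldb -locked
    # unlock, then force a fresh release of the affected volume
    vos unlock -id mirror.debian -localauth
    vos release -id mirror.debian -localauth -verbose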
clarkb | I wonder if increasing that kernel timeout would help (and if it is even tunable) | 17:29 |
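For what it's worth, the 120 second figure comes from a sysctl, but it only controls when the warning is printed, not how long the I/O is allowed to stall:

    sysctl kernel.hung_task_timeout_secs          # defaults to 120
    sysctl -w kernel.hung_task_timeout_secs=300   # raises the warning threshold only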
*** sshnaidm|ruck is now known as sshnaidm|afk | 17:30 | |
fungi | some volumes needed "full" releases, so it's taking a bit of time to catch up again | 17:31 |
fungi | #status log rebooted afs02.dfw following hung kernel tasks and apparent disconnect from a cinder volume starting at 03:32:42, volume re-releases are underway but some may be stale for the next hour or more | 17:34 |
openstackstatus | fungi: finished logging | 17:35 |
*** mlavalle has joined #opendev | 17:57 | |
*** hamalq has joined #opendev | 18:03 | |
*** rpittau is now known as rpittau|afk | 18:05 | |
*** marios|out has quit IRC | 18:06 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 18:06 |
clarkb | fungi: doesn't look like zuul is any happier with its backlog today? | 18:09 |
*** cloudnull has quit IRC | 18:12 | |
*** cloudnull has joined #opendev | 18:12 | |
*** andrewbonney has quit IRC | 18:12 | |
fungi | not really no | 18:13 |
clarkb | looking at grafana nothing stands out as being very broken. I guess just backlogs due to demand and potentially made worse by the mirror issue in limestone earlier today | 18:13 |
clarkb | boot times seem consistent and failures are infrequent | 18:13 |
clarkb | I do wonder if we have no quota in vexxhost though as that is a semi common thing due to volume leaks | 18:14 |
clarkb | grafana indicates that, no, we are using our quota there | 18:14 |
clarkb | the number of jobs a single neutron change runs is not small | 18:16 |
fungi | gerrit event volume has started to dip, so looks like the node request backlog is plateauing around 2-2.5k at least | 18:18 |
fungi | we seem to max out at roughly 600 nodes in use | 18:19 |
*** ralonsoh has quit IRC | 18:20 | |
fungi | aha, mystery solved on the afs02.dfw issue. ticket rackspace opened: This message is to inform you that our monitoring systems have detected a problem with the server which hosts your Cloud Block Storage device, afs02.dfw.opendev.org/main04, '9f19fd0d-a33e-4670-817c-93dd1e6c6e6f' at 2021-01-13T03:59:33.166398. | 18:28 |
fungi | if i hadn't been so distracted this morning i might have read the root inbox earlier and noticed/fixed the problem sooner | 18:30 |
*** dtantsur is now known as dtantsur|afk | 18:43 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 18:53 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 19:04 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 19:24 |
*** hamalq has quit IRC | 19:29 | |
*** paladox has quit IRC | 19:34 | |
*** hasharAway has quit IRC | 19:36 | |
*** paladox has joined #opendev | 19:39 | |
*** whoami-rajat__ has quit IRC | 19:55 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 20:08 |
*** sboyron has quit IRC | 20:09 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 20:24 |
*** slaweq has quit IRC | 20:41 | |
fungi | tarballs volume has been releasing for 4 hours now. i probably should have had the foresight to create locks preventing the mirror volumes from getting released until that was done :/ | 21:27 |
fungi | at this point cronjobs have started what are probably full releases which are still in progress for deb-octopus, yum-puppetlabs, debian, opensuse, debian-security, ubuntu-ports, epel, centos, and fedora | 21:29 |
mordred | fungi: "fun" | 21:30 |
fungi | yeah, i expect the tarballs release would have only required two hours based on the transfer volume cacti is clocking on the network interface, but roughly an hour in the package mirrors started also getting releases triggered | 21:31 |
fungi | which likely slowed the second half of the tarballs volume transfer to a crawl | 21:32 |
fungi | i'm hesitant to abruptly terminate any of the mirror volume releases though over worries that will lead to even more cleanup work | 21:33 |
clarkb | we don't typically do global locks but I guess the issue now is it's a full release? | 21:33 |
fungi | yeah, and a full release of a bunch of different volumes at once i expect | 21:33 |
fungi | i guess because these tried to release at some point while the filesystem was hitting write errors | 21:34 |
ianw | fungi: the tarballs one, i don't think that runs with -localauth? the mirror ones should be running via a ssh session and not hit a timeout | 21:41 |
fungi | 2021-01-13 17:28:56,918 release DEBUG Running: ssh -T -i /root/.ssh/id_vos_release vos_release@afs01.dfw.openstack.org -- vos release project.tarballs | 21:47 |
fungi | i'm not worried about auth timeouts, just that it's going to be an age before tarballs.o.o and some other sites (zuul-ci.org, et cetera) are current again | 21:51 |
*** brinzhang_ has joined #opendev | 23:02 | |
*** brinzhang has quit IRC | 23:05 | |
clarkb | ianw: for https://review.opendev.org/c/opendev/system-config/+/767059/ does ansible work with symlinks like that? any reason to not just keep treating the canonical server as .openstack.org until we properly rename it? | 23:09 |
clarkb | (mostly concerned that we'll run against prod without the necessary vars loaded in) | 23:09 |
ianw | clarkb: one sec, context switching back to it :) | 23:10 |
clarkb | it just seems like we're getting ahead of schedule with that one | 23:10 |
fungi | yeah, elsewhere we still use openstack.org in the inventory name for it | 23:10 |
clarkb | yup the server is still named openstack.org canonically in nova too | 23:11 |
clarkb | we just serve review.opendev.org on it too | 23:11 |
ianw | so yeah, i think that started with me looking at the testinfra, which was trying to match against review01.opendev.org and thus not actually running the tests; which i think is only "is this listening" | 23:13 |
fungi | i have a feeling this is also not going to be a good time for a gerrit restart looking at the graphs... wonder if we should shoot for late utc friday, next week is openstack wallaby milestone 2 which likely explains the rush on the gate | 23:13 |
ianw | so iirc my issue was as i expanded the testing, i didn't want to have it in a weird state of testing against review01.openstack.org | 23:15 |
clarkb | ianw: ya I think we should fix the test to look at review.openstack.org. Then when we switch the host over we can update that too? | 23:15 |
clarkb | I don't think that is a weird state if that is reality | 23:16 |
clarkb | but maybe I'm missing something else too | 23:16 |
clarkb | fungi: ya agreed we should probably wait for CI to settle before restarting services like gerrit and zuul | 23:16 |
ianw | just that it's already in a dual state, in that the vhost name is set to review.opendev.org | 23:17 |
fungi | ianw: well, we have two vhosts there (we could redo the vhost config to use an alias instead) | 23:18 |
ianw | i guess what i mean is | 23:19 |
ianw | inventory/service/host_vars/review01.openstack.org.yaml:gerrit_vhost_name: review.opendev.org | 23:19 |
fungi | right, we do that | 23:20 |
clarkb | yes because the server is canonically named review01.openstack.org (that will change when it gets upgraded) | 23:20 |
clarkb | (it is confusing, but I worry that changing CI will make it more confusing because CI will be different than prod) | 23:20 |
clarkb | couple of questions on https://review.opendev.org/c/opendev/system-config/+/767078 too, but I think we can probably land that one as is then make those changes if we want to | 23:21 |
fungi | it wouldn't technically be all that different if we had inventory/service/host_vars/review01.opendev.org.yaml:gerrit_vhost_name: review.opendev.org because the ansible inventory hostname and apache vhost canonical name are still not the same | 23:21 |
clarkb | the more I think about it the more I'm thinking we should keep the status quo with https://review.opendev.org/c/opendev/system-config/+/767059 then update inventory when we update prod. That way we don't have an unexpected delta between prod and testing and weirdness in our host vars | 23:23 |
ianw | yeah, i guess that what i was doing was building extensively on the system-config tests, and found it quite confusing with the openstack.org server in the testing etc. | 23:23 |
clarkb | I don't think it is necessarily wrong, but it makes things different enough to be confusing | 23:24 |
fungi | it'll likely be confusing either way ;) | 23:24 |
ianw | yeah | 23:25 |
clarkb | right but the previous one matched production | 23:25 |
clarkb | so it's the confusion we have to deal with :) | 23:25 |
ianw | the other thing is, we could push for the replacement server to clear this up | 23:25 |
clarkb | what does gerrit init --dev do? | 23:25 |
clarkb | ianw: we can do that as well :) | 23:25 |
clarkb | we will need to be careful turning it on to avoid having it replicate to gitea and such | 23:26 |
clarkb | but ya that's another thing to sort out | 23:26 |
ianw | clarkb: when the auth type is set to DEVELOPMENT_BECOME_ANY_ACCOUNT *and* you run gerrit init --dev, gerrit will create the initial admin user for you | 23:26 |
clarkb | ah both are required | 23:27 |
ianw | yes. it's slightly different from the quickstart stuff, which uses the upstream gerrit container; that includes an LDAP server connected to it, where you have the initial admin | 23:27 |
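A minimal sketch of the dev-mode init against a throwaway site, with the install path as a placeholder and auth.type ending up as DEVELOPMENT_BECOME_ANY_ACCOUNT:

    # initialize a throwaway site in dev mode; the first user to log in can become admin
    java -jar gerrit.war init --batch --dev --no-auto-start -d ~/gerrit_testsite
    # confirm the resulting auth type
    git config -f ~/gerrit_testsite/etc/gerrit.config auth.type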
fungi | another alternative would be to ssh as "Gerrit Code Review" using the host key and create an initial admin account with the cli | 23:28 |
clarkb | this is fine I had just never seen the --dev flag before | 23:29 |
ianw | fungi: i couldn't get that to work. i couldn't get that to make the initial account | 23:29 |
ianw | you can go in with that after you have an account, and suexec, but it can't create the initial account | 23:30 |
fungi | oh, create-user needs to run via suexec as an existing user? | 23:30 |
fungi | yeah, now i somewhat recall that | 23:30 |
ianw | it's been a bit since i tried, but using the "Gerrit Code Review" was my first attempt at doing it | 23:31 |
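Once some admin account exists, further accounts can be created over the ssh CLI; a sketch with placeholder host, names, and key path:

    # create a service account as an existing administrator
    ssh -p 29418 admin@review.example.org gerrit create-account \
        --full-name "ExampleCI" --email ci@example.org --ssh-key - example-ci < /tmp/ci_key.pub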
clarkb | ianw: thank you for https://review.opendev.org/c/opendev/system-config/+/767269/4 I had meant to do that but then things got crazy when we were slimming the jobs down | 23:32 |
*** DSpider has quit IRC | 23:32 | |
ianw | do we know off the top of anyone's head if we have enough headroom to launch another review server in dfw? | 23:37 |
clarkb | I don't. We would if we retired review-test (we can also clean up review-dev but it's much smaller) | 23:37 |
ianw | ok, i'm happy to drive this one, i can give things a go and start on an etherpad of steps | 23:38 |
clarkb | thanks! I imagine the spin up for it would look a lot like review-test with a pruned host vars setup | 23:38 |
clarkb | that way it avoids replicating and stuff until we switch and add more config to it | 23:38 |
ianw | april seems far away but it isn't :) | 23:39 |
clarkb | (if you need examples in recent history for doing the thing) | 23:39 |
clarkb | ianw: I'm into the bazel stuff and it looks like the pure js plugins don't get copied automagically to the war like the java plugins do? | 23:42 |
clarkb | hrm we also have to specify a different bazel target for the plugin. Any idea why the other plugins don't need this? | 23:42 |
ianw | clarkb: i think because they're default plugins? | 23:43 |
clarkb | ah | 23:43 |
clarkb | that makes sense | 23:43 |
fungi | need to do something like the copy i did in ansible for the pg plugin of the opendev theme? | 23:43 |
ianw | don't take anything i say about bazel as true though :) i would love for someone who actually understands it to look at it | 23:43 |
clarkb | fungi: ya and tell bazel to build the plugin explicitly | 23:44 |
clarkb | ianw: I bet that is it | 23:44 |
clarkb | and/or js vs java plugins | 23:44 |
fungi | oh, got it, so there's also a build step for that one | 23:44 |
clarkb | like maybe it can autodiscover java things but not the js | 23:44 |
ianw | it should probably grow to have a java component. what we'd like is for the summary plugin to be able to order the results via config; but the only way to really do that is to write a java plugin that then exposes a REST endpoint | 23:45 |
clarkb | re making room for new review. If we need to we can probably put review-test into cold storage and revive it again after if necessary (basically snapshot the root disk and its cinder volumes then delete the instance) | 23:45 |
clarkb | this new testing stuff also reduces the need for review-test (though testing the migration to 3.3 on review-test with its bigger data set would be nice, hence the cold storage idea) | 23:46 |
clarkb | worst case we just rebuild review-test entirely | 23:46 |
clarkb | ianw: the symlink thing with bazel is a fun one | 23:49 |
ianw | yeah, that's a great intersection of bazel and docker | 23:49 |
ianw | you can not convince bazel to not use the symlinks, and you can not convince docker to follow them | 23:50 |
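One workaround is to dereference the symlinks while staging the docker build context; a sketch with placeholder paths:

    # cp -L follows bazel's convenience symlinks so the build context holds real files
    cp -rL bazel-bin/plugins/<plugin-name> ./docker-context/plugins/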
clarkb | ok the rest of that stack lgtm. I did leave some nits and thoughts in a few places. You may want to double check them to make sure they are fine as is | 23:52 |
ianw | thanks, i'll go through soon. | 23:53 |
clarkb | neutron is running ~36 jobs per change in check and the vast majority look like expensive full integration style tests | 23:54 |
* fungi sighs | 23:55 | |
clarkb | neutron-tempest-with-uwsgi-loki | 23:55 |
clarkb | neutron-ovn-tripleo-ci-centos-8-containers-multinode | 23:55 |
clarkb | those are both failing non voting jobs | 23:56 |
clarkb | I wonder too if we've got a bunch of always failing non voting jobs in there :/ | 23:56 |
clarkb | fungi: I wonder if we need to talk to projects about taking a critical eye to tests like that especially if we're producing a large backlog as a result | 23:56 |
clarkb | https://zuul.opendev.org/t/openstack/builds?job_name=neutron-tempest-with-uwsgi-loki confirmed for at least that first job | 23:59 |