ianw | ok, it's been about 15 minutes and i can't see it recovering. should we restart the container? | 00:00 |
---|---|---|
fungi | back and catching up | 00:03 |
fungi | looks like the ssh api is also unresponsive? | 00:03 |
clarkb | fungi: we turned off apache | 00:04 |
ianw | ok, i can run "jstack" in the container and get thread backtraces | 00:04 |
clarkb | hoping it would settle down | 00:04 |
ianw | 91073 / 91074 are two top threads | 00:04 |
fungi | turning off apache wouldn't render the ssh api unresponsive too | 00:05 |
clarkb | ya that was what I realized afterwards | 00:05 |
clarkb | I suggested we wait until 00:00 UTC then restart | 00:05 |
clarkb | and we are now past | 00:05 |
clarkb | its not any happier either | 00:06 |
ianw | you need to convert the thread id to hex? | 00:06 |
clarkb | (but if ianw is thread dumping we can wait on that) | 00:06 |
fungi | looks like there was a spike in data being pulled from the database, and cpu/ram immediately shot up | 00:06 |
clarkb | we run db backups around 00:00 fwiw | 00:07 |
clarkb | so need to keep that in mind | 00:07 |
ianw | so nid in hex is the thread id | 00:07 |
fungi | the spike for eth1 was earlier than midnight tho | 00:08 |
clarkb | fungi: k | 00:08 |
clarkb | in that case a bad query possibly? | 00:08 |
clarkb | or bad queries | 00:08 |
clarkb | I think melody would show us the query fwiw | 00:08 |
clarkb | ianw's thread dump might too if it is that | 00:08 |
fungi | looks like the db utilization started around 22:55 and topped out 15 minutes later | 00:09 |
ianw | ok, top -H in the container gives the thread id's | 00:09 |
ianw | so 0xE is pegged at 100% | 00:09 |
fungi | yeah, and it looks like the jvm is maxed out on its allowed memory | 00:11 |
ianw | "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007ff70c028000 nid=0xe runnable | 00:11 |
clarkb | ah | 00:12 |
ianw | http://paste.openstack.org/show/799105/ | 00:12 |
ianw | basically, i would say all the threads pegged at 100% are the GC threads | 00:12 |
clarkb | so theory time: some event happens that causes us to need many memories | 00:13 |
clarkb | then we fall over because we can't GC fast enough | 00:13 |
clarkb | we have had GC problems in the past | 00:13 |
clarkb | I don't think we'll learn anything new from the thread dump unless we want to dump all threads and hope for query type info maybe? | 00:14 |
clarkb | ianw: are you able to get memory allocation per thread somehow ? (that requires crawling pointers to the heap I'm betting) | 00:14 |
clarkb | likely that it won't just do that | 00:14 |
ianw | no; yeah all the threads stuck at 100% are the GC threads | 00:14 |
clarkb | I guess we restart then? | 00:14 |
fungi | i think so, yeah | 00:15 |
ianw | gerrit@review01:/$ jmap 7 | 00:15 |
ianw | Attaching to process ID 7, please wait... | 00:15 |
ianw | ERROR: ptrace(PTRACE_ATTACH, ..) failed for 7: Operation not permitted | 00:15 |
ianw | Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 7: Operation not permitted | 00:15 |
ianw | in theory that could give us heap knowledge etc, but ... i don't know | 00:16 |
ianw | permissions | 00:16 |
ianw | so, docker-compose down & up ? | 00:16 |
clarkb | ianw: googling says your jmap thing needs to run as the same user as the process that is running | 00:17 |
fungi | down, up -d | 00:17 |
clarkb | looks like you were gerrit but is the user gerrit2? | 00:17 |
clarkb | anyway ya its down then up -d | 00:17 |
ianw | hrm, i was in the container | 00:17 |
clarkb | oh in the container we call it gerrit got it | 00:17 |
clarkb | then ya I don't have other ideas | 00:17 |
ianw | restarting now | 00:18 |
ianw | apache back on now too | 00:19 |
clarkb | error_log doesn't show it is up yet fwiw | 00:20 |
*** DSpider has quit IRC | 00:20 | |
clarkb | and now it says it is ready. Usually takes apache a minute to catch up | 00:21 |
ianw | [2020-10-16 00:20:45,068] [main] INFO com.google.gerrit.server.config.ScheduleConfig : gc schedule parameter "gc.interval" is not configured | 00:21 |
ianw | dunno if that means anything | 00:21 |
ianw | oh that's git collection anyway | 00:23 |
clarkb | melody data doesn't seem to survive a restart. iirc it never has, but that may be a good thing to try and fix | 00:23 |
clarkb | fungi: was the previous restart a similar situation? java spinning the cpu and not doing much else? | 00:24 |
fungi | when was the previous restart? i need to jog my memory now | 00:24 |
clarkb | the 13th says logs I think | 00:25 |
clarkb | just a few days ago. I woke up one day and it had been restarted due to lack of responsiveness | 00:25 |
clarkb | https://review.opendev.org/#/c/628296/ spams the logs with mergeability check errors | 00:26 |
fungi | http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-10-13.log.html#t2020-10-13T12:10:24 | 00:26 |
clarkb | maybe we should abandon that change just to clean up our error logs? | 00:27 |
fungi | i saw lots of "org.eclipse.jetty.io.SelectorManager : Could not process key for channel java.nio.channels.SocketChannel[connected local=/127.0.0.1:8081 remote=/127.0.0.1:47906]" | 00:28 |
fungi | in the error log | 00:28 |
fungi | this week's been so crazy i forgot we had a gerrit outage 3 days ago | 00:30 |
clarkb | we have had memory/gc issues in the past which I believe we solved by upgrading gerrit | 00:33 |
clarkb | old gerrit leaked | 00:33 |
clarkb | bnemec seems to do some chatty ssh queries too | 00:34 |
clarkb | they time out after 5 seconds though | 00:35 |
fungi | can someone with a little more insight into this outage #status log the restart so we don't forget the circumstances? | 00:43 |
clarkb | how about #status log Gerrit was restarted after it ran out of memory and spent all of its CPU cycles trying to garbage collect. | 00:44 |
clarkb | ianw: fungi ^ does that look accurate? | 00:44 |
ianw | #status log restarted gerrit container. cpu pegged and jstack of the busy threads in the container showed all were gc related | 00:45 |
openstackstatus | ianw: finished logging | 00:45 |
ianw | oh oops i meant to suggest that ... but, well there we go | 00:45 |
clarkb | the details seem to overlap | 00:46 |
fungi | wfm, thx | 00:46 |
corvus | fungi: i found this article very amusing, and i note that your nick seems to have broken my ability to read certain phrases without cognitive dissonance: https://www.npr.org/2020/10/15/923411578/a-disturbing-twinkie-that-has-so-far-defied-science | 00:50 |
corvus | "Eventually, all of us are food for fungi." | 00:50 |
clarkb | I'm going to go figure out dinner now | 00:50 |
clarkb | if we notice this again I think we should try and identify what the server is doing via melody if possible | 00:50 |
clarkb | if you click on the + details buttons for threads it gives you a lot of thread info | 00:51 |
clarkb | from that you may be able to infer what is busy and the source of the problem | 00:51 |
fungi | corvus: i appreciate the effect my nickname choice inflicts on others | 00:52 |
clarkb | just don't click on the red stop sign when you do that as it will kill the threads :) | 00:52 |
clarkb | oh also the dump threads as text gives you slightly different perspective too | 00:52 |
fungi | corvus: i have even more trouble reading that one because i have a long-time friend whose nick is "twinkie" | 00:52 |
clarkb | the text dump has tracebacks for each thread | 00:52 |
corvus | clarkb: yeah, i think you can get all the thread dumps in one page | 00:52 |
clarkb | but not the timing info so you can kind of correlate between the two things | 00:53 |
clarkb | corvus: yup | 00:53 |
corvus | saving that would be ++ | 00:53 |
ianw | clarkb: you get pretty good info with jstack in the container | 00:56 |
clarkb | ianw: what does the invocation for that look like? | 00:56 |
ianw | exec -it /bin/bash into the container, then you can ps in there to get the java process in the container, then just jstack <pid> | 00:57 |
ianw | i think it's actually probably the same info the melody page has | 00:57 |
ianw | but yeah, with the spinning threads all just being the ~10 GC threads ... hard to say what drove it mad :/ | 00:58 |
ianw | i feel like i've done that before, pre-container days, and seen similar. we had a period of gerrit instability for a while where we restarted pretty frequently | 00:58 |
clarkb | yup, I think that was on an older version that had leaks | 00:58 |
clarkb | then when we got to 2.13 it was fine | 00:59 |
clarkb | maybe we've managed to tickle a new leak or similar problem in 2.13 | 00:59 |
ianw | i think, given plans, extensive investigation isn't worth it :) | 01:00 |
ianw | grabbing some lunch in the sun, bib | 01:01 |
clarkb | looking at cacti we seem to have high memory use a few times a year | 01:01 |
clarkb | ooh sun | 01:01 |
clarkb | our current memory use is above the previous baseline but not terrible | 01:01 |
clarkb | and ya I said I'd do dinner then didn't. I'm really popping out now | 01:01 |
*** ysandeep is now known as ysandeep|afk | 01:28 | |
*** hamalq has quit IRC | 02:54 | |
*** hamalq has joined #opendev | 03:08 | |
ianw | hrm, i think openstackgerrit has disappeared again | 03:54 |
*** openstackgerrit has quit IRC | 03:57 | |
*** marios has joined #opendev | 05:14 | |
*** openstackgerrit has joined #opendev | 05:34 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible https://review.opendev.org/757660 | 05:34 |
ianw | fungi: ^ that is pretty much ready to go, modulo testinfra bits. so if you have any comments on the core of it feel free to make them | 05:36 |
*** tkajinam_ has joined #opendev | 05:46 | |
*** tkajinam has quit IRC | 05:49 | |
*** ysandeep|afk is now known as ysandeep | 06:12 | |
*** roman_g has joined #opendev | 06:18 | |
*** lpetrut has joined #opendev | 06:20 | |
*** hamalq has quit IRC | 06:20 | |
*** ykarel|away has joined #opendev | 06:26 | |
*** ykarel|away has quit IRC | 06:27 | |
*** tkajinam_ has quit IRC | 06:56 | |
*** tkajinam has joined #opendev | 06:57 | |
*** hashar has joined #opendev | 07:13 | |
*** andrewbonney has joined #opendev | 07:14 | |
*** slaweq has joined #opendev | 07:29 | |
*** ysandeep is now known as ysandeep|lunch | 07:35 | |
*** slaweq has quit IRC | 07:35 | |
*** hamalq has joined #opendev | 07:37 | |
*** hamalq has quit IRC | 07:42 | |
*** slaweq has joined #opendev | 07:58 | |
*** Tengu has quit IRC | 08:00 | |
*** mkalcok has joined #opendev | 08:02 | |
*** ysandeep|lunch is now known as ysandeep | 08:09 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: ensure-packer: ensure unzip in role to make ensure-packer self contained https://review.opendev.org/758535 | 08:15 |
*** hamalq has joined #opendev | 08:15 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: ensure-packer: ensure unzip in role to make ensure-packer self contained https://review.opendev.org/758535 | 08:16 |
*** slaweq has quit IRC | 08:17 | |
*** hamalq has quit IRC | 08:20 | |
*** Tengu has joined #opendev | 08:22 | |
*** Tengu has quit IRC | 08:36 | |
*** Tengu has joined #opendev | 08:43 | |
*** ysandeep is now known as ysandeep|afk | 08:45 | |
*** hashar has quit IRC | 08:58 | |
*** tosky has joined #opendev | 08:59 | |
*** slaweq has joined #opendev | 08:59 | |
*** hashar has joined #opendev | 09:01 | |
*** tkajinam has quit IRC | 09:36 | |
*** slaweq has quit IRC | 09:38 | |
*** ysandeep|afk is now known as ysandeep | 10:03 | |
*** hamalq has joined #opendev | 10:16 | |
*** hamalq has quit IRC | 10:21 | |
*** chandankumar has quit IRC | 10:25 | |
*** Tengu has quit IRC | 10:41 | |
*** Tengu has joined #opendev | 10:53 | |
*** DSpider has joined #opendev | 11:05 | |
*** hashar has quit IRC | 11:14 | |
*** slaweq has joined #opendev | 11:31 | |
*** slaweq has quit IRC | 11:39 | |
*** hashar has joined #opendev | 12:01 | |
*** hamalq has joined #opendev | 12:17 | |
*** hamalq has quit IRC | 12:22 | |
*** hamalq has joined #opendev | 12:53 | |
*** fressi has joined #opendev | 12:56 | |
*** hamalq has quit IRC | 12:58 | |
*** hashar has quit IRC | 13:13 | |
*** priteau has joined #opendev | 13:39 | |
fungi | growing up in a mountain forest, it's funny i never realized how much i took the abundance of rocks and dirt for granted until i relocated to a sandbar. making a quick trip to the garden center to buy some, back shortly | 13:52 |
*** slittle11 has joined #opendev | 13:53 | |
slittle11 | Please add me as the first core to the new group starlingx-snmp-armada-app-core. I can populate the rest of the members. | 13:55 |
*** slittle11 is now known as slittle1 | 13:55 | |
*** fressi has quit IRC | 14:02 | |
fungi | #status log added slittle11 as initial member of starlingx-snmp-armada-app-core group in gerrit | 14:05 |
openstackstatus | fungi: finished logging | 14:05 |
*** slaweq has joined #opendev | 14:06 | |
slittle1 | thank you | 14:07 |
fungi | you're welcome! | 14:07 |
*** hamalq has joined #opendev | 14:19 | |
*** ysandeep is now known as ysandeep|away | 14:21 | |
*** hamalq has quit IRC | 14:24 | |
zigo | Hi there! | 14:33 |
zigo | fungi: How can I get the CI try to run with amqp 5.0.1 ? | 14:33 |
zigo | We're in a bit of a dependency hell in Debian Sid, where I updated kombu, but that broke celery, which now needs to be updated, but then it requires vine 5, which in turn needs amqp 5... | 14:34 |
zigo | So, I'd like to know if Victoria can be used with AMQP 5 too... | 14:34 |
clarkb | zigo that should be controlled by constraints | 14:36 |
zigo | clarkb: Sure, but in what repo? | 14:36 |
clarkb | requirements | 14:36 |
zigo | clarkb: If I push a change in the global-reqs, will it run some tests? | 14:36 |
clarkb | yes and you can use depends on | 14:37 |
clarkb | looks like the amqp constraint may be limited by something else as it is 2.6.1 so there may be a bit of untangling | 14:39 |
zigo | clarkb: Thanks. | 14:41 |
*** slaweq has quit IRC | 14:46 | |
*** mkalcok has quit IRC | 14:52 | |
fungi | if you run the constraints generation build logs, it should say why 2.6.1 is getting chosen | 14:55 |
fungi | er, rather look at the constraints generation build logs | 14:55 |
fungi | actually it seems like the propose-updates periodic job for openstack/requirements doesn't log the generate-constraints output (or we don't collect where it's logged to) | 15:01 |
clarkb | does it go to the console log? | 15:01 |
fungi | not that i can tell | 15:02 |
fungi | though this is also worrisome: https://zuul.opendev.org/t/openstack/build/b156209eb3a845728ab26cd061dd7a40/log/job-output.txt#756 | 15:02 |
clarkb | aha 2.6.1 is only from july and 5.0 is the next release from last week | 15:02 |
clarkb | probably still something has a cap but we aren't super behind | 15:03 |
clarkb | fungi: my favorite part is it is already in the requirements repo :) | 15:04 |
fungi | zigo: looks like it's already proposed in https://review.opendev.org/750084 but openstack has been under a requirements freeze leading up to the victoria release two days ago | 15:07 |
zigo | fungi: That's fine, I just need to know if it would work or just break everything in Debian... | 15:08 |
fungi | zigo: reviewing the job results on that change would likely be a good start, though it's updating a lot of stuff coming out of the freeze period so the failures may not be related to new amqp | 15:10 |
fungi | zigo: also there's a #openstack-requirements channel where this might be more on-topic | 15:10 |
clarkb | another thing to do would be to check why they changed version number from 2.6.1 to 5 | 15:11 |
zigo | clarkb: They removed a bunch of Py2 compat code in amqp ... | 15:11 |
clarkb | dropped python2 and python3.5 and older support. stopped using ssl.wrap_socket | 15:12 |
zigo | All of this is Celery stuff, they moved all to version 5: vine, celery, amqp ... | 15:12 |
fungi | which is likely fine for openstack wallaby, we're not running any py27 jobs on requirements changes since a cycle already | 15:12 |
fungi | or py35 for that matter | 15:12 |
zigo | They pretend that the modules are independent, it's just not the reality. | 15:12 |
zigo | Yeah, Py 3.5 is old ... | 15:13 |
zigo | That's Ubuntu 16.04 / Debian Jessie ... | 15:13 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495 | 15:19 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496 | 15:19 |
*** lpetrut has quit IRC | 15:26 | |
fungi | ugh, static sites are slow again. i wonder if they're hitting another vhost on there now | 15:31 |
fungi | i can't get the mod_status details to load | 15:32 |
fungi | checking cacti graphs | 15:32 |
fungi | oh yeah, graphs show it's getting slammed again | 15:33 |
clarkb | looks like tarballs.opendev.org | 15:34 |
fungi | aha, ansible undid our filter on the tarballs site. adding it back manually | 15:34 |
fungi | yeah | 15:34 |
clarkb | did ansible maybe undo our fix | 15:34 |
fungi | vhost config was modified 07:07 utc today | 15:34 |
clarkb | on the mirror apache config cleanup the test that is failing is for dockerv1 api | 15:35 |
clarkb | dockerhub has completely deprecated that (no pulls) since middle of last year | 15:35 |
clarkb | I think we should just rm that proxy | 15:35 |
fungi | back in place now | 15:35 |
fungi | should clear up in a moment | 15:35 |
clarkb | I'll put a cleanup change for that under my order deny allow satisfy cleanup | 15:35 |
clarkb | fungi: when static settles can you look at the root screen on review test and check that my debugging of the manage-projects noop makes sense? | 15:36 |
clarkb | if so I'll do that file copy and we can rerun manage-projects there | 15:36 |
fungi | clarkb: yeah, i concur the cp command you have there makes sense given how the files are being mapped into the container | 15:36 |
clarkb | k I'll do that cp now | 15:37 |
fungi | should have fresh check results in for topic:ua-filter shortly and then hopefully we can speedily review it so ansible doesn't undo things again | 15:37 |
clarkb | ++ | 15:38 |
clarkb | does the manage projects command there look good and ready to go (in the root screen)? | 15:38 |
fungi | yeah, lgtm | 15:38 |
clarkb | https://review-test.opendev.org/admin/repos/clarkb/clarkb-test-project exists and https://review-test.opendev.org/gitweb?p=clarkb/clarkb-test-project.git;a=blob;f=.gitreview;h=0a41cd5553131a0ba126aed8644620c6ceab4713;hb=00fa933c012414a2f05fb7cf2a9c12a620aa9e79 lgtm | 15:40 |
clarkb | also it tested the change of default branch to main :) | 15:40 |
clarkb | I think we can check that off now as working. \o/ | 15:40 |
fungi | yep, perfect! | 15:41 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives https://review.opendev.org/758469 | 15:44 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove docker v1 registry proxy from our mirrors https://review.opendev.org/758585 | 15:44 |
clarkb | next I'll be testing a rename. Which I think the process is stop gerrit, move git repo, start gerrit and trigger online reindex | 15:45 |
*** hamalq has joined #opendev | 15:45 | |
clarkb | will push up a change before I do that otherwise we can't validate reindexing | 15:45 |
fungi | yup. i guess comment out the tasks for the gitea hosts | 15:45 |
clarkb | oh I wasn't going to use the playbook | 15:46 |
clarkb | but maybe I should | 15:46 |
clarkb | I think we essentially just rm all the extra bits in a rename since the renames already do a mv and reindex? | 15:47 |
fungi | mm, system-config-run-static and system-config-run-gitea did not like my filter changes | 15:48 |
fungi | i'll dig deeper | 15:48 |
fungi | Destination directory /etc/apache2/conf-enabled does not exist | 15:49 |
fungi | i guess the roles creating our apache tuning file make that dir | 15:49 |
clarkb | I believe that apache2.conf loads from that dir odd that the package wouldn't mkdir the dir if it is trying to load from it in its config | 15:50 |
fungi | ahh, we install apache in those other roles so it makes that dir for us | 15:51 |
fungi | i guess i should copy apache installation and module enablement into this role too | 15:51 |
fungi | or reconsider the choice to add it to the playbook and include it in the other roles instead. roles including other roles just seems like spaghetti to me | 15:52 |
fungi | but maybe it's preferable to duplicated tasks | 15:53 |
clarkb | it should largely noop if we duplicate the steps and that way you don't have to do prework | 15:53 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495 | 15:54 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496 | 15:54 |
fungi | okay, now the parent change also installs apache and enables mod_headers and mod_rewrite | 15:55 |
clarkb | I think I'll just do the testing by hand on review-test for now. Mostly concerned about whether gerrit can do it more than automating the specific process at this point | 16:02 |
clarkb | (mostly because its more straightforward and this week was a bunch of fires and I want to make progress) | 16:03 |
clarkb | https://review-test.opendev.org/c/clarkb/clarkb-test-project/+/755651 I had to manually add defaultbranch to the gitreview settings to change it from master. I'll make a jeepyb change to set that too I guess | 16:07 |
clarkb | the problem with this testing is my todo list only gets longer :P | 16:07 |
fungi | that is a general problem with todo list, in my experience | 16:09 |
fungi | i've never had one get shorter | 16:09 |
clarkb | I've renamed the project and confirmed that pre reindex you can't rely on the change number redirected urls | 16:12 |
clarkb | if you construct a url with the new project name in it that works | 16:12 |
clarkb | change reindex is running now and we'll have to wait and see if that fixes the change number url redirects (I expect it will) | 16:13 |
clarkb | hrm looking at the rename playbook we update project watches in the db. I guess I should test that with a rename too | 16:15 |
*** marios is now known as marios|out | 16:16 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Update static Apache configs to 2.4 ACL primitives https://review.opendev.org/758593 | 16:17 |
fungi | clarkb: ^ | 16:17 |
fungi | hopefully that's not going to break anything | 16:17 |
*** marios|out has quit IRC | 16:18 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Post gerrit upgrade rename playbook https://review.opendev.org/758594 | 16:22 |
openstackgerrit | Clark Boylan proposed opendev/jeepyb master: Set default branch in .gitreview files when creating project https://review.opendev.org/758595 | 16:26 |
fungi | argh, new error | 16:30 |
openstackgerrit | Clark Boylan proposed opendev/jeepyb master: Make local git dir creation optional https://review.opendev.org/758597 | 16:33 |
fungi | huh, ansible-lint takes a lot longer than i realized on our system-config repo | 16:34 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495 | 16:39 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496 | 16:39 |
fungi | forgot you need mod_macro to make use of macro directives | 16:40 |
clarkb | we consume jeepyb from source not releases right? | 16:40 |
clarkb | I think that is the case | 16:40 |
fungi | yes | 16:40 |
fungi | we haven't ever tagged jeepyb have we? | 16:40 |
fungi | yeah, no tags | 16:41 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Stop managing gerrit's local git mirror dir https://review.opendev.org/758598 | 16:42 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Clean up cron tab entry from ansible once removed from host https://review.opendev.org/758599 | 16:42 |
clarkb | https://review.opendev.org/758595 https://review.opendev.org/758597 https://review.opendev.org/758598 and https://review.opendev.org/758599 should all be fine to land on 2.13 (but as always careful review is appreciated) | 16:43 |
fungi | average runtime for tox-linters on our system-config repo is 20 minutes according to zuul's stats | 16:45 |
clarkb | reindexing changes was suspiciously quick, I assume they made it faster when it could infer things were already at the right index? | 16:46 |
fungi | i had heard it was faster in new gerrit | 16:46 |
clarkb | https://review-test.opendev.org/755651/ redirects properly now though. I'm going to watch the project and star the change, then rename it again and check that those elements update properly | 16:47 |
fungi | watches are still in the rdbms right? | 16:48 |
fungi | no, wait, that was file reviews | 16:48 |
fungi | so watches are in notedb now? | 16:48 |
clarkb | ya I believe so | 16:49 |
clarkb | I just don't know where exactly which is why I want to test them with renames | 16:49 |
*** hamalq has quit IRC | 16:49 | |
fungi | sure | 16:49 |
clarkb | if they aren't in the project being watched then we may have to edit some git repos somewhere All-Users or All-Projects likely | 16:49 |
clarkb | I did find a mailing list thread from luca indicating that repo mv and reindex is all that should be necessary so I am hopeful this is all sorted out | 16:51 |
clarkb | pre reindex my watch hasn't updated | 16:52 |
zbr | fungi: if you call ansible-lint with xargs or use an outdated version, you will see bad performance. | 16:54 |
clarkb | accounts have reindexed and my watch hasn't updated. I suspect that these are stored in a central repo | 16:55 |
clarkb | will double check after changes reindex in case that is somehow a fix to the problem | 16:55 |
clarkb | fungi: on the UA filter change, you have two handlers one for reload and one for restart. You use restart even though loading the mod rewrite rules should happen on reload? Is that due to the addition of the modules? | 16:57 |
fungi | clarkb: yes, module addition needs a restart | 16:57 |
fungi | oh, though i guess i could just add that on the module tasks | 16:58 |
clarkb | fungi: ya I wonder if we should have module tasks do a loop then notify restart then the file change can notify reload | 16:58 |
clarkb | then if any module changes we restart and if the config changes we reload | 16:58 |
*** hamalq has joined #opendev | 17:14 | |
*** hamalq has quit IRC | 17:15 | |
*** hamalq has joined #opendev | 17:15 | |
*** mlavalle has joined #opendev | 17:16 | |
clarkb | changes have finished reindexing and the star is carried over but the watched project is not | 17:27 |
clarkb | is it terrible that I'm initially thinking: people can just update their watches :P | 17:27 |
clarkb | its been a long week. I'll dig into the notedb to see where that is stored soon | 17:27 |
fungi | infra-root: rackspace opened a ticket to let us know there's a disruptive database maintenance scheduled for ~2 weeks out impacting our "testmt" trove instance in dfw... looks like it was one set up to test percona clustering. doesn't seem like we're using it for anything, can i delete it? | 17:29 |
fungi | mordred: ^ if you happen to be around, you might know | 17:30 |
clarkb | ok All-Users:refs/users/XY/ABXY/watch.config is where watches go now. ABXY is your account id number | 17:30 |
fungi | ouch | 17:31 |
fungi | that's not going to be easy to update | 17:31 |
fungi | basically have to check out the user-specific ref for every single user to check its watches, then push commits for any which match? | 17:32 |
clarkb | ya something like for id in user_ids: git fetch gerrit refs/users/id[-2:]/id && git checkout FETCH_HEAD && sed -i -e 's/"oldprojectname"/"newprojectname"/g' watch.config && git commit -a -m "rename project oldname to project newname" && git push gerrit HEAD:refs/users/id[-2:]/id | 17:33 |
clarkb | maybe with a if git diff in the middle to check if we updated anything | 17:34 |
fungi | and we have how many thousand users to traverse? | 17:34 |
clarkb | 36k ish | 17:34 |
fungi | git checkout is not exactly fast | 17:34 |
fungi | yeah, not happening | 17:34 |
fungi | people can re-set their watches when projects rename, i guess | 17:34 |
* fungi doesn't set his watch since he never wears it | 17:35 | |
clarkb | ya I think we can basically say thats a known issue for now and if someone can make it fast they win | 17:35 |
fungi | git can cat a file from an arbitrary ref, so maybe no need to checkout unless we find a match | 17:36 |
clarkb | my earlier statement wasn't super clear the ref is refs/users/XY/ABXY and then in that working tree is a watch.config | 17:36 |
clarkb | fungi: ooh good idea | 17:36 |
clarkb | there may also be a gerrit api that we can ask for a list of users to modify (or even to do the modification for us) | 17:37 |
clarkb | there are rest apis for this | 17:40 |
clarkb | that may not be faster but is likely going to be easier to reason about? | 17:40 |
clarkb | https://gerrit-review.googlesource.com/Documentation/rest-api-accounts.html#set-watched-projects | 17:40 |
clarkb | 36k get requests then some smaller number of deletes and adds | 17:41 |
clarkb | I expect this is solvable but probably also not necessary before we upgrade | 17:41 |
fungi | clarkb: ianw: topic:ua-filter changes are passing tests now | 17:42 |
clarkb | fungi: do you want to update the base change to differentiate between when it should reload and when it should restart? | 17:42 |
clarkb | or do you want to land that as is and do a followup? | 17:42 |
fungi | clarkb: oh, yeah i can give that a shot | 17:42 |
*** andrewbonney has quit IRC | 17:43 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495 | 17:45 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496 | 17:45 |
fungi | clarkb: also not sure if you saw, but i pushed up 758593 to remove the satisfy all directives from the static site vhosts | 17:52 |
clarkb | Oh I hadn't, will review both sets shortly | 17:53 |
fungi | also i think we've got more bitrot in pbr jobs | 17:55 |
fungi | looks like my changes which had been passing tempest-full and pbr-installation-openstack-pip-dev jobs started failing them a few weeks later once they were approved | 17:57 |
clarkb | pbr not working on focal for some reason maybe? | 17:58 |
fungi | i think so... the tempest job seems to be failing to install python-guestfs (it should probably be using python3-guestfs instead) | 17:58 |
fungi | the pip-dev job seems to be hitting a test timeout on pbr.tests.test_integration.TestIntegration.test_integration(octavia) | 17:59 |
clarkb | those integration tests install the various projects. That could be a regression in pip if its really slow | 18:00 |
clarkb | fungi: all the apache changes lgtm | 18:01 |
fungi | thanks | 18:01 |
fungi | ahh, also timing out on pbr.tests.test_integration.TestIntegration.test_integration(nova) | 18:02 |
clarkb | for the timeouts I would try increasing the test timeout and see if it finishes in a reasonable amount of time. If not we may have to dig into a potential regression in performance there | 18:06 |
fungi | clarkb: looking closer, there's a tempest-full (which is failing to install python-guestfs instead of python3-guestfs on ubuntu-focal) and a tempest-full-py3 which is working | 18:08 |
fungi | i bet tempest-full should not have been moved to focal | 18:08 |
fungi | because it's doing python 2.7 testing | 18:08 |
clarkb | oh ya I think there is a change up to drop the python2 test | 18:09 |
clarkb | I suggested that it be replaced with a train python2 job instead | 18:10 |
clarkb | since really the intent there is to be able to continue to support installations for supported openstack versions with pbr | 18:10 |
clarkb | there was a related change to switch to pre-commit which I reviewed | 18:10 |
fungi | it looks like the tempest-full job should never have been moved to focal though... it declares "USE_PYTHON3":false | 18:11 |
clarkb | ++ | 18:11 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: More old apache acl cleanups https://review.opendev.org/758611 | 18:16 |
clarkb | fungi: ^ more apache cleanups | 18:16 |
fungi | thanks! i saw those in there too and figured they should be a separate change | 18:16 |
clarkb | ya gerrit is the last one with the old stuff that I see | 18:17 |
clarkb | but need more investigating to understand what those Directory blocks are even doing there | 18:18 |
clarkb | I think we can just rm them all | 18:18 |
fungi | maybe we can see if it breaks review-test and then just roll that into the upgrade changeset | 18:23 |
clarkb | ya that may be the safest route | 18:27 |
mordred | clarkb: I am 100% sure that anything labeled "testmt" can be deleted | 18:38 |
clarkb | fungi: ^ | 18:38 |
fungi | mordred: thank you!!! deleting now | 18:41 |
fungi | #status log deleted trove instances of "testmt" percona ha cluster from 2020-06-27 | 18:46 |
openstackstatus | fungi: finished logging | 18:46 |
fungi | is there a way to override the default branch for all required-projects but not for the project triggering the build? | 18:56 |
fungi | specifically this is for trying to run tempest-full on pbr changes (pbr has only a master branch) but with stable/train of all the openstack projects (so as to be able to run with python 2.7)? | 18:57 |
clarkb | fungi: I think pbr will fall back to master as it doesn't have stable branches | 18:57 |
clarkb | but maybe that will error instead | 18:57 |
fungi | okay, so just declaring override-branch: stable/train will work maybe | 18:57 |
fungi | trying that now | 18:57 |
fungi | gmann is suggesting to do override-checkout instead, curious now to try both and compare the results | 19:05 |
gmann | i think override-branch only overrides the branch of the repo under test, and everything else (including devstack) comes from master? I partially remember this from when ironic hit an issue where their stable branch jobs on the master gate were using override-branch instead of override-checkout | 19:07 |
fungi | yeah, i suspect you're right | 19:10 |
fungi | override-branch is basically doing the inverse of override-checkout for this | 19:10 |
clarkb | fungi: should we take the friday opportunity to land some of these apache changes and get them checked out / reverted if necessary? | 19:26 |
clarkb | I expect we'd have to wait for ianw's monday to get a second reviewer on them otherwise | 19:26 |
fungi | yeah, may as well. i'll take a quick look through yours while dinner finishes cooking | 19:28 |
clarkb | should I approve yours or do you want to do those in the same pass? | 19:29 |
fungi | clarkb: 758585 is failing ci | 19:30 |
clarkb | looking | 19:32 |
fungi | feel free to approve my stack, i already approved your misc one | 19:33 |
clarkb | it failed to restart apache | 19:33 |
fungi | which usually means a config error | 19:33 |
clarkb | maybe I need to remove my listen directives? | 19:34 |
fungi | looking | 19:34 |
clarkb | but the child change passed testing and ran the same job. I'll remove the listen directives for completeness but not sure that's related | 19:34 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove docker v1 registry proxy from our mirrors https://review.opendev.org/758585 | 19:35 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives https://review.opendev.org/758469 | 19:35 |
* clarkb goes to approve fungi's 3 changes | 19:35 | |
clarkb | though maybe we should approve the UA ones first then apply satisfy removal over the top | 19:36 |
clarkb | since we don't want to remove the ua filtering if satisfy removal merges and the other doesn't | 19:36 |
clarkb | ya I'll do it that way, the UA changes are approved once those land I'll approve the satisfy removal | 19:36 |
*** tosky has quit IRC | 19:37 | |
fungi | the apache error log for that host didn't have anything of note in it | 19:38 |
fungi | i want to say we've seen a similar potential race or other flakiness around letsencrypt where the cert file winds up not getting created, and the apache restart failure is a cascade error | 19:39 |
*** tosky has joined #opendev | 19:39 | |
*** slaweq has joined #opendev | 19:39 | |
fungi | we're basically calling on acme to do public api operations, it's not mocked | 19:39 |
*** priteau has quit IRC | 19:40 | |
clarkb | I just sent a followup to infra root and luca about gerrit things we've learned in case luca had input on any of them | 19:54 |
openstackgerrit | Merged opendev/system-config master: More old apache acl cleanups https://review.opendev.org/758611 | 19:56 |
fungi | thanks! | 19:59 |
openstackgerrit | Merged opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495 | 20:06 |
*** slaweq has quit IRC | 20:12 | |
clarkb | cacti's apache config just updated and I can still get graphs so nothing broke there | 20:20 |
clarkb | fungi: tarballs should be running nowish | 20:27 |
clarkb | if you confirm its happy when done I'll approve the static Satisfy any cleanup | 20:27 |
clarkb | it seems to have applied at least, not sure if functional yet | 20:33 |
clarkb | https://tarballs.opendev.org/ still loads anyway | 20:33 |
*** roman_g has quit IRC | 20:33 | |
*** roman_g has joined #opendev | 20:34 | |
clarkb | tailing the apache logs I don't see any of the sad making UAs | 20:35 |
clarkb | going to call that good and approve the Satisfy any cleanup | 20:35 |
fungi | yeah, it should have been a no-op since the change was already manually applied | 20:36 |
fungi | indeed, it corrected a trailing space on one line of the ua-filter.conf and didn't modify the vhost config at all | 20:38 |
fungi | so looks good | 20:38 |
*** roman_g has quit IRC | 20:38 | |
openstackgerrit | Merged opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496 | 20:42 |
openstackgerrit | Merged opendev/system-config master: Remove docker v1 registry proxy from our mirrors https://review.opendev.org/758585 | 20:42 |
openstackgerrit | Merged opendev/system-config master: Update static Apache configs to 2.4 ACL primitives https://review.opendev.org/758593 | 21:06 |
*** tosky has quit IRC | 22:29 | |
*** qchris has quit IRC | 22:41 | |
*** qchris has joined #opendev | 22:54 | |
*** mlavalle has quit IRC | 23:27 | |
fungi | clarkb: 758612 seems to be working for keeping tempest-full on pbr but running it with stable/train of projects | 23:51 |
fungi | that should get us back to being able to merge pbr changes again. i think the test timeouts i saw in the pip-dev job were signs of a "slow node" but we should keep an eye out | 23:52 |
clarkb | I'll take a look | 23:52 |
clarkb | lgtm, any idea if anyone else is reviewing those? | 23:57 |
fungi | i've commented on the change for dropping the tempest-full job, and also tried to drum up interest in #openstack-oslo, so someone may notice | 23:59 |
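
The branch fallback clarkb guesses at around 18:57 (a project with no stable branches, like pbr, ends up on master while the other projects use stable/train) can be illustrated with a small sketch. This is not Zuul's implementation: `pick_branch` and the sample branch lists are hypothetical, and the code only models the "use the override if the repo has it, otherwise fall back" idea from the discussion.

```python
def pick_branch(available, override, default="master"):
    """Return the override branch if the repo has it, else the default.

    Hypothetical helper: it models the fallback discussed in the log,
    not Zuul's real checkout logic.
    """
    if override in available:
        return override
    if default in available:
        return default
    raise ValueError(f"neither {override!r} nor {default!r} exists")


# Assumed sample data: pbr has only master; nova also has stable/train.
repos = {
    "pbr": ["master"],
    "nova": ["master", "stable/train"],
}

# Ask for stable/train everywhere and let repos without it fall back.
checkouts = {
    name: pick_branch(branches, "stable/train")
    for name, branches in repos.items()
}
print(checkouts)
```

Whether the real job falls back or errors for a repo without the requested branch is exactly what fungi was testing, so treat the fallback here as the assumed behaviour rather than a confirmed one.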