clarkb | ya its py36 and newer which is newer than xenial | 00:00 |
---|---|---|
clarkb | something something we should just use nox | 00:03 |
clarkb | https://review.opendev.org/c/opendev/git-review/+/871652 :) | 00:03 |
clarkb | the afs grafana dashboard reflects the quota changes I made now | 00:05 |
ianw | clarkb: we are already setting up the verified tag in our test as a submit-requirement -> https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/bootstrap-test-review.yaml#L113 | 00:11 |
ianw | i can probably do something like edit the base file from gerrit to turn the code-review into a submit-requirement | 00:12 |
JayF | clarkb: if it makes a difference to you; I'll be out of town that day thru Thursday of the following week. | 00:12 |
ianw | or maybe add a verified tag with similar | 00:12 |
JayF | clarkb: although I don't think I have any magic knowledge that other Ironic cores involved (rpittau or iurygregory) wouldn't know | 00:12 |
ianw | but i'm wondering if it's worth us really maintaining that. does it really give us more coverage than the verified tag we have? | 00:12 |
clarkb | JayF: no, that just helps reinforce its likely to be a good quiet time | 00:13 |
clarkb | ianw: verified is probably fine. Looking at that I thought I set the function to be noop/noblock but it isn't set there | 00:13 |
ianw | actually interesting. that should be able to push to 3.7 | 00:14 |
clarkb | ianw: https://review.opendev.org/c/opendev/system-config/+/872238 | 00:14 |
clarkb | ianw: I thought that had landed. Maybe we should switch to NoBlock and send it in? | 00:14 |
JayF | clarkb: when is that official enough I can tell ironic community? Now? | 00:15 |
ianw | clarkb: ++ | 00:15 |
clarkb | ianw: I think what I found was the checks for function only look if you set a function value. But if you don't set one it doesn't check it and you get the default | 00:15 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch Gerrit test Verified label function to NoOp https://review.opendev.org/c/opendev/system-config/+/872238 | 00:15 |
clarkb | ianw: ^ | 00:15 |
clarkb | JayF: its probably official enough now with the note that as we get closer if we aren't ready for some reason we'll delay | 00:15 |
ianw | clarkb: maybe just switch the NoOp/NoBlock in the subject before it runs ci? | 00:16 |
clarkb | but we've got a month to sort things out | 00:16 |
clarkb | ianw: I'm not sure I know how to parse that statement | 00:16 |
ianw | make the subject "Switch Gerrit test Verified label function to NoBlock" just to be accurate to the change i mean | 00:17 |
clarkb | oh sorry | 00:17 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch Gerrit test Verified label function to NoBlock https://review.opendev.org/c/opendev/system-config/+/872238 | 00:18 |
clarkb | This is what I get for using Gerrit's edit functionality | 00:18 |
clarkb | ianw: ^ done | 00:18 |
clarkb | fungi: I went ahead and approved the git-review change since I don't really know of anyone else regularly reviewing changes there | 00:19 |
ianw | ok, if we're in agreement there's probably not much point doing more work in the gate all-projects, i think getting probably fungi to just look over https://review.opendev.org/c/opendev/system-config/+/876237 and https://paste.opendev.org/show/brAj40R1mJbQZSXAXEQ5/ | 00:20 |
ianw | and then i can probably try pushing that to our all-projects at some time mid-morning for me, when it's quiet but not too quiet | 00:21 |
ianw | it should make no difference, but if it does we can revert quick and re-evaluate | 00:21 |
ianw | assuming it works, i'll feel better about moving on with the bigger updates to all the other acl's | 00:21 |
fungi | sure, taking a look at those now | 00:29 |
fungi | looks like it needs 876236 too | 00:29 |
fungi | which i just approved now since it's purely documentation edits | 00:30 |
fungi | approved both of them | 00:31 |
fungi | ianw: the diff in that paste lgtm, please proceed at your convenience! | 00:32 |
ianw | thanks! i think i'll try it tomorrow morning | 00:34 |
opendevreview | Merged opendev/git-review master: Upgrade testing to Gerrit 3.4.4 https://review.opendev.org/c/opendev/git-review/+/849419 | 00:38 |
opendevreview | Merged opendev/system-config master: doc/gerrit : update copyCondition https://review.opendev.org/c/opendev/system-config/+/876236 | 00:39 |
opendevreview | Merged opendev/system-config master: doc/gerrit : update to submit-requirements https://review.opendev.org/c/opendev/system-config/+/876237 | 00:41 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide deploy-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 07:31 |
opendevreview | Saggi Mizrahi proposed opendev/git-review master: Add worktree support https://review.opendev.org/c/opendev/git-review/+/876725 | 08:02 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 08:05 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 08:27 |
*** jpena|off is now known as jpena | 08:39 | |
ianw | for reference the f37 sync finished and i've dropped the locks | 09:04 |
bbezak | Hi, I can't fetch any build in zuul - https://zuul.opendev.org/t/openstack/builds, and particular builds are "not found" | 09:42 |
bbezak | or "This build does not exist" | 09:45 |
bbezak | looks ok now | 10:23 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 10:55 |
opendevreview | Radosław Piliszek proposed openstack/project-config master: Add the NebulOuS tenant https://review.opendev.org/c/openstack/project-config/+/876414 | 10:56 |
opendevreview | Saggi Mizrahi proposed opendev/git-review master: Add worktree support https://review.opendev.org/c/opendev/git-review/+/876725 | 12:34 |
*** odyssey4me is now known as odyssey4me_ | 13:05 | |
fungi | bbezak: there may have been a temporary object storage outage in one of the clouds we use to store job logs | 13:19 |
*** elodilles is now known as elodilles_afk | 13:20 | |
fungi | we rely on the swift services in rackspace and ovh public clouds for that, so if one of those was down then ~50% of recent builds would have had no data in the webui | 13:20 |
fungi | graphs for inmotion-iad3 look much better today. rax-ord is still rather rough though... it does sort of look like something, quite possibly us, is dragging down api response times there | 13:39 |
fungi | i suppose we can try cranking up the timeouts more, though that means potentially even longer that jobs have to wait to get node assignments | 13:40 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Wait longer for rax-ord nodes and ease up API rate https://review.opendev.org/c/openstack/project-config/+/876874 | 14:42 |
*** elodilles_afk is now known as elodilles | 14:58 | |
johnsom | FYI, meetbot seems to be not functioning in #openstack-lbaas today | 16:02 |
fungi | have you tried without an extra space before the #? | 16:10 |
fungi | it looks like both of you did " #startmeeting ..." instead of "#startmeeting ..." | 16:11 |
fungi | johnsom: ^ | 16:11 |
johnsom | fungi Doh! good catch, thanks. Cut/Paste error | 16:12 |
fungi | arguably meetbot could be improved to be a little more lenient with leading whitespace | 16:16 |
bbezak | thx fungi for explanation | 16:26 |
clarkb | fungi: can you review https://review.opendev.org/c/opendev/system-config/+/872238 as a followup to the testing of gerrit with our proposed acl changes discussion yesterday | 16:42 |
clarkb | should be an easy one. Basically converts the verified label to a submit requirement in a similar manner to what will be done in prod on our test node | 16:43 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/876471 is the next gitea replacement related change. This one will need a little bit of monitoring. I think I should be able to do that later today if you don't feel like approving | 16:44 |
fungi | oh, yep | 16:44 |
clarkb | I had previously half converted the verified label but then we realized not setting the function made it default to MaxWithBlock so the new submit requirement wasn't doing anything | 16:45 |
fungi | yeah, i sort of understand the problem there | 16:47 |
clarkb | and then https://review.opendev.org/c/opendev/system-config/+/876233 is another change that would probably be good to land before we upgrade gerrit. But the urgency here is low (this updates our gerrit image builds to use new gerrit tag versions for plugins but those are all equivalent to the old code except for one location which we were already overriding to something else that was | 16:47 |
clarkb | equivalent | 16:47 |
fungi | and it's only for the test config anyway | 16:47 |
fungi | (the earlier change i mean) | 16:48 |
clarkb | ya | 16:48 |
clarkb | gitea09's load average is currently well over 8 (it has 8 vcpus) | 17:06 |
clarkb | my initial inclination is that implies we should add a couple more giteas | 17:06 |
clarkb | however, if all that load is coming from the same nat address then this might not help so I may need to look more closely at the logs before making that decision | 17:07 |
clarkb | but its good data for the question of whether or not we add more servers | 17:07 |
fungi | or fewer larger servers i guess | 17:09 |
clarkb | my local mapping is to gitea09 and it seems to be snappy in my browser so this may not be as big a problem as I initially feared | 17:13 |
fungi | clarkb: not sure if you saw, but the next experiment i've proposed for rax-ord is 876874 | 17:14 |
clarkb | I missed that. Looking | 17:14 |
clarkb | approved. Seems like a reasonable thing to test | 17:14 |
fungi | seems like we could be our own noisy neighbor problem there, but hard to tell still | 17:14 |
johnsom | fungi Yeah, I might propose a patch for that. I have two other things I need to look at first, but probably worth the effort. | 17:14 |
clarkb | ya we might want to drop max servers to like 10 and see if the other metrics become more normal | 17:15 |
clarkb | but lets see what being patient with boot times does | 17:15 |
opendevreview | Merged openstack/project-config master: Wait longer for rax-ord nodes and ease up API rate https://review.opendev.org/c/openstack/project-config/+/876874 | 17:23 |
clarkb | fungi: ^ the nodepool fix for slow restarts of the provider manager should have deployed yesterday too | 17:23 |
clarkb | this change will be a good exercise of that. | 17:23 |
fungi | oh, yep right | 17:24 |
clarkb | fungi: you good with me self approving https://review.opendev.org/c/opendev/system-config/+/874176 at this point? Considering all the nodes currently serving content are jammy now I think it is a good idea to land | 17:25 |
fungi | i went ahead and approved it | 17:28 |
clarkb | thanks! | 17:30 |
opendevreview | Merged opendev/system-config master: Switch Gerrit test Verified label function to NoBlock https://review.opendev.org/c/opendev/system-config/+/872238 | 17:31 |
clarkb | looking at gitea graphs for the other servers gitea11 is also quite busy and the others haven't been slouching either | 17:32 |
clarkb | I think this has me leaning towards booting another couple of servers | 17:32 |
clarkb | I've got an appointment this morning, but can kick that off later today | 17:32 |
*** jpena is now known as jpena|off | 17:33 | |
fungi | it doesn't look dire anyway | 17:33 |
clarkb | ya I don't think its dire, but does point to these servers maybe being a bit too little when demand is high | 17:33 |
clarkb | but I suspect replacing 8 smaller servers with 6 larger ones may be sufficient so I'll just do 2 I guess | 17:34 |
opendevreview | Julia Kreger proposed openstack/diskimage-builder master: WIP Correct boot path to cover FIPS usage cases https://review.opendev.org/c/openstack/diskimage-builder/+/876192 | 17:36 |
fungi | makes sense | 17:36 |
clarkb | The good news is I've got the process pretty well sorted out now. It shouldn't take long other than waiting for gerrit replication steps to occur | 17:38 |
clarkb | looking at cpu graphs I wonder too if this is due to iowait | 17:43 |
clarkb | in theory splitting this into more hosts will reduce io demands to a single VMs disk | 17:43 |
fungi | quite possibly | 17:47 |
fungi | doesn't really look like iowait to me | 17:49 |
fungi | at least not according to top | 17:49 |
clarkb | fungi: cacti reports up to 5% iowait on gitea11 recently | 17:49 |
fungi | a lot of the reported cpu is "st" (time stolen from this vm by the hypervisor) | 17:50 |
clarkb | fungi: thats via top? | 17:51 |
fungi | yeah | 17:51 |
fungi | the "wa" column (time waiting for I/O completion) is comparatively low | 17:51 |
fungi | i'm guessing this is how modern kernels expose processor overcommit info | 17:52 |
clarkb | ya and thats out of 100% not 800% in top right? | 17:52 |
clarkb | adding more hosts should mitigate that too unless we end up on the same hypervisor? | 17:52 |
fungi | right. the 1 key will toggle per-processor view/percent | 17:52 |
clarkb | I'm not in a rush to add more hosts if we want to monitor it more | 17:53 |
clarkb | I do need to pop out for that appointment now though. But let me know what you think | 17:53 |
fungi | now, this *could* be iowait on the hypervisor which doesn't show up as iowait on the guest, i don't really know | 17:53 |
fungi | well, i still agree that adding another couple of backends is good to try and see if it helps. we can always delete them again if we decide it didn't | 17:54 |
Clark[m] | ++ | 17:54 |
fungi | also we might be able to see if mnaser or guilhermesp can help us fine-tune things if it really is iowait hidden from us on the hypervisor side | 17:56 |
fungi | for example we might be able to isolate the i/o to more performant storage with a separate volume, or set up anti-affinity rules to make sure the backends wind up on different hosts, or whatever else makes sense | 17:58 |
fungi | might be this is a difference in disk throughput, since we switched from bfv to local storage for the rootfs | 18:01 |
fungi | in rax-ord/nodepool news, the config update deployed but i think nl01 is still running older code and will need a manual restart | 18:05 |
fungi | 2023-03-07 19:46:03,043 DEBUG nodepool.StateMachineProvider.rax-ord: Stopping | 18:05 |
fungi | and no "stopped" in the log | 18:05 |
fungi | so it's been in the process of stopping for about 22.5 hours | 18:05 |
fungi | i expect we'll need to down/up the launcher container like we did on nl02, but i'll wait until more folks are around since this isn't urgent | 18:08 |
fungi | clarkb: when you're back, could also pick your brain on git-review testing. apparently the output you had to update for changes in newer gerrit isn't stable, and the fields can be returned in an arbitrary order. in one test we were looking for "Processing changes: new: 1" but got "Processing changes: refs: 1, new: 1\nremote: Processing changes: refs: 1, new: 1\nremote: Processing changes: refs: | 18:17 |
fungi | 1, new: 1, done [...]" | 18:17 |
clarkb | ok that went quickly. | 18:19 |
clarkb | fungi: the fix for the nodepool thing still runs the long shutdown process but it does so in another thread and ignores it basically | 18:20 |
clarkb | fungi: so the stopped may not show up for a while but the new manager should run with the new settings anyway | 18:20 |
clarkb | the thing to look for is probably the boot timeouts to see if they are at 10 minutes or 15 | 18:20 |
fungi | ahh, okay | 18:21 |
clarkb | fungi: for git-review can you be more specific. The old side "Processing changes: new: 1" is still showing up? | 18:21 |
fungi | i don't think the boot timeout values get logged, but can likely be inferred from timestamps on a node that does time out | 18:21 |
fungi | clarkb: oh, i just realized that the failure is likely a test introduced in the change being reviewed which copied from existing tests prior to your update | 18:22 |
clarkb | ah ok. Ya the old stuff needs to be updated I think. But the new content should be stable from what I've seen so far | 18:22 |
fungi | i'm working to update it now, just mildly confused that it only triggered on the py39 job and not others | 18:23 |
clarkb | ok I'm booting gitea13 and gitea14 now | 18:25 |
fungi | i thought you were at an appointment? back already? | 18:26 |
clarkb | ya we switched it to an online appointment because one of the kids is sick today and that cut out the travel time which made it go much more quickly | 18:26 |
fungi | got it | 18:27 |
clarkb | usually its half an hour of driving and half an hour of meeting. Today just half an hour of meeting | 18:27 |
clarkb | heh finally hit a quota exceeded error for instance count. I guess that means I'm going to delete gitea08 first | 18:28 |
clarkb | any objections to me doing that now? | 18:28 |
fungi | nope, go for it | 18:28 |
fungi | noticing git remote origin operations and browsing gitea are slow to respond, not sure if it's a problem local to me or the result of the servers being more heavily loaded | 18:29 |
clarkb | I think more of them are getting loaded so probably not just you. We can also reenable the old servers. I'll do that now | 18:30 |
clarkb | gitea01-04 have been reenabled in haproxy | 18:31 |
opendevreview | Julia Kreger proposed openstack/diskimage-builder master: WIP Correct boot path to cover FIPS usage cases https://review.opendev.org/c/openstack/diskimage-builder/+/876192 | 18:34 |
clarkb | looks like system load is falling. I'm slightly worried that we might be our own noisy neighbors here and that putting the old servers back helps because they are on different hardware | 18:36 |
clarkb | I'll proceed with gitea13 and 14 since that is what we can control, but maybe before deleting gitea01-07 (I have to delete 08 to make room) we try to clarify some of this behavior? | 18:37 |
clarkb | looks like a ton of cat-file commands which isn't normal fetching | 18:37 |
opendevreview | Merged opendev/system-config master: Convert gitea99 test node to Jammy https://review.opendev.org/c/opendev/system-config/+/874176 | 18:39 |
clarkb | fungi: looking at logs I think we're being crawled | 18:40 |
clarkb | and its through the load balancer so not something finding new webservers for the new giteas (yay non standard ports) | 18:41 |
clarkb | ok ya turning on the other nodes definitely caused them to spike too | 18:54 |
clarkb | whats curious is they seem to have run more quickly with the load though that may just be due to better distribution | 18:54 |
fungi | yeah, could be a combination of crawler botnet hitting all the backends plus rh corporate nat getting balanced to one backend | 18:55 |
clarkb | I've got a quick sort and count going against the haproxy log to see if anything stands out | 18:55 |
*** odyssey4me_ is now known as odyssey4me | 18:56 | |
clarkb | but ya I guess I should delete gitea08 to free up room for gitea14. Then we can add two more and see where that gets us | 18:56 |
clarkb | being cautious to not delete the other old servers too soon | 18:56 |
*** odyssey4me is now known as odyssey4me_ | 19:04 | |
*** odyssey4me_ is now known as odyssey4me | 19:04 | |
opendevreview | Julia Kreger proposed openstack/diskimage-builder master: WIP Correct boot path to cover FIPS usage cases https://review.opendev.org/c/openstack/diskimage-builder/+/876192 | 19:05 |
fungi | clarkb: not urgent, but the git-review change also failed on this (unchanged from master): https://opendev.org/opendev/git-review/src/commit/0ecdd60a0a4864a44d042b997c2955b384e09a21/git_review/tests/test_git_review.py#L228 | 19:06 |
fungi | that leads me to suspect the messages may be nondeterministic since otherwise we'd expect it to have failed on your earlier change | 19:07 |
clarkb | ack | 19:07 |
clarkb | deleting gitea08 did not delete its volume | 19:11 |
clarkb | its bfv so that volume really only makes sense in the context of having the server so I will delete the volume now too | 19:11 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Block another bogus crawler botnet UA https://review.opendev.org/c/opendev/system-config/+/876889 | 19:12 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Block another bogus crawler botnet UA https://review.opendev.org/c/opendev/system-config/+/876889 | 19:12 |
clarkb | #status log Deleted gitea08 and its associated boot volume as part of gitea server replacements | 19:13 |
opendevstatus | clarkb: finished logging | 19:13 |
clarkb | fungi: comment on 876889 | 19:14 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Block another bogus crawler botnet UA https://review.opendev.org/c/opendev/system-config/+/876889 | 19:18 |
*** odyssey4me is now known as odyssey4me_ | 19:22 | |
fungi | clarkb: thanks, fixed and also commented with a reference to the apache docs which describe the = | 19:22 |
*** odyssey4me_ is now known as odyssey4me | 19:22 | |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Add CC similarly to reviewers https://review.opendev.org/c/opendev/git-review/+/849219 | 19:22 |
clarkb | +2 but I didn't approve yet. Maybe when that lands we should remove gitea01-04 again and see if we hold stable or not | 19:31 |
clarkb | gitea14 is almost done bootstrapping. I'll get changes up for DNS and adding them to the inventory shortly | 19:31 |
*** odyssey4me is now known as odyssey4me_ | 19:31 | |
*** odyssey4me_ is now known as odyssey4me | 19:32 | |
clarkb | anyone know why when I do `uniq -c` on a large number of records I have to first sort the input to get accurate results? Is uniq operating on really small buffers? | 19:33 |
clarkb | I wonder if we didn't OOM because we've got more memory now but we would've if we were still running the old servers | 19:36 |
clarkb | If that is the case then this ended up being a more graceful handling of things I guess | 19:36 |
*** odyssey4me is now known as odyssey4me_ | 19:37 | |
*** odyssey4me_ is now known as odyssey4me | 19:37 | |
clarkb | fungi: Internet seems to say steal is basically when the host isn't providing the resources expected. So ya we could be our own noisy neighbors here etc. I don't think this is due to IO though. Its due to other things needing the cpu | 19:40 |
clarkb | it seems much higher on gitea09 and gitea11 than 10 and 12 | 19:42 |
fungi | clarkb: afaik, uniq only deduplicates adjacent lines | 19:43 |
fungi | i usually |sort|uniq -c instead | 19:44 |
fungi | uniq's manpage concurs with my recollection | 19:44 |
clarkb | ah | 19:46 |
clarkb | fungi: so gitea14 reuses gitea08's IP addr. Should I commit the removal of gitea08 from dns as a separate change then add gitea14 in or just do it as one? | 19:46 |
clarkb | I don't think there is a difference to dns propagation as thats all ttl based | 19:46 |
fungi | i'd just do it in one change | 19:49 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Add gitea13 and gitea14 to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/876891 | 19:50 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add gitea13 and gitea14 to inventory https://review.opendev.org/c/opendev/system-config/+/876892 | 19:52 |
clarkb | Once those land we wait for giteas to be deployed then I can do the brain transplants then we can set up replication | 19:53 |
clarkb | In the meantime we can leave the 4 old servers behind haproxy along with the 4 new ones | 19:53 |
fungi | sounds good | 19:53 |
clarkb | maybe drop the 4 old ones again when the apache rule lands to see if that helped | 19:53 |
clarkb | fwiw current cpu steal on gitea13 and 14 looks low like 10 and 12 | 19:55 |
fungi | anti-affinity rules could help then, if possible | 19:56 |
fungi | though 6-way anti-affinity might be tough on placement | 19:56 |
clarkb | I don't think we have access to that from the nova apis as a regular user | 20:06 |
clarkb | fungi: https://review.opendev.org/c/opendev/system-config/+/876889 has a +1 now. We can probably wait for ianw to take a look since the immediate fire has been dealt with | 20:07 |
fungi | yup | 20:07 |
*** odyssey4me is now known as odyssey4me_ | 20:15 | |
*** odyssey4me_ is now known as odyssey4me | 20:15 | |
*** odyssey4me is now known as odyssey4me_ | 20:17 | |
*** odyssey4me_ is now known as odyssey4me | 20:17 | |
opendevreview | Merged opendev/zone-opendev.org master: Add gitea13 and gitea14 to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/876891 | 20:17 |
clarkb | ok other than being worried about cpu steal making the new servers potentially worse than the old ones I think I've run down what I can for now. I'm going to grab lunch | 20:18 |
Clark[m] | fungi: re git-review maybe it is deterministic based on the command you run? | 20:30 |
Clark[m] | fungi: probably the best thing is to just look for new: 2 or new:1 instead | 20:31 |
Clark[m] | We check separately for the success message and the commit titles. I think that checking the minimal count with those extra checks is sufficient | 20:31 |
fungi | well, the command we're running hasn't changed though | 20:32 |
fungi | at least i'm not sure why test_multiple_changes() would have suddenly started to fail on 849219 when it worked for your change with newer gerrit | 20:34 |
fungi | and also it only failed on tox-py39 but not 36,37,38 | 20:35 |
Clark[m] | Oh it failed. That's what I missed. I thought you were comparing to what I added at the end of the test which is the new version | 20:38 |
Clark[m] | Maybe refs: 1 only happens when git receives a new ref? | 20:38 |
Clark[m] | And could be a timing issue for that where other tests may have already written the commit to Gerrit or something | 20:39 |
Clark[m] | But that info comes directly from Gerrit and git-review passes it through so I don't think we need to be super specific about it in the tests | 20:39 |
fungi | agreed, i think we can just simplify/shorten the strings we're looking for | 20:40 |
fungi | i'll whip something up | 20:40 |
Clark[m] | fungi: at the end of that multiple changes test I added asserts to look for both commit messages implying both were pushed as well as the SUCCESS message | 20:44 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Simplify test output strings for new Gerrit https://review.opendev.org/c/opendev/git-review/+/876894 | 20:44 |
*** odyssey4me is now known as odyssey4me_ | 20:47 | |
ianw | that UA patch looks right to me. yeah the "=" at the front means you can just paste in the full line, that's the idea | 20:47 |
opendevreview | Julia Kreger proposed openstack/diskimage-builder master: WIP Correct boot path to cover FIPS usage cases https://review.opendev.org/c/openstack/diskimage-builder/+/876192 | 20:48 |
clarkb | fungi: ya that change seems sufficient given we're still checking the count and that gerrit data is passed through | 21:06 |
clarkb | I've done more checking and gitea09+gitea14 share a hostId as do gitea11+gitea13. Unfortunate because those were the nodes with the cpu steal happening | 21:33 |
clarkb | 10 and 12 are distinct | 21:33 |
clarkb | so ya potential for being our own noisy neighbor and something to pay attention to I guess | 21:33 |
clarkb | ianw: I went ahead and approved the UA change | 21:37 |
clarkb | not sure if it was intentional to not approve it. But you've got a few minutes to -W it if so | 21:37 |
fungi | i suppose we could keep relaunching servers until we get unique hostids across them all, but that seems borderline abusive and we should be conscientious stewards of these resources | 21:52 |
fungi | and also host migrations in the future might undo that without our noticing anyway | 21:53 |
opendevreview | Merged opendev/system-config master: Add gitea13 and gitea14 to inventory https://review.opendev.org/c/opendev/system-config/+/876892 | 22:00 |
ianw | i'm going to add myself to project bootstrappers and push the s-r change to all-projects as discussed yesterday. if it's not a complete no-op i'll revert | 22:07 |
ianw | ok, it's done | 22:08 |
ianw | change pages look ok | 22:08 |
ianw | the conditions all look about right to me | 22:09 |
fungi | lgtm, thanks for doing that! | 22:09 |
ianw | hrm, https://review.opendev.org/dashboard/self status says "n/a". not sure it did that before | 22:09 |
ianw | does anyone have an old page up (maybe don't refresh it just yet) | 22:10 |
opendevreview | Merged opendev/system-config master: Block another bogus crawler botnet UA https://review.opendev.org/c/opendev/system-config/+/876889 | 22:10 |
ianw | https://review.opendev.org/q/project:opendev%252Fsystem-config looks right. it's got the crossed grey circle | 22:11 |
ianw | and says like 2 (of 3) about the SR | 22:11 |
clarkb | I pulled up my old page but it seemed to autorefresh | 22:12 |
clarkb | it says n/a there | 22:12 |
clarkb | ianw: another check would be trying to vote on something you don't have rights to | 22:12 |
clarkb | that looks fine for me | 22:12 |
clarkb | (I pulled up a random nova change and can only +/-1 code review) | 22:13 |
ianw | https://imgur.com/a/2kLCir5 ... some have n/a and some don't. trying to see if there's a pattern ... | 22:13 |
ianw | maybe it's if all SR are unsatisfied? | 22:14 |
ianw | https://review.opendev.org/c/zuul/zuul-jobs/+/871679 is one for example. it has a +1 one from zuul | 22:15 |
clarkb | ah ya I've got some that say 2 of 3 that have code-review +2 and verified +1 but lack the workflow +1 | 22:15 |
clarkb | oh wait its 2 unsatisfied of 3 | 22:15 |
ianw | but then https://review.opendev.org/c/opendev/system-config/+/873214 does as well, and that i see without n/a | 22:15 |
clarkb | since verified also requires a +2 | 22:15 |
clarkb | ianw: could it be an indexing thing maybe? | 22:16 |
clarkb | ianw: perhaps make a comment only (no vote) update to a change and see if it toggles | 22:16 |
clarkb | I believe any action like that will cause the change's index content to be updated | 22:17 |
ianw | yeah, i don't think we can really get around that. we're basically never having the s-r satisfied for "humans" in our case | 22:17 |
clarkb | ya | 22:17 |
clarkb | I don't think its a big deal | 22:17 |
clarkb | and the n/a doesn't seem to be a problem either. Just a matter of understanding why I guess | 22:17 |
ianw | yeah, i think that's it | 22:19 |
ianw | i commented on https://review.opendev.org/c/zuul/zuul-jobs/+/871539/21 and now it shows up as 0 (of 3) | 22:19 |
clarkb | cool that will sort of naturally correct itself for active changes | 22:20 |
ianw | (btw, that little stack probably wants reviews. i'm a bit stuck on it because it seems like submodules have weird semantics, and i'm not sure what zuul should do with them) | 22:20 |
ianw | ++ i think that's right | 22:20 |
clarkb | ianw: I've been deferring to corvus on the submodule stuff as I think he groks it reasonably well (and I do not) | 22:21 |
fungi | i break out in hives when i see a submodule | 22:21 |
ianw | #status log switched Gerrit ACL's to submit-requirements. You may see a status of "n/a" in https://review.opendev.org/dashboard/self, this should resolve as changes are updated and reindexed by Gerrit | 22:22 |
opendevstatus | ianw: finished logging | 22:22 |
ianw | do you think it's worth filing a bug? i wonder if the reindex can be forced? | 22:23 |
fungi | you can still reindex the way we used to during upgrades, right? | 22:23 |
clarkb | yes you should be able to | 22:24 |
clarkb | trigger an online reindex | 22:24 |
fungi | `gerrit index start changes --force` according to my notes | 22:24 |
fungi | to refresh the changes index | 22:25 |
clarkb | thats a gerrit ssh command (different than the offline gerrit way reindex command) | 22:25 |
ianw | would this reindex the right things for this though? if that makes sense ... basically would it fix the problem | 22:25 |
clarkb | I think the idea is it reindexes everything and should. But you're right we won't know without testing. I don't know if you can limit that command to a specific change or project | 22:26 |
clarkb | but really I don't think it is a big deal | 22:26 |
clarkb | the reason I don't is that those values will never mean what gerrit upstream intended for us due to zuul hitting the +2 and submit button | 22:26 |
clarkb | they show you that so you can go and hit submit manually on any changes of yours that are ready | 22:27 |
ianw | yeah, i'm not even sure what to suggest about that | 22:28 |
ianw | even if verified was some sort of "automated submit-requirement", you still need to add the +w to get things going | 22:30 |
ianw | i guess really we want things to be marked as "ready for commit" when they just have a +2 code-review | 22:31 |
ianw | but even then it's really "ready to signal to zuul that gate testing can be done" which just doesn't seem to fit the model at all | 22:31 |
clarkb | I think +2 code review and +1 verified are the rough equivalent | 22:32 |
clarkb | because ya that tells reviewers "this is one step away from merging" | 22:32 |
ianw | well at least we can point to our live system now, although it does feel like it not affecting google means we might have to live with it or employ a full time java stack developer | 22:35 |
clarkb | we might be able to convince them to remove the column with a config option but ya actually changing behavior would likely require someone to write the change(s) | 22:36 |
hashar | I was about to ask why https://www.opendev.org/ does not exist and I found the fixed link in this channel topic: no www! :D | 22:36 |
clarkb | we could add a redirect for it but meh | 22:37 |
hashar | don't bother, Gerrit submit requirements is definitely more important :] | 22:37 |
clarkb | hashar: it's the last major piece in prep for our 3.7.x upgrade | 22:38 |
clarkb | I think that will be the first time in like a decade we will be running the latest release of Gerrit | 22:39 |
clarkb | then 3.8 will release a month later and we'll be behind but that is ok | 22:39 |
hashar | :q | 22:39 |
hashar | for the reindexing, you can pass a change number and it will only index that one (`gerrit index changes 1234`) | 22:40 |
hashar | and I think there is a command to do it on a per project basis | 22:40 |
hashar | if you can find sometime to write a report of your 3.7 upgrade on upstream list, I am sure it will benefit others in the future | 22:41 |
opendevreview | Merged opendev/git-review master: Simplify test output strings for new Gerrit https://review.opendev.org/c/opendev/git-review/+/876894 | 22:42 |
clarkb | hashar: that's good feedback. That said every upgrade since 3.2 has been straightforward. I think 3.7 prescribes an offline reindex though so may be the most complicated one since | 22:42 |
clarkb | hashar: we also test the upgrades with every change to our gerrit config management | 22:43 |
clarkb | hashar: our CI builds the current version that we deploy to prod and the next version and we have a job that deploys the current version and then upgrades it to the next one | 22:43 |
clarkb | that has been useful to sanity check gerrit isn't adding things to our config file(s) and so on | 22:43 |
hashar | ahhh | 22:45 |
hashar | I am lagging a few years behind though, I still copy paste a few commands but have some bits to automate those | 22:45 |
hashar | then, we still run on baremetal rather than container images | 22:45 |
clarkb | hashar: we manually do upgrades in prod still though we could theoretically adapt what we do in testing to prod too | 22:45 |
clarkb | but it has helped us build a lot of confidence in the newer versions and how to get to them | 22:46 |
ianw | i feel like we may be quite unique in keeping our ACL's separate and git-ops-ing them | 22:46 |
hashar | and we have Puppet on top of that for config management which adds another layer of fun :] | 22:46 |
ianw | even with the migration tool, it says "we will run this at google" in the CL ... implying that they just update the ACL's in gerrit but don't push them externally | 22:46 |
hashar | the ACL management is our main complaint. We have locked them down and left them untouched for the most important/sensitive repos (in short only admins can do anything about it) | 22:47 |
hashar | but for the rest, it is a bit of a mess :\ | 22:47 |
ianw | that's kind of the reason why we have to do more work with submit-requirements, because we have to migrate our stuff in project-config | 22:47 |
clarkb | that's a good point. Otherwise we'll be out of sync with our external view of the acls | 22:47 |
ianw | (put in https://bugs.chromium.org/p/gerrit/issues/detail?id=16748 about the n/a thing. if someone who knows says "yes, run XYZ" i'll propose a docs update) | 22:48 |
clarkb | ianw: speaking of 3.7.x https://review.opendev.org/c/opendev/system-config/+/876233 is a totally not urgent update to our gerrit image builds to sync up with latest plugin versions | 22:51 |
ianw | ++ | 22:52 |
ianw | seeing as the s-r all-projects has gone well enough, i'm more confident to work my way through https://review.opendev.org/q/topic:gerrit-s-r-3.7 now | 22:53 |
ianw | i remember https://no-www.org/ going around "at the time". in my mind "at the time" was about 2 years ago. that is dated ... 2003 | 22:59 |
ianw | i think i'm the dated thing in this story :) | 22:59 |
clarkb | if we wanted to add it we would need to update our ssl certs but then we could add a simple redirect to apache | 23:00 |
ianw | yeah, i'm not fussed. i just remembered a time when it was a thing people were talking about | 23:01 |
clarkb | when you manually provision certs from namecheap (what we did before LE) they automatically added www for you | 23:03 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Add CC similarly to reviewers https://review.opendev.org/c/opendev/git-review/+/849219 | 23:15 |
opendevreview | Julia Kreger proposed openstack/diskimage-builder master: WIP Correct boot path to cover FIPS usage cases https://review.opendev.org/c/openstack/diskimage-builder/+/876192 | 23:15 |
opendevreview | Merged openstack/project-config master: gerrit/acl : remove deprecated copy conditions https://review.opendev.org/c/openstack/project-config/+/867931 | 23:16 |
opendevreview | Merged openstack/project-config master: gerrit/acl : handle submit requirements in normalise tool https://review.opendev.org/c/openstack/project-config/+/875992 | 23:16 |
ianw | ^ hrm i guess that has to deploy behind https://review.opendev.org/c/opendev/system-config/+/876892/ which is running everything | 23:19 |
clarkb | oh sorry about that. But ya updating the hosts inventory file in particular makes all the things go | 23:20 |
clarkb | ianw: that said I think manage-projects always pulls project-config from master so the manage-projects job for 876892 may run it for you? | 23:21 |
clarkb | definitely worth checking after that run is done | 23:21 |
ianw | yeah good point, i'll watch that closely | 23:21 |
ianw | (no need for sorry; sorry i haven't got them running in parallel yet as well :) | 23:22 |
ianw | i still think that's close but i've probably got a day of context switching it back in | 23:22 |