clarkb | fungi: we both responded to the networking-ovn documentation question. Note that I think the docs originated in a retired project that should've retired its docs before retiring the project | 00:01 |
clarkb | fungi: the cloud launcher failed on the same "you need to authenticate" error I've been seeing. I don't see those errors now using the new cloud profile. Maybe we didn't use the correct cloud profile in the run launcher config? | 00:08 |
clarkb | no that seems to be correct | 00:09 |
fungi | yeah, i'm in the process of troubleshooting it | 00:09 |
fungi | opendevci-rax-flex is working with the clouds.yaml on bridge, but opendevzuul-rax-flex is not | 00:10 |
fungi | possible i didn't prime that one correctly | 00:10 |
clarkb | oh it specifically failed on opendevzuul | 00:10 |
clarkb | yup I think that must be it | 00:11 |
ianw | i think perhaps i/we never really got around to a full cleanup after https://review.opendev.org/c/opendev/system-config/+/820250 | 00:16 |
clarkb | oh that could explain it. | 00:17 |
fungi | clarkb: found it. i accidentally reused the project_id from opendevci-rax-flex for opendevzuul-rax-flex, fixed in the private hostvars just now | 00:18 |
clarkb | My immediate concern is being able to add new hosts to the inventory and have ansible work. But revisiting this effort you started is probably worth doing when there is time | 00:18 |
clarkb | fungi: cool | 00:18 |
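(Editor's note: the profiles being discussed live in clouds.yaml on bridge, populated from the private hostvars; the bug above was a copied project_id. A minimal sketch of the relevant structure, with placeholder URLs, IDs, and credentials only:)

```yaml
# Hypothetical excerpt of clouds.yaml on bridge; every value here is a placeholder.
clouds:
  opendevci-rax-flex:
    region_name: SJC3
    auth:
      auth_url: https://keystone.example.com/v3
      username: opendevci
      password: REDACTED
      project_id: aaaa1111aaaa1111aaaa1111aaaa1111   # opendevci's own project
      # (other auth settings such as user_domain_name omitted for brevity)
  opendevzuul-rax-flex:
    region_name: SJC3
    auth:
      auth_url: https://keystone.example.com/v3
      username: opendevzuul
      password: REDACTED
      project_id: bbbb2222bbbb2222bbbb2222bbbb2222   # must be the opendevzuul project, not a copy of the one above
```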
fungi | i guess the daily run will get it in a few hours | 00:24 |
fungi | i went ahead and reenqueued 942230,4 in deploy instead | 00:33 |
fungi | it's corrected clouds.yaml on bridge now | 00:46 |
fungi | cloud-launcher job is running again now | 00:47 |
fungi | success! | 00:56 |
Clark[m] | Excellent | 00:57 |
fungi | so i guess next we need to upload our noble image to both regions | 01:04 |
fungi | where did ubuntu-noble-server-cloudimg-2024-08-22 get uploaded to our old project in flex sjc3 from? | 01:13 |
Clark[m] | fungi: tonyb uploaded it from bridge using the current upstream Ubuntu cloud image | 01:38 |
Clark[m] | But the file is no longer there. Only the vhd is there and I'm not sure that we can convert it back to raw or qcow2 | 01:39 |
Clark[m] | I think we just grab the current noble image and upload it as qcow2 or raw depending on which is preferred | 01:39 |
fungi | k | 01:46 |
fungi | can do | 01:46 |
opendevreview | Ian Wienand proposed opendev/system-config master: add-inventory-known-hosts: lookup from Zuul checkout https://review.opendev.org/c/opendev/system-config/+/942333 | 03:17 |
opendevreview | Ian Wienand proposed opendev/system-config master: add-inventory-known-hosts: lookup from Zuul checkout https://review.opendev.org/c/opendev/system-config/+/942333 | 03:53 |
opendevreview | Ian Wienand proposed opendev/system-config master: add-inventory-known-hosts: lookup from Zuul checkout https://review.opendev.org/c/opendev/system-config/+/942333 | 04:25 |
slittle | oops, I pushed a bad tag. I need to delete tag 10.0.0 in git starlingx/manifest | 15:44 |
fungi | deleting a tag from the repository won't clean up anywhere copies of it have replicated to, or local copies in repositories where someone has pulled | 15:49 |
fungi | also if you replace that tag with the same version later, other copies of the repository will hang onto the old one rather than pulling in the new one | 15:50 |
slittle | I know. I hope to delete it before it gets cloned | 15:50 |
fungi | we document that tag deletion is essentially impossible, at a minimum there will likely be stuck references to it in our ci systems and elsewhere that would have to be manually cleaned up afterward to prevent errors | 15:51 |
slittle | ah | 15:52 |
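(Editor's note: the stuck-copy problem fungi describes follows from git's default behavior of never replacing a tag that a clone already has. A sketch of the operations involved, using the real tag name and a generic remote:)

```shell
# Server side: deleting the tag only removes it from that one remote
git push origin --delete 10.0.0

# Every clone, mirror, or CI cache that already fetched 10.0.0 keeps the old
# object; a plain fetch will not overwrite an existing tag, so each copy
# has to be fixed up by hand:
git tag -d 10.0.0
git fetch origin --tags --force
```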
fungi | infra-root: ^ any opinion on whether we can try deleting that, or is the impact risky? | 15:52 |
clarkb | I think zuul does try to update things, but I don't know how successful it would be | 15:53 |
clarkb | I think the risk is largely to starlingx/manifest | 15:53 |
fungi | usually we'd recommend leaving it, and pushing a 10.0.1 with a release note indicating that 10.0.0 was pushed in haste with the wrong content rather than trying to alter history | 15:53 |
fungi | but i'm up for trying it if we think it's not going to be too disruptive for our systems, as long as the starlingx community is okay handling the possible impact there | 15:54 |
slittle | I think we are ok on our end | 15:55 |
fungi | okay to sort out any impact from wrong copies of a 10.0.0 tag in that repo? in that case i'll go ahead and escalate my perms while i give other infra-root sysadmins a few minutes to chime in | 15:56 |
slittle | yep | 15:57 |
clarkb | ya I think because gitea replication is a force push we may update those properly. But I'm not positive of that. My main concern would be that a new 10.0.0 is tagged but when pushed zuul operates on the old hash because it has already fetched it | 15:58 |
fungi | https://review.opendev.org/admin/repos/starlingx/manifest,tags shows a 10.0.0 tag you pushed for revision 906d3d25369e8b0a87d2c1451fc5941709bcd59e at 15:35:16 utc | 15:58 |
clarkb | I don't expect any of our tools (gerrit, zuul, gitea, etc) to break but their outputs may become incorrect for that repo | 15:58 |
slittle | should point to 73e57878ddc6b05d50a11c709310a0d52a5100cd | 15:58 |
fungi | last call for objections to me deleting that tag | 15:59 |
fungi | if/when you push a new one, you'll want to pay extra close attention to any jobs that run on or infer things from tags, to make sure they used the "right" 10.0.0 | 16:00 |
slittle | yes | 16:00 |
fungi | #status log Deleted incorrect starlingx/manifest tag 10.0.0 for revision 906d3d25369e8b0a87d2c1451fc5941709bcd59e at slittle's request in #opendev | 16:01 |
opendevstatus | fungi: finished logging | 16:01 |
clarkb | keep in mind it is possible for one job to use the right hash and another to use the wrong one if the tag was only fetched on a subset of executors/mergers previously | 16:02 |
fungi | slittle: if you spot anywhere that the old tag ends up used, let us know, i can try to help with any needed cleanup in our systems | 16:02 |
clarkb | you need to check every job that runs for the new tag | 16:02 |
clarkb | you can't just check one location | 16:02 |
slittle | thanks guys | 16:02 |
clarkb | this assumes zuul can't sort it out properly, which I'm pretty sure it attempts to do | 16:02 |
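(Editor's note: a quick way to spot-check which commit each frontend currently has for the tag, without cloning, is git ls-remote; the expected hash is the corrected one slittle gave above, and the Gerrit anonymous-https path is an assumption:)

```shell
# Gitea frontend
git ls-remote https://opendev.org/starlingx/manifest refs/tags/10.0.0
# Gerrit (anonymous git-over-https; path assumed)
git ls-remote https://review.opendev.org/starlingx/manifest refs/tags/10.0.0
# Both should end up reporting 73e57878ddc6b05d50a11c709310a0d52a5100cd
# (for an annotated tag, check the peeled "refs/tags/10.0.0^{}" line).
```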
fungi | fwiw, it doesn't look like the deletion replicated to the gitea servers, so e.g. https://opendev.org/starlingx/manifest/src/tag/10.0.0/ is still present | 16:04 |
fungi | i guess we'll see if the new tag overwrites it in gitea later | 16:04 |
slittle | what does it point to? I pushed the corrected tag | 16:05 |
clarkb | infra-root: when you get a chance can you take a look at https://review.opendev.org/c/opendev/system-config/+/942307, ianw's comments on that change, and his suggested alternative in 942333. I'm fine with the alternative myself but need to properly review it still. corvus: you may be interested as it applies executor state to things | 16:05 |
clarkb | slittle: click the link and look | 16:05 |
fungi | oh, maybe i checked after the new tag had already replicated | 16:05 |
clarkb | https://opendev.org/starlingx/manifest/commit/73e57878ddc6b05d50a11c709310a0d52a5100cd it points there now | 16:05 |
clarkb | which seems to match the correct hash above | 16:06 |
fungi | yeah, so maybe i didn't realize there was a new tag already when i pulled it up | 16:06 |
clarkb | so ya the force push behavior of replication to gitea addresses the problem, which matches my expectation. That leaves zuul as the big question | 16:06 |
slittle | yes, it's good | 16:06 |
corvus | clarkb: yeah i was just going through all of that. | 16:07 |
clarkb | slittle: fungi: taking a quick look at zuul that repo doesn't appear to run any tag jobs? So maybe it is a noop? | 16:11 |
clarkb | I guess there may be downstream jobs that rely on that tag though | 16:11 |
clarkb | downstream of the tag pipeline but still within zuul I mean | 16:11 |
fungi | frickler: it looks like you might have left your frickler.admin account as a member of the Project Bootstrappers group in gerrit, okay for me to remove it or are you in the middle of doing something with it? | 16:12 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Reduce tracing.o.o's CNAME TTL https://review.opendev.org/c/opendev/zone-opendev.org/+/942376 | 16:15 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Switch tracing.o.o CNAME to tracing02 https://review.opendev.org/c/opendev/zone-opendev.org/+/942377 | 16:15 |
clarkb | 942376 there is straightforward so I'm going to approve it now | 16:15 |
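(Editor's note: 942376 and 942377 are the usual two-step CNAME cutover: drop the TTL first so the later repoint propagates quickly. A sketch of what the zone records look like across the two changes, with illustrative TTL values:)

```
; before: long TTL pointing at the old server
tracing   3600  IN  CNAME  tracing01.opendev.org.
; after 942376: shorter TTL, same target
tracing    300  IN  CNAME  tracing01.opendev.org.
; after 942377: repointed at the replacement server
tracing    300  IN  CNAME  tracing02.opendev.org.
```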
corvus | clarkb: i'm having a hard time deciding between your approach and ianw's. i sort of feel like having "make the source repo on bridge up to date" be one of the first things that happens in bootstrapping the whole process sounds like a good idea, so why not that? is there a downside to that? does ianw's approach somehow facilitate parallel runs better? | 16:15 |
fungi | i think mainly we have to make sure that if several system-config changes merge together before a deploy job runs, that we're okay with the deploy jobs for an earlier change using a newer state of system-config. but also we need to be careful to make that safe anyway because periodic deploy jobs can also happen between merge and deploy | 16:17 |
clarkb | corvus: I think ianw really wanted to keep that bootstrap job to a minimal set of tasks. Updating known_hosts to a good state definitely falls into that, but I guess whether or not repo updates do is more ambiguous. I think part of the reason for the swap is that things should depend on infra-prod-base, which can't be minimal because it runs ansible across all the nodes. This is a | 16:18 |
clarkb | dependency because it bootstraps base things for our servers which should be in place before running services. This means you have two dependencies (bootstrap-bridge and infra-prod-base, with base also depending on bootstrap-bridge). Things get complicated later when setting up dependencies because I don't think they are transitive when using soft dependencies? | 16:18 |
clarkb | fungi: both proposals should use the state from the executor which should be specific for the change | 16:18 |
clarkb | corvus: anyway I suspect that part of this boils down to managing the job dependencies further down the line and keeping that simple while still accomplishing the goal of minimal bridge setup upfront | 16:19 |
fungi | mmm, is using state from the executor safe? periodics will potentially use newer state between merge of a change and deploy jobs for that change, so you could have a regression of state in that case | 16:19 |
clarkb | the more I think about it the more I think it probably is six of one, half a dozen of the other? | 16:19 |
clarkb | fungi: I think that concern is valid but I don't think that is a new issue caused by these changes | 16:20 |
clarkb | however zuul should respect enqueue order with the locks I think so we'd still have the right order? | 16:20 |
clarkb | corvus: fungi: maybe we try to sync up with ianw later today to make sure there isn't something we're missing? | 16:22 |
fungi | i guess we enqueue to deploy immediately on merge, not after a separate promote | 16:22 |
corvus | yes deploy is change-merged, so it's change-specific | 16:22 |
fungi | yeah, change-merged https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml#L229 | 16:23 |
frickler | fungi: oops I must have missed that when I last force-merge something, nothing in progress, so feel free to clean up or I can do it, too | 16:23 |
fungi | frickler: done, thanks! | 16:23 |
clarkb | eventually we will need to update the git repos and the end state we're aiming at is to do so once per buildset. Whether we do that update in infra-prod-bootstrap-bridge or infra-prod-base (or a third new job) probably doesn't matter too much. The biggest difference we'll notice is in the job dependencies I think | 16:25 |
fungi | right, so as long as periodics can't run with newer state than things enqueued in deploy (because they'll get the semaphore lock in event order), that does seem safe | 16:25 |
clarkb | so I'd prefer something that makes job dependencies simplest but I think we can make it work either way | 16:25 |
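(Editor's note: for readers following the dependency discussion above, a sketch of how hard vs. soft job dependencies are expressed in a zuul project-pipeline stanza; the service job name is hypothetical and this is not the actual system-config layout:)

```yaml
# Illustrative only, not the real opendev deploy pipeline config.
- project:
    deploy:
      jobs:
        - infra-prod-bootstrap-bridge
        - infra-prod-base:
            dependencies:
              - infra-prod-bootstrap-bridge
        - infra-prod-service-example:        # hypothetical service job
            dependencies:
              - name: infra-prod-base
                soft: true   # still runs if base isn't in the buildset; waits for it when it is
```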
frickler | slittle: while you're around, may I remind you of https://zuul.opendev.org/t/openstack/config-errors?project=starlingx%2Fzuul-jobs&skip=0 once again? | 16:26 |
opendevreview | Merged opendev/zone-opendev.org master: Reduce tracing.o.o's CNAME TTL https://review.opendev.org/c/opendev/zone-opendev.org/+/942376 | 16:27 |
fungi | looking at when i uploaded a noble cloud image to our old control-plane tenant in flex sjc3, i ended up specifying --property hw_disk_bus='scsi' --property hw_scsi_model='virtio-scsi' | 16:35 |
clarkb | that was to fix the disk issue we had right? I thought that rax thought they had fixed it generally? | 16:36 |
opendevreview | Brian Haley proposed openstack/project-config master: Charms: add review priority to charms repos https://review.opendev.org/c/openstack/project-config/+/942381 | 16:42 |
clarkb | I don't think using scsi will hurt and we can always test dropping it with a different image if we just want to do the thing that is expected to work the first time | 16:42 |
fungi | yeah, i'll see what happens if i leave those out this time | 16:51 |
fungi | and right, it was because the swap and ephemeral block devices ended up enumerated incorrectly in the guests | 16:52 |
fungi | forcing virtio worked around it | 16:52 |
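(Editor's note: the earlier upload fungi refers to would have looked roughly like the command below; the cloud name, image file, and disk format are taken from the discussion above, but the exact invocation is an assumption:)

```shell
openstack --os-cloud opendevci-rax-flex image create \
  --disk-format qcow2 --container-format bare \
  --property hw_disk_bus=scsi --property hw_scsi_model=virtio-scsi \
  --file noble-server-cloudimg-amd64.img \
  ubuntu-noble-server-cloudimg
```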
Clark[m] | https://matrix.org/blog/2025/02/crossroads/ the oftc matrix bridge (and others) will shut down if the matrix foundation doesn't find 100k in new funding by the end of March. This message brought to you by the oftc bridge | 16:52 |
fungi | yikes | 16:53 |
fungi | okay, latest daily build for noble-server-cloudimg-amd64.img uploaded to our new tenant in flex dfw3 and sjc3, i'll see if i can boot servers from them next | 16:57 |
fungi | huh, flavors in dfw3 don't match those in sjc3 | 17:20 |
fungi | the mirror in our old tenant in sjc3 used gp.0.4.8 (8gb ram, 80gb rootfs, 64gb ephemeral, 4vcpu), but in dfw3 that seems to be named gp.5.4.8 instead | 17:23 |
corvus | wonder if that's a generation number in the first of the triplet | 17:23 |
corvus | or maybe region number or something | 17:23 |
corvus | btw i think you're going to like how we handle that with niz | 17:24 |
fungi | like in a sarcastic way, or in a that's very elegant sort of way? ;) | 17:29 |
fungi | mmm, i'm starting to think that maybe rackspace nerfed the --network=PUBLICNET option since nova keeps complaining "No valid service subnet for the given device owner" | 17:32 |
fungi | so maybe they're going to force new instances to use floating-ip anyway | 17:33 |
clarkb | is the name still PUBLICNET? | 17:33 |
clarkb | I guess no valid service subnet implies it found the network but doesn't have any ips it can give out to us | 17:34 |
clarkb | I think this can happen if the entire subnet range is assigned to floating ip pools | 17:34 |
clarkb | but its been a long time since I dabbled with neutron networking | 17:34 |
corvus | elegant | 17:36 |
fungi | clarkb: yeah, the full error message mentions the uuid for the PUBLICNET network | 17:37 |
fungi | and i agree, it's likely they just ran out of ipv4 addresses for it in both regions and need to route some more | 17:43 |
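(Editor's note: the two boot paths being compared are roughly the following; the server, keypair, and internal network names are placeholders, while the flavor and PUBLICNET network names are the ones mentioned above:)

```shell
# Direct attachment to the provider network (currently rejected with
# "No valid service subnet for the given device owner"):
openstack server create --image ubuntu-noble-server-cloudimg --flavor gp.5.4.8 \
  --network PUBLICNET --key-name bridge-key test-server

# Floating-IP alternative: boot on an internal network, then attach a FIP from PUBLICNET
openstack server create --image ubuntu-noble-server-cloudimg --flavor gp.5.4.8 \
  --network internal-net --key-name bridge-key test-server
openstack floating ip create PUBLICNET
openstack server add floating ip test-server <ALLOCATED_FLOATING_IP>
```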
corvus | i'm looking at the larger nodes i'm trying to dogfood for zuul with niz, and i was hoping to name the flavors by ram size (like niz-ubuntu-noble-8GB) but there can be significant performance differences between flavors with the same ram size across clouds. | 17:50 |
corvus | so i wonder if we should try to do something more abstract and have like "normal" "small" "large" flavors where we try to go for some sort of approximation of ram+compute equivalency? | 17:51 |
corvus | so maybe a "large" node has at least 16g of ram, but possibly more if we need more compute to get it up to par? | 17:52 |
clarkb | the risk with that is if you actually need more ram than that and only pass on half the clouds. I think this is less of a concern for zuul but it would definitely become an issue for openstack almost immediately | 17:53 |
corvus | (maybe in the future we should just get rid of flavors and go with resource specs for jobs or something, but it's too late to fold that into the niz work; that would have to be a future thing) | 17:53 |
clarkb | unfortunately we can weather variance in time (so fewer cpus == slower, or different cpus on different clouds are faster, etc) but it's tough to overcome the oomkiller without major concessions | 17:54 |
corvus | true | 17:54 |
clarkb | granted openstack has been happy to set swap to 8GB and be extra slow rather than determine where the memory leaks/cost are and attempt to better optimize them | 17:54 |
corvus | we could do both with a bit more verbose config... we could have "niz-ubuntu-noble-8GB" and "niz-ubuntu-noble-normal". not sure i love that idea. | 17:55 |
clarkb | it's an interesting problem and one that I'm guessing most zuul users don't really have as they likely put all their resources in one cloud bucket | 17:58 |
clarkb | the extra variance is a side effect of our multicloud setup. I know at one point there was an attempt by some to try and get openstack clouds to standardize on flavors (then optionally have extra flavors) which may have made things simpler for us | 17:58 |
corvus | yeah, or maybe 2 baskets with fairly comparable flavors. | 17:58 |
corvus | i'm going to gather a bit more data, then see about proposing some changes later | 17:59 |
corvus | fungi: i saw a bunch of flex auth stuff earlier, and i see 401 errors in zuul launcher; do we need to do something for it? | 18:02 |
corvus | 2025-02-20 17:40:07,110 ERROR zuul.Launcher: keystoneauth1.exceptions.http.Unauthorized: The request you have made requires authentication. (HTTP 401) | 18:03 |
corvus | that's for flex sjc3 | 18:03 |
*** mtomaska__ is now known as mtomaska | 18:34 | |
clarkb | corvus: there are two things going on. The first was a bugfix (to use project id or name or something I think) and the other is we have two projects in sjc3 and one in dfw3. We basically bootstrapped with an old project in sjc3 that we need to get off of to be in line with dfw3 on the new project, so there is a new set of profiles on bridge for sjc3 and dfw3 that map to one another. | 19:15 |
fungi | corvus: first i'v heard, maybe they're having an outage now though | 19:15 |
clarkb | fungi: it could be related to the thing you fixed with the new project ids I think | 19:15 |
clarkb | if zuul launcher isn't using the same secrets as nodepool? | 19:15 |
clarkb | since you updated those in nodepool right? | 19:15 |
fungi | lemme check but i shouldn't have touched those | 19:16 |
clarkb | fungi: before we started working on dfw3 there was the auth issue that you fixed with cloudnull's help | 19:16 |
clarkb | thats the issue I'm suspecting is still hitting zuul launcher | 19:16 |
fungi | oh, where do we set that? what's the var name for the project name? | 19:16 |
fungi | or for the project_id more likely | 19:16 |
clarkb | sorry, there are two different things involving projects that have happened recently, which makes it confusing to talk about | 19:16 |
fungi | i'll check git on bridge to see if there were references i missed when updating | 19:17 |
clarkb | 22:45:56 fungi | looks like some local project ids changed | 19:18 |
fungi | i don't see a reference to the old project_id in our private hostvars | 19:18 |
fungi | so everything that was there has been updated | 19:18 |
clarkb | maybe the launcher needs to restart to pick them up? | 19:18 |
clarkb | you restarted the nodepool launcher iirc | 19:18 |
clarkb | that may be all that is needed | 19:18 |
fungi | yeah, that's likely it | 19:18 |
fungi | i had to down/up the nodepool-launcher container to notice the deployed change to clouds.yaml | 19:19 |
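(Editor's note: the restart being described is the usual compose bounce on the launcher host; the directory is a guess at the deployment layout and the service name may differ:)

```shell
# On the host running the launcher container
cd /etc/nodepool-launcher    # assumed location of the docker-compose.yaml
docker-compose down
docker-compose up -d
docker-compose logs --tail=50   # confirm it started and is using the updated clouds.yaml
```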
clarkb | corvus: any objection to me landing https://review.opendev.org/c/opendev/zone-opendev.org/+/942377 to point tracing.opendev.org at tracing02.opendev.org and see if zuul picks that up? I half wonder if we need to restart schedulers to see that too (but doing the update today/tomorrow should make that automatic-ish over the weekend) | 19:20 |
fungi | corvus: any objection to me downing and upping the zuul-launcher container on zl01 to pick up the clouds.yaml update from yesterday? | 19:26 |
fungi | er, from day before yesterday i mean | 19:27 |
clarkb | I wonder if we need to restart the nodepool builders too | 19:28 |
fungi | potentially | 19:35 |
corvus | fungi: clarkb re picking up clouds.yaml, that makes sense; i'll go ahead and restart the zuul-launcher since i have a window open | 20:03 |
corvus | clarkb: no objection re tracing and sounds like a good experiment to see if updates are needed | 20:03 |
corvus | it might not need restarts since i think everything is just http queries, so will probably just use the libc resolver behavior | 20:04 |
corvus | +2 but did not +w | 20:06 |
clarkb | I have approved it | 20:22 |
opendevreview | Merged opendev/zone-opendev.org master: Switch tracing.o.o CNAME to tracing02 https://review.opendev.org/c/opendev/zone-opendev.org/+/942377 | 20:24 |
clarkb | dns has updated for me. I don't see any data in https://tracing.opendev.org/search yet | 20:36 |
clarkb | corvus: the scheduler on zuul02 stopped with this message 2025-02-17 21:05:15,247 DEBUG zuul.Scheduler: Stopping tracing | 20:38 |
clarkb | that was a few days ago so unrelated to this dns update | 20:38 |
clarkb | I probably shouldn't be surprised, but the tracing server's logs are very difficult to read as a human... There are errors in the jaeger log; they seem to precede updating DNS but maybe get more frequent after. The errors appear to be with setting up tls connections. The ansible should've deployed certs that allow for the connectivity but maybe that is the next thing to check | 20:48 |
clarkb | and I guess checking if tracing01 is still getting new traces | 20:49 |
clarkb | the certs and keys files are in place so that bit at least happened | 20:49 |
clarkb | since it has been a problem with other noble nodes I did check /var/log/kern.log and there are no recent audit messages | 20:50 |
clarkb | tracing01 appears to have similar errors so that may just be noise | 20:51 |
clarkb | I am still seeing traces go to tracing01 | 20:53 |
clarkb | per https://tracing01.opendev.org/search?end=1740084807154000&limit=20&lookback=15m&maxDuration&minDuration&service=zuul&start=1740083907154000 so maybe the issue is just that we're not talking to the new server yet? Restarting the scheduler on zuul02 should confirm that is the case but I don't want to restart the scheduler there without knowing why it stopped | 20:54 |
clarkb | and I've confirmed there are established tcp connections from zuul01 to tracing01, so ya, probably just not connected yet | 20:55 |
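(Editor's note: the connection check is a one-liner with ss; the dig lookup assumes the hostname resolves to a single A record:)

```shell
# On the zuul host: look for established TCP connections to the new tracing server
ss -tn state established dst "$(dig +short tracing02.opendev.org | head -1)"
```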
clarkb | nothing seems to be actually broken. Today is not a rainy day so I may try to pop out in a bit for a bike ride. I think the above situation is fine if we just want to leave it until I get back to debug further. I suspect that will start with restarting the scheduler on 02 to see if it connects to the new server. Then we can probably let weekly restarts connect everything else over | 20:56 |
clarkb | the weekend | 20:56 |
clarkb | the other thing to dig into this afternoon is whether ianw had a specific reason for preferring the executor load of inventory/base/hosts.yaml over synchronizing the git repo | 20:58 |
clarkb | ianw: I know corvus in particular was curious if there was a specific reason for that. Would be good if you can fill us in on the motivation there | 20:58 |
clarkb | is the zuul02 scheduler shutdown potentially related to the api issues we had with vexxhost? | 21:01 |
clarkb | maybe that wedged the restart in a way that didn't cause it to come back up as expected? | 21:01 |
corvus | clarkb: looking | 21:04 |
corvus | clarkb: we should look for an external cause for the shutdown; "stopping tracing" is just one of the last things it does; the shutdown started at 2025-02-17 21:05:11,271 DEBUG zuul.Scheduler: Stopping scheduler | 21:06 |
corvus | that was a few days ago. perhaps i botched the manual restart? or maybe my manual restart interacted poorly with the playbook | 21:08 |
clarkb | 2025-02-17 20:06:19,646 INFO zuul.WebServer: Zuul Web Server stopped | 21:09 |
clarkb | the web server stopped about an hour prior to the scheduler stopping | 21:09 |
clarkb | but web and fingergw restarted | 21:09 |
corvus | i think it's highly likely that i did not re-up the scheduler, sorry | 21:10 |
clarkb | is it possible the scheduler container hadn't gracefully exited, so a followup `up -d` command didn't start it, then an hour later it exited and got "orphaned"? | 21:10 |
clarkb | that too would explain it | 21:11 |
clarkb | should we start it now? Any concern with running a newer version since the 17th? | 21:11 |
corvus | nah, i think we can up it now. i will do so. | 21:11 |
clarkb | ko | 21:11 |
clarkb | er ok | 21:11 |
corvus | we're one launcher-only commit ahead of the rest of the cluster | 21:12 |
clarkb | no tracing data from zuul02 now that it has started but it doesn't seem to be very active yet based on the debug log | 21:13 |
clarkb | probably have to wait for updating system config to complete? | 21:13 |
clarkb | I see an established tcp connection from zuul02 to tracing02 now | 21:16 |
clarkb | https://tracing.opendev.org/search?end=1740086194401000&limit=20&lookback=1h&maxDuration&minDuration&service=zuul&start=1740082594401000 and we have data | 21:16 |
clarkb | I think it's fine for the actual cutover to occur as part of the weekly restart, after which we can clean up tracing01 once we've confirmed it's no longer receiving new data | 21:17 |
clarkb | there are warnings about invalid parent spans, I think because the data is going to two places for now. Again probably ok for a couple days | 21:17 |
clarkb | and the zuul components list looks better | 21:17 |
clarkb | corvus: thank you for double checking this wasn't a bigger fire with zuul02 | 21:18 |
ianw | hey, sorry, running late today | 23:27 |
ianw | is it just me or is review not being very responsive? | 23:33 |
tonyb | ianw: Not just you. It's slow here too | 23:34 |
tonyb | and not just the web ui, ssh is slow too | 23:35 |
ianw | sigh, which ai bot is it this time | 23:36 |
tonyb | Looking .... | 23:36 |
tonyb | The load average isn't terrible 1.93 | 23:37 |
tonyb | Looks like meta-externalagent/1.1 and possibly GoogleBot/2.1 | 23:41 |
Clark[m] | `gerrit show-queue -w` might offer insight too | 23:41 |
Clark[m] | Historically it hasn't been the many request AI bots but researchers crawling through changes | 23:42 |
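(Editor's note: the command being timed below is Gerrit's standard ssh admin query; substitute your own Gerrit username:)

```shell
time ssh -p 29418 USERNAME@review.opendev.org gerrit show-queue --wide
```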
tonyb | Looking at show-queue but it's sooooo sloooooow | 23:44 |
clarkb | hrm it responded quickly for me | 23:51 |
clarkb | is it possible this is a series of tubes problem and not a server side issue? | 23:51 |
tonyb | I guess so | 23:51 |
clarkb | the web ui loads pretty normal for me too | 23:51 |
tonyb | the load is 3.14 | 23:51 |
tonyb | but I don't think that's silly high | 23:52 |
clarkb | no thats reasonable for the server | 23:52 |
tonyb | also it's much better now even for me | 23:52 |
clarkb | ianw: any change for you? | 23:52 |
ianw | my ssh session is still slllloooowww | 23:54 |
tonyb | clarkb: time ssh ... gerrit show-queue -w is approx 45 seconds for me | 23:54 |
clarkb | real 0m1.474s for me | 23:55 |
tonyb | Okay so possibly the wet string across the pacific | 23:55 |
clarkb | thats my hunch right now given the disparity in our experiences | 23:56 |
ianw | --- review.opendev.org ping statistics --- | 23:56 |
ianw | 30 packets transmitted, 24 received, 20% packet loss, time 33162ms | 23:56 |
ianw | #justaustralianthings | 23:56 |
clarkb | ianw: tonyb out of curiosity I wonder if forcing ipv4 or ipv6 (depending on which you'd use by default) would help | 23:56 |
clarkb | sometimes the routing is different enough that you get a different experience | 23:57 |
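(Editor's note: forcing one address family is just the -4/-6 switch on the usual tools, e.g.:)

```shell
ping -4 -c 30 review.opendev.org
ping -6 -c 30 review.opendev.org
ssh -4 -p 29418 USERNAME@review.opendev.org gerrit version   # compare against the default path
```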
ianw | --- review.opendev.org ping statistics --- | 23:57 |
ianw | 13 packets transmitted, 0 received, 100% packet loss, time 12306ms | 23:57 |
ianw | that's ping -4 ... | 23:57 |
tonyb | --- review02.opendev.org ping statistics --- | 23:57 |
tonyb | 30 packets transmitted, 14 received, 53.3333% packet loss, time 29425ms | 23:57 |
tonyb | as is that ^^ | 23:57 |
clarkb | ianw: oof | 23:57 |
ianw | oh eventually it came alive, but similar packet loss | 23:58 |
tonyb | --- review.opendev.org ping statistics --- | 23:58 |
tonyb | 30 packets transmitted, 30 received, 0% packet loss, time 29039ms | 23:58 |
tonyb | rtt min/avg/max/mdev = 22.775/23.076/25.875/0.534 ms | 23:58 |
tonyb | that from a US host I could quickly access | 23:59 |
clarkb | https://mirror.ca-ymq-1.vexxhost.opendev.org is in the same cloud region; I wonder if you get a similar experience there? Could compare with the other mirrors in north america to see if it is a north america problem or something more specific | 23:59 |