ianw | infra-root: pypa/pip have enabled the opendev app with https://github.com/pypa/pip/issues/9103 | 00:01 |
---|---|---|
ianw | infra-root: i've proposed https://review.opendev.org/#/c/761467/2 and https://review.opendev.org/#/c/761468/2 to setup the same project-config/tenant setup as for pyca | 00:01 |
ianw | I'll leave it for review on those to essentially agree that we're happy to have our resources put towards this; for mine I think similar to pyca it's going to help everyone | 00:01 |
clarkb | what sort of job do we expect them to be running? openstack constraints integration testing type deals? | 00:01 |
ianw | my understanding would be tox-type testing, particularly on debian/centos and perhaps fedora | 00:06 |
*** tosky has quit IRC | 00:14 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] borg: add fuse https://review.opendev.org/761275 | 00:15 |
clarkb | ianw: couple of comments on ^ | 00:19 |
ianw | clarkb: not sure what you mean if it depends on borgbackup? i think adding it to extras includes it? i'll have a poke at the unmount, you're probably right but it does seem to have a command | 00:21 |
clarkb | I mean does pip install borgbackup[fuse] imply installing borgbackup? I don't actually know | 00:21 |
clarkb | and for the unmount I think you're just doing `borg /opt/backups` rather than something like borg unmount /opt/backups | 00:22 |
fungi | usually [extras] will install everything which gets installed without that, plus whatever is in the extra-requires | 00:22 |
ianw | is grafana.opendev.org not responding for others? | 00:28 |
clarkb | I can get it via ssh but not https | 00:29 |
clarkb | looks like an iptables problem | 00:30 |
clarkb | it doesn't have port 80 and 443 open in iptables | 00:30 |
fungi | how would that have happened? | 00:31 |
clarkb | did the groups change for it? we use webserver group for 80 and 443 in many cases | 00:31 |
ianw | hrm, weird | 00:31 |
fungi | last ssh login (before now) was nearly a month ago, so i doubt we did anything manually directly on that server | 00:33 |
ianw | grafana[0-9].opendev.org is in the webservers group | 00:33 |
ianw | that wants a * | 00:35 |
ianw | hrm | 00:35 |
fungi | d'oh! | 00:36 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add * match to grafana.opendev.org https://review.opendev.org/761476 | 00:36 |
fungi | that changed with the cleanup of the old grafana server i guess | 00:36 |
ianw | i need to shut that down | 00:38 |
ianw | i'll get the opendev working then do that today | 00:38 |
ianw | i think i didn't notice because my url bar has auto-filled in the old openstack.org server | 00:38 |
fungi | ahh, okay, i didn't realize that was still in progress | 00:39 |
*** DSpider has quit IRC | 00:40 | |
ianw | neither did I :) | 00:42 |
ianw | clarkb: This is a convenience wrapper that just calls the platform-specific shell command - usually this is either umount or fusermount -u. | 00:48 |
ianw | so yeah, can just unmount | 00:49 |
ianw | testinfra works though; it runs a test backup to the test backup server, then can mount it via fuse. pretty cool! | 00:49 |
clarkb | nice | 00:49 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: add fuse https://review.opendev.org/761275 | 00:57 |
*** whoami-rajat___ has joined #opendev | 01:06 | |
openstackgerrit | Merged opendev/elastic-recheck master: Add query for bug 1901739 https://review.opendev.org/759967 | 01:07 |
openstack | bug 1901739 in OpenStack Compute (nova) " libvirt.libvirtError: internal error: missing block job data for disk 'vda'" [High,Confirmed] https://launchpad.net/bugs/1901739 | 01:07 |
openstackgerrit | melanie witt proposed opendev/elastic-recheck master: Add query for bug 1902002 https://review.opendev.org/761478 | 01:16 |
openstack | bug 1902002 in devstack "Fail to get default route device in CI jobs" [Medium,In progress] https://launchpad.net/bugs/1902002 - Assigned to Dr. Jens Harbott (j-harbott) | 01:16 |
openstackgerrit | Merged opendev/system-config master: Add * match to grafana.opendev.org https://review.opendev.org/761476 | 01:16 |
openstackgerrit | melanie witt proposed opendev/elastic-recheck master: Add query for bug 1902002 https://review.opendev.org/761478 | 01:18 |
openstack | bug 1902002 in devstack "Fail to get default route device in CI jobs" [Medium,In progress] https://launchpad.net/bugs/1902002 - Assigned to Dr. Jens Harbott (j-harbott) | 01:18 |
ianw | grafana opendev back, i'll clean up the old now | 02:18 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: grafana: redirect http to CNAME https://review.opendev.org/761487 | 02:28 |
ianw | i think the new graphite server is good too. i'll cleanup the old one | 02:38 |
ianw | #status log remove old graphite01.opendev.org server and storage | 02:41 |
openstackstatus | ianw: finished logging | 02:42 |
ianw | #status log removed grafana02.openstack.org, CNAME now goes to grafana.opendev.org | 02:42 |
openstackstatus | ianw: finished logging | 02:42 |
openstackgerrit | Merged opendev/system-config master: borg-backup: add fuse https://review.opendev.org/761275 | 02:45 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: grafana: fix typo in test name https://review.opendev.org/761489 | 02:57 |
*** hamalq has quit IRC | 03:21 | |
*** ykarel has joined #opendev | 03:49 | |
melwitt | does anyone know if this kind of 503 is a common/known thing? ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='opendev.org', port=443): Max retries exceeded with url: /openstack/requirements/raw/branch/master/upper-constraints.txt (Caused by ResponseError('too many 503 error responses',)) | 03:50 |
mnaser | melwitt: interesting you bring that up, i am getting a few failures in our downstream with `error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.` | 04:59 |
melwitt | O.o | 05:01 |
melwitt | the gate is killing me rn I swear :( | 05:01 |
mnaser | :( | 05:03 |
ianw | melwitt: that ... shouldn't happen. one of our backends might be unhappy | 05:04 |
ianw | melwitt: which job? | 05:04 |
melwitt | that was nova-live-migration https://zuul.opendev.org/t/openstack/build/5a4e126cd734457fa1024575ec193440/log/logs/devstacklog.txt#8474 | 05:04 |
ianw | mnaser: i agree with your point re: zuul as a 3rd party CI. but ... I think we need to reach out and try to bring people along for the journey | 05:05 |
melwitt | I'll be back in a couple of hours to recheck for the 9th time o/ | 05:05 |
ianw | let me see if i can find where the lb sent that | 05:06 |
mnaser | ianw: i only voice this because i kinda tried the experiment with cherrypy and it really led to nothing but just a normal reporting job amongst many others | 05:06 |
mnaser | if there's no incentive to move towards gating, eh. | 05:06 |
ianw | i do see your point and somewhat agree, and pyca has been similar. we could do their wheel releases for them, which they currently get manually involved with, but there's been resistance | 05:08 |
ianw | however, i feel like having skin in the game, when things come up; when you can point out that zuul would have stopped that breaking change, etc. gives a chance for adoption | 05:09 |
ianw | i think that was 23.253.203.147 | 05:09 |
ianw | i think that went to gitea05 balance_git_https/gitea05.opendev.org | 05:11 |
ianw | sorry, it actually went to balance_git_https/gitea06.opendev.org | 05:18 |
ianw | that host does actually look unhappy | 05:20 |
ianw | 2020/11/05 03:21:43 cmd/web.go:107:runWeb() [I] Starting Gitea on PID: 1 | 05:22 |
ianw | 2020-11-05 03:21:32.393 | ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='opendev.org', port=443): Max retries exceeded with url: /openstack/requirements/raw/branch/master/upper-constraints.txt (Caused by ResponseError('too many 503 error responses',)) | 05:24 |
ianw | it seems this managed to happen right as the container was restarting | 05:24 |
*** fressi has joined #opendev | 05:29 | |
*** whoami-rajat___ is now known as whoami-rajat__ | 05:32 | |
ianw | clarkb: just mounted fuse backups on ethercalc02, all seems to work. i think everything is ready to roll out to more servers now | 05:39 |
mnaser | ianw: just had a downstream job fail on 'error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.' | 06:27 |
*** sboyron has joined #opendev | 06:27 | |
*** ysandeep|away is now known as ysandeep|ruck | 06:36 | |
*** Tengu has quit IRC | 06:58 | |
*** Tengu has joined #opendev | 07:05 | |
*** mschoenlaub has joined #opendev | 07:05 | |
*** mschoenlaub has quit IRC | 07:06 | |
*** marios has joined #opendev | 07:09 | |
*** lpetrut has joined #opendev | 07:11 | |
*** melwitt has quit IRC | 07:20 | |
*** melwitt has joined #opendev | 07:21 | |
*** ykarel_ has joined #opendev | 07:21 | |
*** eolivare has joined #opendev | 07:22 | |
*** ykarel has quit IRC | 07:24 | |
frickler | mnaser: RPC sounds like some internal call, not a download. do you have the logs accessible somewhere? | 07:31 |
mnaser | frickler: that was during a git clone -- i don't have the log in an easily locatable way but ill try to keep more for the next time | 07:32 |
frickler | iiuc we did unblock that crawling job, maybe it isn't limiting itself enough yet | 07:34 |
mnaser | frickler: that was git clones to https://opendev.org though | 07:35 |
*** ykarel_ is now known as ykarel | 07:39 | |
frickler | ah, right, the crawler went against gerrit. I do see some spikes on the gitea-lb cacti graphs since 0400, not sure if those might be related or whether they are normal and just smoothed out in the longer intervals | 07:43 |
frickler | seems selection of custom intervals in the cacti graphs doesn't work for me, I always get only the default view | 07:44 |
mnaser | frickler: caught it -- http://paste.openstack.org/show/799721/ | 07:46 |
mnaser | i think its almost always horizon at the root | 07:46 |
*** slaweq has joined #opendev | 07:54 | |
*** ralonsoh has joined #opendev | 07:59 | |
frickler | I didn't find anything obvious in the logs. I also tried cloning horizon from every gitea instance, found no issues there, either | 08:07 |
*** tosky has joined #opendev | 08:13 | |
*** andrewbonney has joined #opendev | 08:14 | |
zbr | fungi: ianw: ansible-lint does require ansible >=2.9, and the "can't upgrade ansible in place" issue is still valid, without any fix being planned. | 08:18 |
zbr | ansible team is seeing it as a pip/setuptools bug, and the other side has other priorities. so we need to be careful to avoid it. | 08:19 |
ianw | zbr: ansible-lint only specifies ansible>=2.8 in its setup.cfg | 08:19 |
zbr | well, that is easy to fix. | 08:20 |
ianw | i'm not sure how that fits with https://review.opendev.org/#/c/761473/ | 08:21 |
zbr | i wanted to make the linter ansible version agnostic but it is not possible now, it may take a year or more and help from ansible core to implement some missing features. | 08:21 |
ianw | it doesn't seem right to lint the jobs with an ansible that zuul isn't running, although i'm not sure how much that actually matters | 08:22 |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:22 | |
zbr | hmm... now i check and I see that we still have the 2.8 pipelines in linter so it should really support 2.8 | 08:25 |
zbr | if it failed it is likely due to a messed up ansible install (due to upgrade/downgrade) | 08:25 |
ianw | there's no upgrade or downgrade happening in the tox jobs; it was pinning the version to 2.7 which causes the failure | 08:26 |
zbr | yep. that is because old pip does not check for conflicts, the new resolver would have prevented it. | 08:27 |
zbr | add a "pip check" as first command, to prevent running code with broken deps. | 08:27 |
zbr | upgrade/downgrade can still happen inside tox jobs based on the order the deps are defined, but that was not the issue in this case. | 08:28 |
zbr | I could try to add an extra check for version in linter but I am not sure it is worth the effort. | 08:29 |
ianw | i'm not actually sure what the failure mode would be leaving ansible uncapped. perhaps a later version would correctly parse something that would not actually parse in the earlier version zuul is using? | 08:31 |
openstackgerrit | zbr proposed zuul/zuul-jobs master: More E208 https://review.opendev.org/761293 | 08:33 |
zbr | linting should not be confused with functional testing, linting is more about testing practices and detecting upcoming changes that may break your code, so that is why it is better to use the upper bounds instead of lower ones. For functional testing it is different. | 08:36 |
*** sshnaidm|rover has quit IRC | 08:36 | |
zbr | i have a good example from flake8 where it needed to be run on a newer version of python in order to detect a big range of issues, even if the linted code supported a lower version of python. | 08:36 |
zbr | to test compatibility, we would have to run functional testing with both lower and upper bounds, but that brings huge extra costs. | 08:37 |
*** sshnaidm|rover has joined #opendev | 08:38 | |
zbr | i personally find the version mix provides a decent coverage of both. | 08:38 |
*** rpittau|afk is now known as rpittau | 08:39 | |
*** sshnaidm|rover has quit IRC | 08:43 | |
*** sshnaidm|rover has joined #opendev | 08:45 | |
*** sshnaidm|rover has quit IRC | 08:52 | |
*** sshnaidm|rover has joined #opendev | 08:56 | |
*** sshnaidm|rover has quit IRC | 09:00 | |
*** jaicaa has quit IRC | 09:01 | |
*** jaicaa has joined #opendev | 09:02 | |
kevinz | frickler: ianw: Following the talk about https://review.opendev.org/#/c/760790/. We'd like to introduce OpenEuler 20.09 to Devstack, which is an RPM-based operating system and now works for AArch64 and x86_64 | 09:17 |
kevinz | If DIB is essential, I will ask the OpenEuler team to offer some support in upstreaming this feature. | 09:18 |
kevinz | But if uploading images is fine temporarily, I think adding a job to test this Devstack support would be a good plus, so that we can work in parallel to make that happen quickly | 09:20 |
frickler | infra-root: ^^ there seems to be a generic cloud image available, not sure whether it would be o.k. for us to start with that or whether we'd have to insist on having dib support in order to get our customizations in place from the start | 09:20 |
*** ysandeep|lunch is now known as ysandeep|ruck | 09:20 | |
frickler | I'm also not sure whether we'd have a procedure in place to use upstream images in nodepool at all, or whether that would have to be done manually | 09:21 |
*** Green_Bird has joined #opendev | 09:33 | |
*** sshnaidm has joined #opendev | 09:49 | |
*** DSpider has joined #opendev | 10:00 | |
*** fressi has quit IRC | 10:10 | |
*** hashar has joined #opendev | 10:15 | |
*** noonedeadpunk has quit IRC | 10:32 | |
*** noonedeadpunk has joined #opendev | 10:32 | |
*** ysandeep|ruck is now known as ysandeep|brb | 10:35 | |
kevinz | frickler: Thanks! will wait for more comments here :-D | 10:48 |
*** ysandeep|brb is now known as ysandeep|ruck | 10:51 | |
*** fressi has joined #opendev | 10:57 | |
*** fressi has quit IRC | 11:13 | |
*** noonedeadpunk has quit IRC | 11:21 | |
*** noonedeadpunk has joined #opendev | 11:25 | |
*** sboyron has quit IRC | 11:49 | |
*** sboyron has joined #opendev | 11:52 | |
*** marios has quit IRC | 12:17 | |
*** marios has joined #opendev | 12:21 | |
*** marios has quit IRC | 13:00 | |
*** marios has joined #opendev | 13:03 | |
*** dmellado has quit IRC | 14:16 | |
*** dmellado has joined #opendev | 14:20 | |
openstackgerrit | Merged openstack/project-config master: tox.ini : update Ansible pin https://review.opendev.org/761473 | 14:34 |
*** dtantsur has joined #opendev | 15:00 | |
dtantsur | hi folks! sorry if it has been asked too often already, but would it possible to enable code search on opendev git? | 15:06 |
mordred | dtantsur: it exists? https://opendev.org/explore/code?tab=&q=novaclient https://opendev.org/sardonic/sardonic/search?q=cmdb | 15:08 |
mordred | there is an open issue upstream gitea for making that all pluggable so that something like elasticsearch could be used to power the indexing ... so at the moment I think codesearch.openstack.org is still better at searching | 15:11 |
dtantsur | mordred: this is empty for me: https://opendev.org/openstack/ironic/search?q=automated_clean | 15:12 |
dtantsur | how does it look for you? | 15:12 |
mordred | similar. I'm guessing automated_clean is something that is in the ironic repo? | 15:12 |
dtantsur | yep. I've tried many things including "if" :) | 15:13 |
mordred | fascinating | 15:13 |
mordred | well - it's not a thing that's intentionally turned off | 15:13 |
dtantsur | I've never had any results whenever I tried, so I assumed it might have been turned off | 15:13 |
dtantsur | fascinating indeed | 15:13 |
mordred | but it's also not a subsystem that's gotten a lot of love - largely because it's currently a per-gitea-node thing | 15:13 |
*** hashar is now known as hasharKids | 15:20 | |
clarkb | the current code search uses a go lib that seems to have odd behaviors that don't map well to how humans search for text | 16:01 |
clarkb | the elasticsearch support comes in the next release and should be more familiar to those who have used our logstash | 16:01 |
clarkb | I expect we can try deploying ES alongside gitea in non clustered mode just to ensure that all works well | 16:01 |
clarkb | frickler: kevinz: we strongly prefer dib because what we've found happens with the upstream images is they change behaviors or do things with cloud init that don't make sense. It's just easier to have a single common image that uses glean for our test nodes | 16:02 |
*** ysandeep|ruck is now known as ysandeep|away | 16:06 | |
*** dmellado has quit IRC | 16:11 | |
*** dmellado has joined #opendev | 16:13 | |
openstackgerrit | Merged openstack/project-config master: Add manila client,ui,tempest plugin core teams https://review.opendev.org/758868 | 16:21 |
*** marios is now known as marios|out | 17:01 | |
*** ykarel has quit IRC | 17:04 | |
*** ykarel has joined #opendev | 17:05 | |
openstackgerrit | Merged openstack/project-config master: Update neutron grafana dashboard https://review.opendev.org/758208 | 17:06 |
*** marios|out has quit IRC | 17:14 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update gerrit plugins on 2.16 and 3.0 https://review.opendev.org/761641 | 17:16 |
clarkb | ensuring we're keeping our gerrit images up to date after hashar's feedback | 17:16 |
openstackgerrit | Merged opendev/system-config master: Document dual account split for Gerrit admins https://review.opendev.org/760051 | 17:19 |
*** rpittau is now known as rpittau|afk | 17:21 | |
*** Green_Bird has quit IRC | 17:21 | |
*** Green_Bird has joined #opendev | 17:25 | |
hasharKids | clarkb: hi, I hope my reply was not perceived as me patronizing! | 17:30 |
*** hasharKids is now known as hashar | 17:30 | |
fungi | hashar: not at all! it had a lot of good reminders | 17:30 |
clarkb | hashar: nope, it was useful to get input on whether or not we are on track | 17:30 |
clarkb | and the bits about the js stuff were helpful too | 17:30 |
hashar | but generally Gerrit upstream recommends using the very latest patch release of any minor series | 17:30 |
fungi | yeah, reassuring to see it basically matches our upgrade plan | 17:30 |
clarkb | hashar: yup, our docker builds build off of stable-* branches and get the latest commit | 17:30 |
hashar | so 2.x.max(y) | 17:30 |
clarkb | so we should be at least as new as the most recent release for each stable branch when we rebuild | 17:31 |
hashar | also note I haven't been directly involved in the Gerrit upgrade planning. Christian Aistleitner has done all the hardwork | 17:32 |
hashar | so the reference is his writing at https://groups.google.com/g/repo-discuss/c/G5wucKJg9Ag/m/pLin-i3mBgAJ :] | 17:32 |
hashar | I merely echoed and mentioned a few things we found after we upgraded | 17:33 |
fungi | 2.x.max(y) or newer, yeah. in many cases there are subsequent stable branch commits which are not yet tagged as point releases | 17:38 |
*** hamalq has joined #opendev | 17:38 | |
clarkb | yup, we were already testing with the notedb migration improvements prior to the latest 2.16 release as a result | 17:39 |
*** eolivare has quit IRC | 17:45 | |
hashar | ahh great ! | 17:51 |
hashar | do you, or will you, run the production Gerrit out of a Docker image? | 17:51 |
clarkb | hashar: we do and we will :) | 17:52 |
fungi | yeah, we build a docker image with zuul jobs, so that our chosen set of plugins will be included | 17:52 |
fungi | we also continuously deploy image updates with a zuul job too | 17:53 |
hashar | nice. You are way more automated than us :-D | 17:53 |
clarkb | we don't auto restart though | 17:53 |
hashar | for later, you might be interested in the multi-site plugin https://gerrit.googlesource.com/plugins/multi-site | 17:53 |
fungi | right, we'd rather still control the outage times for restarts | 17:53 |
hashar | as I get it, that lets one do rolling upgrades with 0 downtime | 17:53 |
fungi | but yeah, if multi-site is robust enough now, maybe rolling restarts of cluster members behind an lb would suffice | 17:54 |
frickler | that might even allow us to distribute over multiple providers. not sure how we'd lb ssh though? | 17:55 |
hashar | there is some explanation by Luca Milanesio (a Gerrit maintainer, and he is behind https://www.gerritforge.com/ ) at https://www.mediawiki.org/wiki/Topic:Vwkvtt6hlurmo42t | 17:55 |
clarkb | frickler: aiui its all primary secondary | 17:55 |
clarkb | frickler: not active active | 17:55 |
fungi | right, we'd configure haproxy or whatever to only ever send connections to one cluster member or the other, never both | 17:56 |
fungi | but i agree, it will likely be disruptive for ssh stream-events connections. they'll get reset and need to reconnect | 17:56 |
*** ykarel is now known as ykarel|away | 17:56 | |
fungi | which could leave windows of time where events are missed | 17:56 |
clarkb | one thing at a time :) | 17:57 |
fungi | yeah, i'm not in any hurry to add multi-site but it's neat to consider for down the road | 17:58 |
fungi | it would also be lots of additional complexity optimizing away one-minute restart outages which happen at most once a month | 17:58 |
fungi | so we should definitely weigh the positives and negatives of such a solution | 17:59 |
hashar | another advantage is reduced latency, which is helpful when your users are geographically distributed all across the world | 17:59 |
fungi | how does it reduce latency if only one cluster member is active? | 18:00 |
hashar | I mean, you could have a Gerrit in asia for example | 18:00 |
fungi | or is there an active/active model with multi-site too, not just active/standby? | 18:00 |
clarkb | it was my understanding that the gerrit clustering doesn't do active active | 18:00 |
hashar | but maybe some locks have to happen all the way back to a reference that is held in the US, so maybe that doesn't help much | 18:00 |
clarkb | you sync from the primary to the standbys using the replication plugin | 18:01 |
clarkb | and you can't sanely do that in both directions I don't think | 18:01 |
fungi | in theory clients could read from the standby node, but not write to it | 18:01 |
clarkb | (but maybe that has changed since I last looked at this) | 18:01 |
hashar | the link I pasted above was us complaining about multi site not really working for us ( https://www.mediawiki.org/wiki/Topic:Vwkvtt6hlurmo42t ) , but one of its maintainers pointed out the doc we used was outdated | 18:02 |
hashar | seems like the plugin has been largely improved and the doc has been updated as a result of the above discussion | 18:02 |
hashar | https://gerrit.googlesource.com/plugins/multi-site/+/HEAD/DESIGN.md might give more clues | 18:02 |
hashar | but as Clark said, one thing at a time. You can look at it next year I guess :] | 18:03 |
*** ykarel|away has quit IRC | 18:11 | |
openstackgerrit | Merged opendev/system-config master: Update gerrit plugins on 2.16 and 3.0 https://review.opendev.org/761641 | 18:25 |
*** andrewbonney has quit IRC | 18:29 | |
*** hashar is now known as hashardinner | 18:39 | |
*** ralonsoh has quit IRC | 18:41 | |
fungi | ooh, python 3.10.0a2 just dropped! | 18:42 |
*** sshnaidm is now known as sshnaidm|afk | 18:44 | |
fungi | i've booted review-test back up and then downed the gerrit container on it | 18:52 |
*** dtantsur is now known as dtantsur|afk | 18:53 | |
*** lpetrut has quit IRC | 19:16 | |
*** _mlavalle_2 has quit IRC | 19:16 | |
*** Tengu has quit IRC | 19:31 | |
*** rchurch has quit IRC | 20:22 | |
*** hashardinner is now known as hashar | 20:38 | |
*** dwilde has quit IRC | 20:46 | |
*** d34dh0r53 has joined #opendev | 20:46 | |
fungi | any tmux users who aren't aware, be on the lookout for updates to fix code execution by carefully crafted escape sequences: https://www.openwall.com/lists/oss-security/2020/11/05/3 | 21:10 |
ianw | fungi: when you have a sec, would you mind a double check on the grafana http -> https redirect one-liner @ https://review.opendev.org/#/c/761487/ ... just making sure there isn't a better way to do it | 21:19 |
fungi | we can redirect /.* to /$1 | 21:21 |
fungi | so old http urls continue to work | 21:21 |
fungi | i think redirecting / only does any good if folks load up / explicitly? | 21:22 |
ianw | fungi: i think Redirect just replaces the string and leaves the rest of the url alone ... i mean it seems to work that way? e.g. http://grafana.opendev.org/dashboards | 21:24 |
fungi | oh, maybe | 21:25 |
fungi | could be i'm confusing it with rewrite | 21:25 |
*** hashar has quit IRC | 21:27 | |
*** sboyron has quit IRC | 21:28 | |
ianw | yeah, something about "nice thing about standards is there's so many to choose from" :) | 21:59 |
fungi | re.* | 22:01 |
fungi | lgtm then | 22:01 |
*** slaweq has quit IRC | 22:06 | |
*** mlavalle has joined #opendev | 22:08 | |
*** hamalq has quit IRC | 22:11 | |
openstackgerrit | Merged opendev/system-config master: grafana: redirect http to CNAME https://review.opendev.org/761487 | 22:31 |
ianw | does limestone not have ipv4 nat? or is glean doing something wrong? | 22:37 |
ianw | wrt https://review.opendev.org/#/c/761178/ | 22:37 |
ianw | https://d4eb7e3efe98cba79a4b-f4d168cdb20f40841821e4b213645c0f.ssl.cf2.rackcdn.com/739139/12/gate/neutron-tempest-plugin-scenario-linuxbridge/9a6b4f7/zuul-info/zuul-info.controller.txt | 22:37 |
clarkb | ianw: something is going on there. I pinged logan yesterday in -infra but haven't heard back | 22:37 |
ianw | ahh, ok | 22:37 |
clarkb | it should have a 10/8 network and glean should configure it to dhcp | 22:37 |
clarkb | but I haven't actually poked at the openstack apis and hosts yet | 22:38 |
openstackgerrit | Merged opendev/system-config master: grafana: fix typo in test name https://review.opendev.org/761489 | 22:38 |
openstackgerrit | Merged openstack/project-config master: Add pypa/project-config https://review.opendev.org/761467 | 22:39 |
ianw | fungi/clarkb: you seemed to have some opinions on the 8gb swap reset @ https://review.opendev.org/761119 in the linked irc conversation, so i haven't approved. it does seem that the larger swap is a matter of ~20 seconds to create which doesn't seem too bad to me | 22:43 |
clarkb | the problem is projects like ironic run out of disk with even the 1gb swap | 22:43 |
clarkb | and increasing it to 8gb will only make that bigger. If this wasn't a last ditch method to avoid jobs failing when they need to swap a little I'd be more on board but the swap isn't really there to double the "memory" | 22:44 |
clarkb | if jobs have those problems they need to reduce memory or be multinode and distribute the memory load | 22:44 |
clarkb | ultimately if the rest of openstack says projects like ironic are the ones that need to change then ok we can land something like that, but I think that gets the purpose of the swap device wrong | 22:45 |
ianw | yeah, good points; we probably should communicate that though | 22:50 |
ianw | back to limestone, in "nodepool list" the nodes have a 10. ip address. so presumably openstacksdk is seeing an address defined | 22:50 |
clarkb | ya it was probably a mistake to make it so big previously, but we figured its sparse allocated so it doesn't actually matter unless you need it and if you run out of disk and need swap you'll break anyway | 22:51 |
openstackgerrit | Merged openstack/project-config master: Add pypa tenant https://review.opendev.org/761468 | 22:51 |
ianw | i just jumped on a random focal node and it has ipv4 | 22:51 |
clarkb | ianw: was it configured by dhcp (just want to confirm that assumption on my part) | 22:51 |
ianw | Nov 5 22:21:38 ubuntu-focal-limestone-regionone-0021581754 dhclient[466]: DHCPREQUEST for 10.4.70.27 on ens3 to 255.255.255.255 port 67 (xid=0xcc89a40) | 22:52 |
ianw | yep | 22:52 |
clarkb | I wonder if there is some issue with dhcp for some hosts, like maybe neutron isn't setting up the mapping in dnsmasq in some cases then it fails? | 22:53 |
clarkb | I've jumped on a bionic node and it too looks fine, has a default route via ens3 and a 10/8 address | 22:53 |
clarkb | another thing it could be is we're running out of addresses in the pool? | 22:54 |
ianw | similar on another two nodes i've jumped on | 22:54 |
clarkb | allocation_pools | 10.4.70.10-10.4.70.254 | 22:56 |
clarkb | that should be plenty for what I think is a ~50 node max-server limit | 22:56 |
ianw | unfortunately the syslog in that job doesn't go back to the start of boot | 22:56 |
clarkb | there are 62 ports in use | 22:57 |
clarkb | all that is telling me that we're well below our allocation limit so that shouldn't be the problem | 22:58 |
fungi | it's possible that with random macs and decent churn we're overrunning the pool in dhcpd if the leases are established with too long of a timeout? | 22:58 |
clarkb | oh maybe | 22:58 |
clarkb | usually neutron leases are very short, but that isn't necessarily the case | 22:58 |
clarkb | ianw: does the focal node say what the lease period is? | 22:59 |
ianw | option dhcp-lease-time 86400; | 23:00 |
clarkb | that is one day right? I wonder if that is the problem | 23:00 |
ianw | option dhcp-renewal-time 43200; | 23:01 |
ianw | option dhcp-rebinding-time 75600; | 23:01 |
ianw | dunno what those are | 23:01 |
fungi | yeah, that's a day. it really depends on the dhcpd though as to whether it will recycle leases which don't respond to ping/arp under pressure | 23:01 |
clarkb | ianw: the renewal time is when the client should renew usually set to 1/2 the lease time | 23:01 |
clarkb | I'm trying to see where neutron might expose this and if we can see it as non cloud admins | 23:02 |
fungi | however, if the api is claiming to have assigned an ip address for the nodes in question, then i don't expect it to be a pool problem | 23:02 |
clarkb | looks like its a config option in the dhcp agent config | 23:02 |
fungi | i want to say neutron sets up reservations in dnsmasq? | 23:02 |
clarkb | not something exposed by the api? | 23:03 |
ianw | not sure if the journal file will have the syslog | 23:03 |
clarkb | fungi: yes it uses mac address maps that dnsmasq assigns | 23:03 |
fungi | if it's really all explicit reservations then lease times are irrelevant | 23:03 |
fungi | since it's not doing an actual dhcp "pool" | 23:03 |
clarkb | ah ok. | 23:03 |
fungi | (where allocation within the pool is left up to the dhcpd) | 23:04 |
fungi | sounds like neutron is responsible for tracking allocations and just tells the dhcpd what's been assigned instead | 23:04 |
ianw | i think we should probably convert the journal to export from the start of boot, not from the time devstack started | 23:14 |
clarkb | ianw: I think devstack does that since some people reuse the nodes for CI | 23:15 |
clarkb | but in our case it would be fine | 23:15 |
ianw | at the moment we're flying blind, but i guess there's nothing obvious/systematic, at least right now | 23:16 |
ianw | i need to force merge the pypa project-config pipeline config | 23:38 |
ianw | so trying the instructions | 23:39 |
clarkb | ianw: why is that? | 23:39 |
clarkb | oh its a config project with no jobs | 23:39 |
clarkb | change adds the jobs | 23:39 |
ianw | chicken egg because there's no pipeline config to merge before there's a config :) | 23:39 |
ianw | "Members added to group Project Bootstrappers: n/a" | 23:39 |
fungi | i think that's normal if the account running the set-members command isn't itself a member | 23:40 |
clarkb | ianw: web ui says it added you | 23:40 |
ianw | yeah, i think red herring because the email is "n/a" | 23:41 |
fungi | aha | 23:41 |
fungi | that makes sense | 23:41 |
fungi | in a gerrit sort of way | 23:41 |
clarkb | oh that reminds me one thing I ran into when testing new gerrit is it really wants an email on accounts | 23:41 |
clarkb | so we may have to add email addrs to those accounts at some point | 23:41 |
fungi | also you can use the ls-members command to look at the list of group members, if needed | 23:42 |
clarkb | I had to set one to update the public key for the test project creator account | 23:42 |
clarkb | fungi: we should test if the upgrade will break our admin accounts without email addrs set | 23:42 |
clarkb | that is something we can test though | 23:42 |
fungi | that could pose problems since gerrit also doesn't like accounts to share e-mail addresses, so all admins will need two e-mail addresses | 23:42 |
clarkb | well we can also set it to a bogus value | 23:43 |
clarkb | we don't actually need the review emails | 23:43 |
clarkb | just need to convince it to not complain when setting things like public keys | 23:43 |
clarkb | (my worry is there is a chicken and egg where we might not be able to change things like that because we need the email field to have something in it) | 23:43 |
ianw | https://review.opendev.org/#/c/761681/ looks merged by ianw.admin, that's good | 23:43 |
fungi | but also having it try to e-mail bogus addresses could be problematic | 23:44 |
ianw | can probably use + addresses? | 23:44 |
ianw | to make something valid but different | 23:44 |
fungi | yeah, i mean it's no problem for me, i run my mailserver so i can add whatever addresses i want | 23:44 |
clarkb | ya we'll sort it out on the test node | 23:44 |
clarkb | its possible its a non issue too | 23:44 |
ianw | https://zuul.opendev.org/t/pypa/status has all the pipelines | 23:46 |
ianw | and removed. ++ to fungi for great instructions | 23:47 |
fungi | +++ to gerrit's documentation | 23:48 |
*** tosky has quit IRC | 23:55 |
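
A small worked check of the DHCP numbers from the limestone discussion (22:56 to 23:01) may help. This is only a sketch: it takes the lease values ianw pasted, the allocation pool and port count clarkb quoted, and the default timer fractions from RFC 2131 (renewal at 1/2 of the lease, rebinding at 7/8). The function names `default_timers` and `pool_size` are made up for illustration and are not part of neutron, dnsmasq, or glean.

```python
import ipaddress

# Lease time pasted in the log, in seconds.
LEASE_TIME = 86400


def default_timers(lease_seconds):
    """Return (T1, T2) using the RFC 2131 default fractions of the lease."""
    t1 = lease_seconds // 2       # renewal time: 0.5 * lease
    t2 = lease_seconds * 7 // 8   # rebinding time: 0.875 * lease
    return t1, t2


def pool_size(first, last):
    """Count the addresses in an inclusive allocation pool range."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return end - start + 1


t1, t2 = default_timers(LEASE_TIME)
size = pool_size("10.4.70.10", "10.4.70.254")
ports_in_use = 62  # figure quoted by clarkb at 22:57

print(f"lease: {LEASE_TIME // 3600}h, renewal: {t1}s, rebinding: {t2}s")
print(f"pool: {size} addresses, {size - ports_in_use} not in use")
```

The computed timers equal the `dhcp-renewal-time 43200` and `dhcp-rebinding-time 75600` values in the log, so those look like plain defaults derived from the one-day lease, and the pool holds far more addresses than the 62 ports reported, which is consistent with clarkb's conclusion that pool exhaustion is unlikely.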
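
The grafana firewall problem (00:33 to 00:36) came down to a host pattern that did not match the new, unnumbered hostname. The sketch below shows why, assuming the inventory group patterns behave like shell-style globs as implemented by Python's `fnmatch`; the log does not say which matcher is actually used. `grafana[0-9].opendev.org` is the pattern quoted by ianw, while `grafana*.opendev.org` and the host `grafana1.opendev.org` are hypothetical, chosen only to illustrate the effect of a `*`; the pattern in the merged change is not shown in the log.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the log: exactly one digit is required after "grafana".
QUOTED_PATTERN = "grafana[0-9].opendev.org"
# Hypothetical wildcard pattern for illustration only (not from the merged change).
WILDCARD_PATTERN = "grafana*.opendev.org"


def matches(host, pattern):
    """Case-sensitive glob match of a hostname against a group pattern."""
    return fnmatchcase(host, pattern)


for host in ("grafana.opendev.org", "grafana1.opendev.org"):
    for pattern in (QUOTED_PATTERN, WILDCARD_PATTERN):
        print(f"{host} vs {pattern}: {matches(host, pattern)}")
```

Under these assumptions the unnumbered `grafana.opendev.org` falls outside the quoted pattern, so it would not land in the webserver group that opens ports 80 and 443, whereas a pattern containing `*` covers both the numbered and unnumbered names.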