fungi | group creation seems to be working in the wake of 715726 so https://review.opendev.org/#/admin/groups/openstack-tempest-skiplist-core exists now, but https://review.opendev.org/#/admin/projects/openstack/openstack-tempest-skiplist,access still hasn't gotten populated | 00:14 |
---|---|---|
fungi | this is our next exception: AttributeError: 'Gerrit' object has no attribute 'username' | 00:15 |
*** dangtrinhnt has joined #opendev | 00:41 | |
openstackgerrit | Merged openstack/diskimage-builder master: Add Fedora 31 support and test jobs https://review.opendev.org/708416 | 01:26 |
*** dangtrinhnt has quit IRC | 01:55 | |
*** dangtrinhnt has joined #opendev | 01:56 | |
*** dangtrinhnt has quit IRC | 01:57 | |
*** dangtrinhnt has joined #opendev | 02:04 | |
*** dangtrinhnt has quit IRC | 02:07 | |
*** dangtrinhnt_ has joined #opendev | 02:07 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: test-upload-logs-swift: revert download script https://review.opendev.org/715755 | 02:11 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: bulk-download : role with script to download all log files https://review.opendev.org/715756 | 02:11 |
*** dangtrinhnt_ has quit IRC | 03:37 | |
*** dangtrinhnt has joined #opendev | 03:50 | |
*** dangtrinhnt has quit IRC | 04:08 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 04:13 |
*** dangtrinhnt has joined #opendev | 04:24 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 04:44 |
*** ykarel|away is now known as ykarel | 04:50 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 04:51 |
*** dangtrinhnt has quit IRC | 04:58 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 05:00 |
*** ykarel is now known as ykarel|afk | 05:21 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 05:31 |
*** ykarel|afk is now known as ykarel | 05:40 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: local-log-download : role with script to download all log files https://review.opendev.org/715756 | 05:49 |
*** DSpider has joined #opendev | 05:53 | |
ianw | corvus: ^ i think this is more what you were thinking? | 06:03 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/715772 | 06:08 |
*** dpawlik has joined #opendev | 06:23 | |
openstackgerrit | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/715772 | 06:37 |
*** tosky has joined #opendev | 07:25 | |
*** ysandeep|rover is now known as ysandeep|rover|l | 07:25 | |
*** rpittau|afk is now known as rpittau | 07:34 | |
*** ralonsoh has joined #opendev | 07:53 | |
*** ysandeep|rover|l is now known as ysandeep|rover | 08:34 | |
*** ykarel is now known as ykarel|lunch | 09:32 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow configure-mirrors to enable extra repos https://review.opendev.org/693887 | 09:40 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow configure-mirrors to enable extra repos https://review.opendev.org/693887 | 09:41 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Improve job and node information banner https://review.opendev.org/677971 | 09:47 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Avoid confusing rsync errors when source folders are missing https://review.opendev.org/670044 | 09:52 |
openstackgerrit | Sorin Sbarnea proposed openstack/diskimage-builder master: Validate virtualenv and pip https://review.opendev.org/707104 | 09:56 |
openstackgerrit | Sorin Sbarnea proposed openstack/diskimage-builder master: Validate virtualenv and pip https://review.opendev.org/707104 | 09:56 |
*** ykarel|lunch is now known as ykarel | 10:15 | |
*** rpittau is now known as rpittau|bbl | 10:56 | |
zbr | who can help me do a few abandons, like https://review.opendev.org/#/c/385217/ ? | 11:46 |
*** ysandeep|rover is now known as ysandeep|rover|b | 11:51 | |
*** lpetrut has joined #opendev | 12:02 | |
AJaeger | zbr: might be best if you give repository names. For elastic-recheck, I cannot help, for those that I'm core, I'm happy to... | 12:07 |
*** ysandeep|rover|b is now known as ysandeep|rover | 12:13 | |
openstackgerrit | Merged openstack/project-config master: Add Shrews to alumni https://review.opendev.org/715373 | 12:36 |
openstackgerrit | Merged openstack/project-config master: Replace python-charm-jobs to py3 job https://review.opendev.org/714796 | 12:38 |
*** rpittau|bbl is now known as rpittau | 12:46 | |
openstackgerrit | Grzegorz Grasza proposed openstack/project-config master: Add ability to push signed tags to tripleo-ipa https://review.opendev.org/715932 | 12:52 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: tox: allow tox to be upgraded https://review.opendev.org/690057 | 13:03 |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Username is on the connection objet https://review.opendev.org/715937 | 13:04 |
mordred | fungi: ^^ | 13:04 |
mordred | fungi: I think that should fix the most recent issue | 13:04 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: install-docker: allow removal of conflicting packages https://review.opendev.org/702304 | 13:05 |
mordred | fungi, clarkb : I'm kind of confused as to why that just broke | 13:07 |
fungi | i'm also confused as to how the acl eventually got applied | 13:08 |
fungi | maybe that's always been broken? | 13:08 |
mordred | I don't see any *recent* changes that would have impacted that - but maybe we just hadn't updated gerritlib in a long time or something | 13:08 |
fungi | oh, yeah, could be newer gerritlib | 13:08 |
mordred | fungi: I look forward to a future where we have manage-projects (or something) running as a zuul job and not as a cron pulse so that these logs can be more evident and visible to people like AJaeger and mnaser | 13:09 |
fungi | oh, indeed, the group got created with the project creator as a member | 13:11 |
fungi | i'll clean that up too | 13:11 |
mnaser | yep, that would be awesome mordred | 13:11 |
mordred | fungi: so maybe this has been broken for a while but we never noticed? and maybe project creator is a member of a bunch of groups and we didn't notice? :) | 13:13 |
fungi | mordred: oh, i think i see why... manage-projects ran and created the new group that new acl needed but raised AttributeError trying to clean up the initial group membership so we skipped the remainder of that project setup, then on the next go round it saw the group already existed so didn't try to create it and just pushed the new acl | 13:14 |
mordred | fungi: yay for eventual consistency | 13:14 |
fungi | mordred: so... this is the only match on AttributeError in any manage_projects.log for the past month's retention | 13:17 |
fungi | leading me to suspect the new gerritlib theory is correct | 13:17 |
mordred | nod | 13:17 |
mordred | we made a new gerritlib release in jan - but I think it had been a _while_ | 13:18 |
mordred | yeah - 2018 was the previous on | 13:18 |
mordred | one | 13:18 |
mordred | it's entirely possible we haven't added any new groups between jan 28 and now | 13:19 |
mordred | and yes - I have confirmed - the username change happened between 0.8.1 and 0.8.2 | 13:20 |
fungi | possible puppet didn't upgrade gerritlib on review.o.o when we tagged a new release? | 13:23 |
fungi | and we were still continuing to run much older? | 13:23 |
fungi | system context pip says we've still got 0.8.1 installed | 13:24 |
fungi | so maybe we've only run 0.8.2 from docker | 13:24 |
mordred | this is a very good possibility | 13:25 |
fungi | which may explain a bunch of these behavior changes | 13:26 |
fungi | once 715937 merges we can try another project creation change and see if it makes it all the way through without error | 13:35 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add job to run manage-projects in zuul https://review.opendev.org/715944 | 13:42 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: install-docker: allow removal of conflicting packages https://review.opendev.org/702304 | 13:45 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Run manage-projects on gerrit related changes https://review.opendev.org/715945 | 13:46 |
mordred | fungi: ^^ there we go - I think those two patches should do the manage-projects run, yes? | 13:47 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Adds roles to install and run hashicorp packer https://review.opendev.org/709292 | 14:11 |
*** ysandeep|rover is now known as ysandeep|away | 14:12 | |
*** lpetrut has quit IRC | 14:13 | |
*** lpetrut has joined #opendev | 14:15 | |
*** ykarel is now known as ykarel|away | 14:25 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run manage-projects/base/bridge on system-config changes https://review.opendev.org/715957 | 14:31 |
openstackgerrit | Dmitriy Rabotyagov (noonedeadpunk) proposed opendev/lodgeit master: Add lodgeit-db script https://review.opendev.org/714732 | 14:31 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Log manage-projects to stdout https://review.opendev.org/715964 | 14:34 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow configure-mirrors to enable extra repos https://review.opendev.org/693887 | 14:37 |
openstackgerrit | Thierry Carrez proposed opendev/system-config master: [gitea] Point to newly-split Getting Started content https://review.opendev.org/715966 | 14:42 |
mordred | ttx: I feel like we just landed a patch that did that ... | 14:46 |
mordred | ttx: oh - maybe if I actually read your commit message | 14:46 |
mordred | infra-root: I made this: https://review.opendev.org/#/q/topic:zuul-manage-projects which I *think* we're actually ready for - but that's obviously a big step, so is worth extra eyeballs | 14:49 |
AJaeger | mordred: I left a question on the project-config one, there's a slight misuse of promote due to the way we characterized promote. Are we fine with that? | 14:52 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Add a service discussion mailing list for OpenDev https://review.opendev.org/715972 | 14:53 |
openstackgerrit | Merged opendev/jeepyb master: Username is on the connection objet https://review.opendev.org/715937 | 14:53 |
mordred | AJaeger: yeah - I think in this case it's still decently relevant - it's a thing running after changes merge - but I agree, there's no artifacts to promote in the strict sense. The important bit for promote (other than the files matcher that you mentioned) - is that it's supercedent so we don't wind up running manage-projects 4 times if we land 4 changes but rather only once | 14:54 |
mordred | since step 1 of the manage-projects playbook is "update project-config" | 14:55 |
ttx | mordred: yeah for some reason the gerritbot did not post the infra-manual change here | 14:55 |
mordred | AJaeger: I say that ... but actually it doesn't - I should add that | 14:57 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add job to run manage-projects in zuul https://review.opendev.org/715944 | 15:00 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run manage-projects/base/bridge on system-config changes https://review.opendev.org/715957 | 15:00 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Log manage-projects to stdout https://review.opendev.org/715964 | 15:00 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Update project-config in manage-projects https://review.opendev.org/715976 | 15:00 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: install-docker: allow removal of conflicting packages https://review.opendev.org/702304 | 15:00 |
*** DSpider has quit IRC | 15:07 | |
*** DSpider has joined #opendev | 15:07 | |
*** yoctozepto has quit IRC | 15:07 | |
*** yoctozepto has joined #opendev | 15:08 | |
*** osmanlicilegi has quit IRC | 15:08 | |
*** osmanlicilegi has joined #opendev | 15:10 | |
clarkb | mordred: if you have a moment https://review.opendev.org/#/c/715961/1 is the dependency of ttx's change you already reviewed | 15:20 |
mordred | clarkb: done | 15:22 |
AJaeger | ttx, we have not updated gerritbot config for a bit, this is part of the larger system-config changes going on. | 15:30 |
clarkb | mordred: fungi is there a quick jump in point for helping with the jeepyb stuff? | 15:35 |
clarkb | I'm no longer up to date | 15:35 |
fungi | clarkb: i think we're ready to approve some more project creation changes at this point | 15:35 |
mordred | yeah. the things have all landed | 15:35 |
clarkb | great, maybe that means i should make tea :) | 15:36 |
fungi | clarkb: short summary, we've been running with "old" gerritlib on the gerrit server up to now, and the container is getting "newer" gerritlib, hence all the behavior changes we've observed and patched manage-projects for | 15:36 |
mordred | clarkb: that said - I added a new stack you could look at: https://review.opendev.org/#/q/topic:zuul-manage-projects | 15:36 |
AJaeger | mordred, fungi , https://review.opendev.org/704133 https://review.opendev.org/713650 are two repo creation changes | 15:36 |
clarkb | fungi: up to now meaning before we started using the container? | 15:36 |
fungi | clarkb: right | 15:37 |
fungi | docker is pulling latest gerritlib | 15:37 |
AJaeger | config-core, please review https://review.opendev.org/715755 and https://review.opendev.org/715735 as two reverts | 15:37 |
fungi | clarkb: but puppet has not been upgrading gerritlib | 15:37 |
mordred | clarkb: puppet did not seem to update gerritlib | 15:37 |
mordred | yeah | 15:37 |
mordred | so at least this most recent thing actually broke back in january - but we never noticed :) | 15:37 |
clarkb | gotcha | 15:38 |
fungi | or longer ago than january, but that's when the release containing it was tagged | 15:40 |
*** rajinir has quit IRC | 15:45 | |
*** ttx has quit IRC | 15:45 | |
*** rpittau has quit IRC | 15:45 | |
*** rpittau has joined #opendev | 15:45 | |
*** rajinir has joined #opendev | 15:45 | |
*** ttx has joined #opendev | 15:45 | |
clarkb | mordred: https://review.opendev.org/#/c/715944/2/.zuul.yaml that won't work as is. You need a non root user on the server with sudo access that zuul can ssh in as | 15:48 |
mordred | clarkb: that's already set up in the base playbook | 15:50 |
mordred | clarkb: it's the zuulcd user | 15:51 |
mordred | and it seems to have the zuul ssh deployment keys in its authorized_keys already | 15:51 |
clarkb | mordred: thats only for bridge | 15:51 |
clarkb | and I don't know that it landed? | 15:51 |
clarkb | mordred: the idea there was to have zuul log in to bridge then run a nested ansible from there | 15:51 |
*** hashar has joined #opendev | 15:51 | |
mordred | clarkb: it only needs to be for bridge | 15:51 |
mordred | right. that's what that job does | 15:51 |
clarkb | if I'm reading your job correctly it's talking directly to review* | 15:51 |
mordred | clarkb: playbooks/zuul/run-production-playbook.yaml | 15:51 |
clarkb | oh hrm, ok I'm not sure how far along that got it sort of died when I couldn't get the zuul user landed iirc | 15:52 |
mordred | clarkb: infra-prod-playbook logs in to bridge as zuulcd and runs the named playbook | 15:52 |
mordred | the zuul user is landed and in place - it SEEMS like all of the pieces are there | 15:52 |
clarkb | ok so maybe that eventually caught up and I didn't notice (it sat stale for a long time) | 15:53 |
clarkb | I'm not sure how much testing this has received in general | 15:53 |
clarkb | but I guess we can try it with manage-projects | 15:53 |
mordred | :) | 15:54 |
clarkb | next up https://review.opendev.org/#/c/715957/2/.zuul.yaml we actually need to land that at the same time we land any manage-project changes right? otherwise we'll try to replicate to locations that don't exist | 15:55 |
clarkb | and I think that needs strict ordering of jobs | 15:55 |
clarkb | hrm I think we are using artifact dependencies rather than job dependencies? | 15:56 |
mordred | aroo? | 15:57 |
clarkb | mordred: but the first thing I was concerned about is that we need gitea to run before manage-projects | 15:57 |
clarkb | and I don't think we've done that in the change there | 15:57 |
mordred | we don't need to do that actually | 15:57 |
clarkb | oh wait its all in the one playbook | 15:57 |
mordred | yeah | 15:57 |
clarkb | (but the diff context isn't default big enough to show that) | 15:57 |
clarkb | ok so that first concern is good | 15:57 |
mordred | we _do_ need to run the update system-config ... so we should probably squash that one with the one before | 15:57 |
clarkb | as this scope grows maybe its a good idea to land a simple change that runs ansible via bridge | 15:58 |
mordred | I do think that for completeness we should eventually put in gitea and gerrit service playbooks and put in soft-depends between manage-projects and them - in case we land a change that touches both things | 15:58 |
clarkb | ensure that all works before we throw gitea and gerrit and bridge at it | 15:59 |
mordred | yeah | 15:59 |
clarkb | oh and base which is all the servers | 15:59 |
mordred | well - that was why I had the first manage-projects change like it is | 15:59 |
mordred | yup. that's the second change :) | 15:59 |
* mordred has a call ... will be back in about 30 | 15:59 | |
clarkb | really I'm thinking it would be good to take small bites and get manage-projects running successfully and solve that problem. Then tackle the problem of driving from zuul so that we don't prolong the first thing as we debug the second | 16:00 |
clarkb | and the original work in that space tried to start small (run dns updates iirc) | 16:00 |
mordred | totally. it's just that I think manage-projects is now fixed :) | 16:01 |
clarkb | but we've sort of leaped ahead to "run base on all servers and set up gitea and gerrit and bridge" | 16:01 |
clarkb | ah | 16:01 |
mordred | so this is the next step | 16:01 |
mordred | and step one in the stack is just running manage-projects | 16:01 |
mordred | so I agree with you | 16:01 |
clarkb | ya except its also hitting gitea too | 16:02 |
clarkb | mordred: https://review.opendev.org/#/c/651390/4/playbooks/zuul_manage_nameservers.yaml thats the playbook we never ran on the nameservers | 16:07 |
clarkb | but the change itself died because it needed to be triggered from the zone repos and that made it more complicated | 16:09 |
clarkb | maybe having a simple start like that would be worthwhile though since we never got that far in the past? | 16:09 |
*** rpittau is now known as rpittau|afk | 16:12 | |
AJaeger | mordred: do we have a todo list? It should include gerritbot as well... | 16:13 |
clarkb | hrm that said we have learned some things about this via the goaccess playbook | 16:30 |
clarkb | top of list is we don't run a zuul console logger daemon thing which causes confusion (but in this case may actually be desirable) | 16:30 |
corvus | goaccess? | 16:30 |
clarkb | corvus: the webserver log stats reporter tool | 16:30 |
clarkb | corvus: it doesn't bounce through bridge though | 16:31 |
corvus | ah | 16:31 |
mordred | clarkb: yeah - I mean, this should work pretty much like goaccess - it's a no-node job on bridge that adds the remote host with add_host | 16:40 |
mordred | so the console logging should work from the zuul POV about the same, yeah? | 16:41 |
mordred | we might not get live streaming - but we still should get the final logs | 16:41 |
clarkb | mordred: I'd have to double check the goaccess logfile but I think we don't get anything logged | 16:42 |
mordred | clarkb: also - what I meant above by step one being just running manage-projects is that it's a pretty self-contained payload - that the manage-projects playbook itself talks to 10 hosts isn't really super important from a mechanism perspective, right? | 16:42 |
clarkb | mordred: it was mostly my concern we hadn't gotten a debug output playbook to work at all so this is a big jump but then I remembered goaccess | 16:43 |
mordred | yeah. that was my thinking - it's working for goaccess which is structurally the same even if it's touching a different jump host | 16:43 |
mordred | if we get _nothing_ logged then we might want to update the run playbook to redirect the ansible-playbook stdout to a file then grab that file as a logfile to upload | 16:44 |
clarkb | mordred: http://zuul.openstack.org/build/7954d3813b8842869af81b1ee8d82dad/log/job-output.txt#76 ya I don't think we ever get logs | 16:59 |
clarkb | so maybe that is the only thing we should teark | 17:00 |
clarkb | *tweak | 17:00 |
clarkb | to start at least its kinda nice to have the logs hidden from zuul, then we can add them in if we decide they aren't leaking things | 17:20 |
clarkb | fungi: https://review.opendev.org/#/c/715555/ that's a gerritlib fix we mirrored in jeepyb | 17:21 |
clarkb | fungi: but if we can land that change then we can make a release and remove the related cleanup in jeepyb | 17:21 |
fungi | approved it. i was debating also solving https://review.opendev.org/715726 in gerritlib, though am on the fence as to whether that's just a desirable behavior change vs a regression | 17:23 |
clarkb | thanks, in that case I should probably make a release as soon as that lands and not wait for additional fixes? | 17:27 |
fungi | yeah, a reasonable choice | 17:29 |
fungi | though i can whip up a change for the regression we worked around with 715726 if you think it's worth patching in gerritlib | 17:30 |
clarkb | fungi: do you have the traceback handy? | 17:31 |
clarkb | I assume it failed in _ssh()? | 17:31 |
fungi | clarkb: http://paste.openstack.org/show/791352/ | 17:34 |
fungi | yeah, in _ssh() | 17:34 |
openstackgerrit | Merged opendev/gerritlib master: Return lists from listing functions https://review.opendev.org/715555 | 17:35 |
fungi | i don't really have a preference when it comes to testing for nonetype vs catching an exception in manage-projects | 17:37 |
clarkb | fungi: some of this is from memory so may not be entirely correct. jeepyb in the old setup was using a db lookup of groups which would have different failure modes than the ssh api lookup. Rereading gerritlib I'm not entirely sure this is a regression there (as it would've returned an error if gerrit didn't exit 0 on the ssh command) and instead we needed to properly catch the different case in jeepyb | 17:37 |
clarkb | that said I could see an argument that a better behavior for listGroup() would be to return [] if there were no matches rather than raising | 17:38 |
*** diablo_rojo has joined #opendev | 17:38 | |
fungi | right, that's why i didn't ultimately also push up a change for gerritlib on that one | 17:39 |
*** lpetrut has quit IRC | 17:40 | |
clarkb | for now maybe its best to leave the gerritlib behavior stable in case anyone else is using it and checking for exceptions already | 17:40 |
fungi | wfm | 17:41 |
clarkb | mordred: maybe we should log to bridge in /var/log/ansible? then if we vet the output add it to the job as a logfile? | 17:47 |
clarkb | mordred: we sort of did similar with goaccess where we had it write the html report to disk but then didn't collect it to start, we reviewed that the output didn't disclose anything extra, then added it as a log file | 17:47 |
openstackgerrit | Merged opendev/system-config master: [gitea] Point to newly-split Getting Started content https://review.opendev.org/715966 | 17:50 |
clarkb | ttx: ^ thank you for that | 17:50 |
clarkb | fungi: I'm looking for my keychain now in order to sign a tag | 17:53 |
clarkb | will get that pushed up as soon as I've found it | 17:53 |
fungi | cool | 17:54 |
fungi | or i can do it if you prefer | 17:54 |
*** diablo_rojo has quit IRC | 17:55 | |
clarkb | nah I've found it :) | 17:55 |
clarkb | it was where I left it on the night stand | 17:55 |
*** diablo_rojo has joined #opendev | 17:55 | |
clarkb | 0.8.4 is what I'll be tagging | 17:55 |
fungi | yeah, seems right to me | 17:56 |
clarkb | and pushed | 17:57 |
fungi | 0.8.3 was the last tag, and only bug fixes since | 17:57 |
clarkb | fungi: mordred looking at project-config it appears we have a few project creation changes | 18:06 |
clarkb | I heard we think things are good to go now, should we be landing those? | 18:07 |
clarkb | I'll set topic:new-project on them | 18:07 |
fungi | yeah, maybe let's approve one and make sure it goes through without error, then do the rest in bulk | 18:09 |
clarkb | I think topic:new-project has the list | 18:09 |
fungi | thanks | 18:09 |
clarkb | mnaser's would be a good candidate but it is in merge conflict | 18:09 |
clarkb | https://review.opendev.org/#/c/714965/1 | 18:10 |
clarkb | mnaser: ^ want to update that one really quickly with a rebase? or should we? | 18:10 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Replace incident channel with opendev-meeting https://review.opendev.org/716038 | 18:10 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Replace incident channel with opendev-meeting https://review.opendev.org/716039 | 18:10 |
mnaser | clarkb: i can rebase it if you need me to | 18:10 |
clarkb | mnaser: ya I think it needs one to merge according to gerrit | 18:11 |
openstackgerrit | Mohammed Naser proposed openstack/project-config master: vexxhost: add repos for exporters https://review.opendev.org/714965 | 18:14 |
openstackgerrit | Merged zuul/zuul-jobs master: test-upload-logs-swift: revert download script https://review.opendev.org/715755 | 18:14 |
openstackgerrit | Merged opendev/base-jobs master: Revert "virtualenv-config: add to base pre playbook" https://review.opendev.org/715735 | 18:14 |
mnaser | clarkb: ^ :) | 18:15 |
clarkb | mnaser: thank you +2'd | 18:15 |
clarkb | fungi: mordred maybe you can rereview https://review.opendev.org/714965 and that will be our canary? | 18:15 |
*** ralonsoh has quit IRC | 18:17 | |
fungi | approved, though it won't test the acl and group creation bits which are where we ran into our most recent errors | 18:26 |
clarkb | oh good point maybe find another canary then | 18:28 |
clarkb | https://review.opendev.org/#/c/714686/1 what about that one | 18:28 |
fungi | yep, +2 from me | 18:29 |
fungi | if there's one with an "upstream" import url, that might be a good test too | 18:29 |
mordred | clarkb: I think that's a good idea (log to log file then manually vet the output) | 18:39 |
mordred | clarkb: those both look good | 18:40 |
openstackgerrit | Merged openstack/project-config master: Add nginx-ingress-controller armada app to StarlinX https://review.opendev.org/714686 | 18:47 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add job to run manage-projects in zuul https://review.opendev.org/715944 | 19:02 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Log manage-projects to stdout https://review.opendev.org/715964 | 19:02 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run manage-projects/base/bridge on system-config changes https://review.opendev.org/715957 | 19:02 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Parameterize manage-projects logging output https://review.opendev.org/716052 | 19:02 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Redirect production playbook output https://review.opendev.org/716053 | 19:02 |
mordred | clarkb: ^^ I think that captures a set of safe steps | 19:02 |
mordred | clarkb: I think we can (and should) land the first already. the second should also be safe to land and should be a no-op | 19:04 |
mordred | third is definitely no-op since it's to a playbook that isn't in use :) | 19:05 |
mordred | the fourth would trigger manage-projects from zuul - but still logging to the log file on gerrit - then the fifth would log the ansible output to a file on bridge (so we can see that it's useful output) | 19:05 |
mordred | corvus, fungi : ^^ | 19:06 |
clarkb | mordred: k reviewing in a moment, finishing early lunch | 19:15 |
openstackgerrit | Merged openstack/project-config master: vexxhost: add repos for exporters https://review.opendev.org/714965 | 19:15 |
AJaeger | infra-root, according to grafana, the logstash queue is linearly increasing - is that normal? now over 15k in the queue | 19:29 |
clarkb | AJaeger: I checked it earlier today as a followup to friday and it was at 5k. I expect we've added a bunch of logs to process in some jobs (possibly via large console logs or somewhere else) | 19:31 |
clarkb | AJaeger: I think the thing to check is if it goes back down after zuul load subsides. If it doesn't then we aren't keeping up at all and we need to identify where the bloat is coming from | 19:31 |
AJaeger | clarkb: I see, thanks | 19:33 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Log manage-projects to stdout https://review.opendev.org/715964 | 19:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run manage-projects/base/bridge on system-config changes https://review.opendev.org/715957 | 19:44 |
clarkb | mordred: in https://review.opendev.org/#/c/716053/1 don't we want to redirect to a file? but that change sets it to false | 19:44 |
clarkb | oh I see the next change overrides in the child job | 19:44 |
mordred | yeah - I was trying to get it set up into small discrete chunks :) | 19:44 |
clarkb | mordred: I've approved the first in the stack (update of git repo) | 19:46 |
clarkb | the others to the point of logging locally lgtm. but I did leave a comment on the logging one | 19:47 |
clarkb | which you've seen so yay | 19:47 |
fungi | our canary changes 714686 and 714965 seem to have been processed with no exceptions raised | 19:48 |
clarkb | mordred: https://review.opendev.org/#/c/715964/4 has a note too | 19:48 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Redirect production playbook output https://review.opendev.org/716053 | 19:48 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add job to run manage-projects in zuul https://review.opendev.org/715944 | 19:48 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Log manage-projects to stdout https://review.opendev.org/715964 | 19:48 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run manage-projects/base/bridge on system-config changes https://review.opendev.org/715957 | 19:48 |
mordred | clarkb: I added the date time | 19:48 |
clarkb | mordred: looking | 19:48 |
*** mugsie has quit IRC | 19:51 | |
*** mugsie has joined #opendev | 19:54 | |
mordred | corvus: I updated https://review.opendev.org/#/q/topic:zuul-manage-projects a bit based on review from clarkb if you have a sec to re-review | 19:58 |
*** dpawlik has quit IRC | 19:59 | |
clarkb | AJaeger: fungi: mordred should we start landing more of those topic:new-project changes? | 20:02 |
clarkb | (I'm happy to help review them but have been taking my cues from yall on readiness) | 20:02 |
*** dpawlik has joined #opendev | 20:08 | |
fungi | yeah, i can take a look in a moment | 20:08 |
fungi | https://review.opendev.org/#/admin/groups/starlingx-nginx-ingress-controller-armada-app-core did not wind up with the project creator user left in it | 20:10 |
fungi | and the https://review.opendev.org/#/admin/projects/starlingx/nginx-ingress-controller-armada-app,access acl got created correctly, looks like | 20:10 |
mordred | clarkb: yes - but... | 20:11 |
mordred | clarkb, fungi : let's keep a few in our pocket so we can use them to test the zuul triggered ones | 20:11 |
fungi | and the initial repo state got created successfully in https://opendev.org/starlingx/nginx-ingress-controller-armada-app with a .gitreview file containing the correct bits | 20:12 |
mordred | \o/ | 20:12 |
fungi | so i think the only thing we haven't really tested yet is repository importing from an "upstream" url | 20:12 |
*** dpawlik has quit IRC | 20:13 | |
corvus | mordred: note from clarkb on https://review.opendev.org/715964 | 20:16 |
corvus | mordred: and ansible bug on https://review.opendev.org/716053 | 20:18 |
clarkb | corvus: is that true when the first character of the string isn't { ? | 20:25 |
clarkb | re the ansible bug | 20:25 |
corvus | well, the linter seems to be complaining | 20:29 |
clarkb | probably best to be safe there then | 20:29 |
AJaeger | fungi: didn't we test import yesterday? | 20:30 |
corvus | i'm happy with whatever the linter is (it's a yaml parse error, not some finicky thing). but that's what i'd do | 20:30 |
AJaeger | fungi: Id9648164023590a440c56906ecd982523b176179 has upstream | 20:31 |
fungi | yeah, we just haven't tested it in the context of an error-free run, but probably good enough | 20:32 |
AJaeger | speaking about new repos, do we have rules on what to take into opendev - and would we "adopt" https://review.opendev.org/704411 (if it passes)? | 20:33 |
* AJaeger will read backscroll tomorrow and waves good night | 20:35 | |
corvus | that seems like a good meeting topic | 20:36 |
mordred | corvus, clarkb : I think on clarkb's comment - we very well might need to log to a file and attach ... but I think we might also just get what we need from -v since there is output - and think it's worth trying? or should we just go ahead and make a tmp file and log to it and then copy it | 20:36 |
clarkb | AJaeger: zbr I don't think we should fork pre-commit | 20:36 |
corvus | mordred: iiuc, the issue is that that output is going to the inner ansible running on bridge? | 20:36 |
corvus | so if that works, aren't we just going to see "ok: 1" ? | 20:37 |
corvus | or "changed: 1" or whatever | 20:37 |
fungi | clarkb: i'm assuming the reason is that pre-commit wants to clone everything over the network from github, and is unwilling to entertain fixes for that (because they don't consider it a problem)? | 20:37 |
fungi | but yes, carrying a fork of it to patch around that also seems questionable | 20:38 |
clarkb | fungi: ya if openstack or whoever wants to do that its up to them, but I don't think we need to provide that as part of the opendev service | 20:39 |
*** njohnston is now known as njohnston_ | 20:39 | |
fungi | maybe it's something the openstack qa team wants to consider | 20:40 |
clarkb | mordred: corvus http://zuul.openstack.org/build/7954d3813b8842869af81b1ee8d82dad/log/job-output.txt#76 that is what it looks like for goaccess | 20:42 |
clarkb | mordred: corvus I think we'll get similar here, it will just be no logger found until the end of the playbook | 20:42 |
mordred | corvus, clarkb: yes. I agree :) | 20:43 |
corvus | (we could, incidentally, open the streaming port in the firewall on bridge and static and get streaming console logs, but that's a separate issue) | 20:44 |
clarkb | the upside to having the log as an artifact is it ensures we log it on the host too | 20:44 |
fungi | okay i've approved a few more topic:new-project changes | 20:44 |
clarkb | (which I think would be nice) | 20:45 |
clarkb | we can still dump to console log and to disk though, one doesn't imply the other isn't happening | 20:45 |
corvus | clarkb: the goaccess/static situation is still different though, that's a single ansible run, not nested, right? | 20:45 |
clarkb | corvus: correct, but its at the top layer which we have to pass through when nested | 20:45 |
corvus | clarkb: because zuul *does* eventually see the output from the run: http://zuul.openstack.org/build/7954d3813b8842869af81b1ee8d82dad/console | 20:46 |
clarkb | corvus: it sees the ansible backend stuff but not the console log | 20:46 |
clarkb | in our case because its nested ansible that means we'll get effectively nothing | 20:46 |
*** njohnston_ has quit IRC | 20:47 | |
corvus | clarkb: to be precise (sorry, this helps me make sure we're talking about the same thing): in the goaccess case, zuul is running the playbook directly, so the zuul output json is exactly as normal for any zuul job. but the streaming console log and text file is missing the output from shell commands because the log streamer daemon is firewalled. | 20:48 |
openstackgerrit | Merged opendev/system-config master: Update project-config in manage-projects https://review.opendev.org/715976 | 20:48 |
corvus | clarkb: in the prod playbook case, it's similar, except that the 'shell' command in this case is nested ansible. so in our json file, we will get a little bit of ansible boilerplate and ending with "changed: 1", but similarly no ansible output in the text or streaming logs. | 20:49 |
*** hashar has quit IRC | 20:49 | |
clarkb | yup | 20:49 |
corvus | kk | 20:49 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Collect production playbook output https://review.opendev.org/716053 | 20:51 |
*** njohnston has joined #opendev | 20:51 | |
mordred | corvus, clarkb : ^^ I think that should do the thing we want - yes? log to a file, then collect the file | 20:51 |
mordred | one sec - typo | 20:52 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Collect production playbook output https://review.opendev.org/716053 | 20:52 |
mordred | also - should we add a -v to the ansible-playbook invocation so that we get stdout for success lines? | 20:53 |
clarkb | yes that should do what we want | 20:53 |
mordred | clarkb: want me to add the -v? | 20:53 |
clarkb | (and we'll have it on the server in the normal location too for people that will keep looking there as we transition) | 20:53 |
mordred | yeah | 20:54 |
clarkb | mordred: that distinction is a bit fuzzy to me. With -v we get the stdout of successful tasks? but without it will just say change: true or whatever? | 20:54 |
mordred | yeah. | 20:54 |
clarkb | is there any concern we'll leak things we shouldn't that way? | 20:54 |
clarkb | otherwise its probably fine | 20:54 |
mordred | I don't think so - no | 20:55 |
mordred | I mean - the output will be the stdout of manage-projects which I think is actually pretty solid | 20:55 |
clarkb | in the case here it would probably be gitea admin credentials since everything else uses ssh keys | 20:55 |
clarkb | ya manage-projects should be fine since its using an ssh key | 20:55 |
clarkb | but the gitea side maybe? and also this is the default in that base job which might catch more things over time | 20:56 |
openstackgerrit | Merged openstack/project-config master: Add xstatic-** projects for vitrage-dashboard https://review.opendev.org/704133 | 20:56 |
mordred | well - we'll start with collect_logs false - so we can verify that | 20:56 |
mordred | when we add new things | 20:56 |
clarkb | k | 20:56 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Collect production playbook output https://review.opendev.org/716053 | 20:57 |
mordred | clarkb, corvus : now with -v | 20:57 |
clarkb | fwiw ansible verbosity has always confused me | 20:57 |
clarkb | like to get a traceback to understand why it failed you need -vvvvv but to leak secret data it seems to just happen :) | 20:57 |
clarkb | (hence the explicit no_log: true things we do) | 20:57 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add job to run manage-projects in zuul https://review.opendev.org/715944 | 20:58 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Log manage-projects to stdout https://review.opendev.org/715964 | 20:58 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run manage-projects/base/bridge on system-config changes https://review.opendev.org/715957 | 20:58 |
mordred | clarkb: yeah | 20:58 |
*** mlavalle has joined #opendev | 21:05 | |
corvus | one of the advantages of using ansible modules for things is that you can relax no_log. for instance, a module that takes a password parameter will automatically mask that in logs. but that's extra effort; most of our secret use is just shell tasks | 21:06 |
corvus | but if we find some specific thing we want to pull out of no_log in this effort, we might look into that as an option | 21:06 |
clarkb | fungi: should we try to use opendev-meeting tomorrow or is better in a week? | 21:13 |
* clarkb is putting together agenda email and want to get location correct | 21:13 | |
openstackgerrit | Merged openstack/project-config master: Add Rook to StarlingX https://review.opendev.org/713650 | 21:13 |
openstackgerrit | Merged openstack/project-config master: Add Cert-Manager Armada app to StarlingX https://review.opendev.org/714689 | 21:13 |
fungi | clarkb: i would say we should agree on it in tomorrow's meeting and announce in the meeting that we're moving to the new channel next week? | 21:14 |
fungi | just for maximum possible continuity | 21:14 |
mordred | clarkb, corvus : I'm landing the first two of those (which should not have any noticeable impact) | 21:14 |
clarkb | fungi: wfm | 21:14 |
fungi | clarkb: maybe also decide on the new ml at the same-ish time | 21:14 |
clarkb | ++ | 21:15 |
fungi | (and announce similarly in meeting and then on the old ml) | 21:15 |
fungi | announcement to the old ml can similarly mention the change in meeting venue | 21:16 |
clarkb | fungi: did you settle on a name for the new list? service-discuss@lists.opendev.org? (I'll put notes in the agenda too, but for tomorrow will be business as usual) | 21:17 |
fungi | yep, that's what's in the change anyway | 21:18 |
clarkb | great | 21:18 |
fungi | it's available for debate | 21:18 |
clarkb | I think its fine :) | 21:18 |
fungi | but that's what seemed consensual in last week's meeting so it's what i went with | 21:18 |
clarkb | we will have an "OpenDev" heavy agenda which is probably a good thing (we are headed in the right direction) | 21:20 |
fungi | and the following meeting in #opendev-meeting might be an entirely opendev agenda (by definition!) ;) | 21:21 |
openstackgerrit | Merged zuul/zuul-jobs master: Improve job and node information banner https://review.opendev.org/677971 | 21:37 |
clarkb | fungi: the irc channel changes and the new list change lgtm. topic:opendev-comms if others want to review too. | 21:40 |
* fungi hopes others will review, at least | 21:41 | |
fungi | when the ml one merges we'll want to remind everyone to subscribe and then update references in lots of places | 21:41 |
fungi | i'm happy to volunteer to serve as a list moderator since i already check moderation queues on openstack-infra ml daily | 21:42 |
fungi | also when the irc change lands we'll want to propose an irc-meetings change | 21:42 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: run_functests: handle build without tar https://review.opendev.org/715098 | 21:58 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: centos 8 image build: fix mirror https://review.opendev.org/714836 | 21:58 |
*** DSpider has quit IRC | 21:59 | |
openstackgerrit | Merged opendev/system-config master: Parameterize manage-projects logging output https://review.opendev.org/716052 | 21:59 |
openstackgerrit | Merged opendev/system-config master: Remove /tarballs proxy from mirrors https://review.opendev.org/714544 | 21:59 |
openstackgerrit | Merged opendev/system-config master: Collect production playbook output https://review.opendev.org/716053 | 21:59 |
ianw | mordred: are you around to discuss your thoughts on container upgrade procedures? | 22:00 |
mordred | ianw: sure! | 22:02 |
mordred | I may or may not have useful things to say :) | 22:02 |
mordred | ianw: (also - check it out - we've got a stack moving towards zuul run ansible!) | 22:03 |
ianw | so will https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/nodepool-builder/tasks/main.yaml#L35 just restart with the latest image by pulling it each ansible pulse? | 22:03 |
fungi | i thought the idea of upgrades in the container ecosystem was that you burn it all down, salt the earth, and move on | 22:04 |
clarkb | ianw: its the next task that will do it | 22:04 |
mordred | yeah - the docker-compose up -d command | 22:04 |
clarkb | ianw: the pull will just update the images, but not restart any containers, but running docker-compose up next though it will update any running containers to the latest images | 22:04 |
ianw | right, so the up is expected to re-up if the pull pulled a new image? | 22:04 |
clarkb | ianw: yes | 22:04 |
clarkb | it will noop if no image updated | 22:04 |
corvus | fungi: that is more or less what mordred and ianw are talking about does :) | 22:04 |
mordred | corvus: mmm. salt | 22:05 |
fungi | corvus: i figured. you blow away the container and deploy a new one? | 22:05 |
mordred | yup | 22:05 |
corvus | fungi: yeah; since we're bind mounting all our data in, that sticks around | 22:05 |
corvus | so, i mean, don't salt *that* earth | 22:05 |
mordred | no - that would be bad salted earth | 22:06 |
corvus | but this earth over here is okay | 22:06 |
ianw | yeah, i thought so ... since so much state is outside on the builder i'm thinking it might need ... something else | 22:06 |
fungi | data migrations | 22:06 |
mordred | ianw: like what? | 22:07 |
clarkb | ianw: mordred: we could continue to run it as we do today and only restart through manual intervention | 22:07 |
clarkb | have ansible do the pulls to keep us up to date maybe? | 22:07 |
ianw | mordred: yeah, like what was what i'm trying to figure out :) | 22:07 |
fungi | if we're talking about gerrit, it does the usual rdbms tactic of recording a schema version and running migrations if it's restarted with code which expects a newer schema | 22:07 |
mordred | fungi: actually - not quite | 22:07 |
fungi | oh? | 22:08 |
mordred | with gerrit we have to run gerrit init to run the migrations | 22:08 |
fungi | it used to... | 22:08 |
mordred | so there is a special upgrade step | 22:08 |
ianw | is gerrit ok with being sort of arbitrarily killed and started like that, in term of in-flight db stuff, etc? | 22:08 |
mordred | yeah - the init script runs init every time | 22:08 |
mordred | but it's not actually necessary | 22:08 |
fungi | oh, right, init is supposed to be a no-op if the schema version is already at the expected level | 22:08 |
mordred | well - with _gerrit_ we're not planning on having ansible restart it | 22:08 |
mordred | because gerrit | 22:08 |
mordred | fungi: yeah - also - for non-container, gerrit init expands the war plugins and downloads the db connector jar | 22:08 |
mordred | but we do that at container build time now | 22:09 |
mordred | but - gerrit is currently special | 22:09 |
ianw | yeah ... i guess that's also "because dib" ... i'm thinking ahead a bit for the launchers too | 22:09 |
mordred | we don't have ansible run docker-compose up for it unless we set a flag - which is normally unset | 22:09 |
corvus | i thought dib could be stopped via signal? | 22:10 |
ianw | both i think we want to have the ability to stop what they're doing? | 22:10 |
openstackgerrit | Merged openstack/diskimage-builder master: Mellanox element: removed ibutils,libibcm,libmlx4-dev https://review.opendev.org/714336 | 22:10 |
ianw | corvus: yeah, at least signalling so it can try and run its cleanup would be a start | 22:10 |
ianw | i think ideally we'd tell the builder to drop out of accepting any new requests, finish what it's doing | 22:10 |
mordred | it would be great if we could just auto-upgrade the nodepool components and not require manual restarts ... so maybe we need to add a signal to the playbook | 22:11 |
corvus | ianw: i think that a docker container stop begins with a sigterm? | 22:11 |
corvus | https://docs.docker.com/engine/reference/commandline/stop/ | 22:12 |
corvus | term, grace period, kill | 22:12 |
corvus | so maybe if we can tell docker-compose to use a really long grace period, it will Just Work? | 22:12 |
ianw | hrm, i guess that would hold up the whole ansible pulse though? | 22:13 |
mordred | it shouldn't - docker-compose up -d should return pretty immediately | 22:13 |
corvus | the down could hold it up, but is that a problem? | 22:13 |
clarkb | you can set --timeout on docker-compose up | 22:13 |
corvus | mordred: oh | 22:13 |
ianw | mordred: hrm, then i wonder what happens if the next pulse happens while the last one is still shutting down :) | 22:14 |
corvus | mordred: you're saying the docker-compose up will return immediately, and docker will async stop and start in the background | 22:14 |
mordred | I *think* so | 22:14 |
mordred | but then ianw has a good question | 22:14 |
corvus | sounds reasonable | 22:14 |
corvus | it doesn't take that long to stop dib, does it? | 22:14 |
mordred | will a second up correctly no-op | 22:14 |
mordred | corvus: does builder propagate the TERM? | 22:14 |
corvus | i think that's still how we shut down a builder? | 22:15 |
ianw | corvus: no, but i think ideally, if there was a way, you'd stop accepting new requests, finish what you were doing, and then exit | 22:15 |
corvus | (like, we systemctl stop nodepool-builder, which would then term) | 22:15 |
corvus | ianw: i feel like with builders it should be okay to just stop asap | 22:16 |
ianw | yeah, we usually combine that with a reboot and clearing out a bunch of crap, in practice | 22:16 |
clarkb | you could end up starving builds if nodepool images were built often | 22:16 |
clarkb | (I don't think that is a really big issue though) | 22:16 |
corvus | problems i'd like to have: 47) nodepool merging changes fast enough to worry about build starvation :) | 22:17 |
ianw | i will say that anywhere dib exits and doesn't clear up *is* a bug ... but practically, it depends on where you exit as to how well mounts etc are cleaned up | 22:17 |
*** tosky has quit IRC | 22:20 | |
ianw | i guess we just see how it goes, and worry about it if it's leaking itself to death | 22:21 |
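The behaviour ianw describes above (stop accepting new requests, finish what you were doing, then exit) can be sketched in Python roughly as follows. This is only an illustration under assumptions: `stop_requested`, `handle_sigterm`, `install_handler`, and `run_builder` are hypothetical names, not nodepool-builder code. It relies on the point corvus makes from the docker docs, that a container stop begins with a TERM and only kills after the grace period.

```python
import signal
import threading

# Hypothetical names throughout; this is not how nodepool-builder is written.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request and leave the in-flight job alone."""
    stop_requested.set()


def install_handler():
    """Register the handler; call this once from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_builder(jobs, build):
    """Run jobs in order, refusing new ones once a stop was requested."""
    finished = []
    for job in jobs:
        if stop_requested.is_set():
            break  # stop accepting new work
        build(job)  # a build that has started always runs to completion
        finished.append(job)
    return finished
```

For this to help, the grace period between TERM and KILL would have to be at least as long as the longest build, which is the trade-off discussed above.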
clarkb | semi related, ianw did you see the request for a dib release over the weekend? | 22:23 |
mordred | ianw: yeah - I think maybe if it leaks itself to death - we should just figure out how to fix it ;) | 22:23 |
openstackgerrit | Kendall Nelson proposed opendev/irc-meetings master: Update FC SIG Meeting https://review.opendev.org/716098 | 22:23 |
corvus | mordred: i have an opendev zuul-registry <-> sdk <-> swift question; where should i ask you about that? :) | 22:24 |
mordred | corvus: so many options! | 22:25 |
mordred | corvus: here or -sdks probably | 22:25 |
mordred | let's start here - we can go there if it turns out to be an SDK bug | 22:26 |
corvus | mordred: k, lemme get a paste going | 22:26 |
mordred | sweet | 22:26 |
* mordred is excited | 22:26 | |
corvus | this is hard; we need an event id for these lines. it'll take me a minute. | 22:29 |
mordred | oh GOOD | 22:29 |
mordred | a hard one | 22:29 |
ianw | clarkb: one was done yesterday with the python stow stuff | 22:29 |
clarkb | ianw: perfect! :) | 22:29 |
mordred | noonedeadpunk: ^^ | 22:30 |
ianw | clarkb: and fedora 31, which i was going to add to the builder soon, which was making me think about how to restart it :) | 22:30 |
ianw | although, i'm still not 100% sure we have the old semantics that dib releases update in the container | 22:30 |
ianw | $ disk-image-create --version | 22:32 |
ianw | 2.34.1 | 22:32 |
ianw | $ nodepool --version | 22:32 |
ianw | Nodepool version: 3.12.1.dev30 | 22:32 |
clarkb | ianw: this is probably similar to jeepyb and gerrit images. You want both to trigger an update to the artifact | 22:34 |
clarkb | ianw: currently I doubt that nodepool images are rebuilt when dib updates | 22:34 |
mordred | clarkb: it would be very hard to do so since nodepool is in the zuul tenant | 22:34 |
clarkb | maybe this is the reason to start periodic image updates? | 22:35 |
mordred | maybe. or make something that runs in promote in dib and sends a null-change to nodepool that we can insta-merge | 22:35 |
mordred | (or in release in dib rather) | 22:35 |
mordred | (there is a very worthwhile general concept here that would be good to wrap our heads around) | 22:36 |
corvus | if nodepool requires a specific version of dib, bump the requirement in nodepool? | 22:36 |
clarkb | corvus: oh thats a good way of expressing it, though I'm not sure we really need a hard dep as much as "using latest is nice" | 22:36 |
mordred | corvus: I don't think nodepool does need it - but I think we'd like the latest dib in prod | 22:36 |
corvus | (like, if we don't care about dib releases, then we don't care about making new nodepool images when dib updates. if we do care about dib releases, then we should have a version spec.) | 22:37 |
mordred | yeah. what clarkb said | 22:37 |
* mordred is fine bumping the min over in nodepool - just saying we don't _strictly_ *require* the most recent in nodepool ... but maybe in this case that semantic difference isn't necessary | 22:37 | |
corvus | i'm not particularly concerned about running the latest in prod unless there's a bugfix or new feature :| | 22:37 |
ianw | yeah, i think it's if the user wants the latest features of dib, in terms of the interface between nodepool<->dib there's no hard dependency | 22:37 |
mordred | corvus: in this case it's a new feature | 22:37 |
mordred | but yeah -it's a zuul-jobs user that wants a new dib element | 22:38 |
ianw | not really, i want the new version for fedora 31 support | 22:38 |
mordred | oh - even more important | 22:38 |
ianw | which is an infra image | 22:38 |
mordred | ianw: so maybe that is, in fact, a good reason to require the new dib in nodepool | 22:38 |
mordred | since anybody who wants to use nodepool to build f31 images on ubuntu is going to need it | 22:38 |
ianw | i don't mind turning it into >= type requirement situation in nodepool, if that's what we agree | 22:39 |
mordred | for this case it seems valid | 22:39 |
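As a rough illustration of what a ">= type requirement" means here, the sketch below compares the dib version ianw pasted (2.34.1) against the proposed minimum (2.35.0). The helper names `parse_version` and `satisfies_minimum` are made up for this example, and the parsing is deliberately naive: it only handles plain dotted numeric releases. Real requirement handling by pip follows the full Python version specification.

```python
def parse_version(text):
    """Turn a plain dotted release such as '2.35.0' into a tuple of ints.

    Assumption: numeric components only. A string like '3.12.1.dev30'
    raises ValueError here.
    """
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True when the installed release meets a '>=' requirement."""
    return parse_version(installed) >= parse_version(minimum)


# Versions taken from the discussion above.
print(satisfies_minimum("2.34.1", "2.35.0"))
print(satisfies_minimum("2.35.0", "2.35.0"))
```

Comparing tuples of integers rather than raw strings matters once a component reaches two or more digits, since "2.100.0" sorts before "2.35.0" as text.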
ianw | mordred: to propose a null change, we'd have to have something similar to the proposal bot? encode a secret for a user, pull a nodepool tree, and push a change? or is there some other way? | 22:41 |
ianw | to automagically propose changes via the release pipeline, i mean | 22:41 |
mordred | yeah - it would be something like that | 22:41 |
corvus | i'd really like it if we could treat dib like a normal dependency | 22:43 |
mordred | ++ | 22:43 |
clarkb | ya that is why I wondered if periodic builds would be worthwhile | 22:43 |
clarkb | then each day (or whatever period) we'd get an image with the newest versions of all deps | 22:44 |
corvus | we build a nodepool image every few days | 22:44 |
corvus | at most | 22:44 |
corvus | it seems that right now we REALLY CARE about what dib version it is, so let's bump it | 22:44 |
openstackgerrit | Amy Marrich (spotz) proposed opendev/irc-meetings master: Update FC SIG Meeting https://review.opendev.org/716098 | 22:44 |
corvus | then go back to not caring for the next 6 months :) | 22:44 |
corvus | https://zuul.opendev.org/t/zuul/builds?job_name=nodepool-build-image | 22:46 |
ianw | yeah, just given what dib practically does, it's unlikely there will be many releases that infra production doesn't care about -- to say another way we're not doing a lot of outside development | 22:46 |
ianw | so traditionally the model that puppet pulls in the latest as it releases has worked well | 22:46 |
ianw | we don't *have* to automate that, just that's the way it's been | 22:46 |
corvus | then maybe we should look at splitting the elements out from dib | 22:48 |
corvus | so they can be updated faster and externally from the software dependency | 22:48 |
clarkb | wouldn't that have a similar issue? | 22:48 |
corvus | bind mount them over the installed image or something | 22:49 |
clarkb | I guess if we didn't install them into the image we could keep an external set up to date that were mounted in | 22:49 |
mordred | yeah | 22:49 |
mordred | I mean - honestly, the number of times DIB changes these days is almost never | 22:49 |
mordred | the dib element library is where most of the action takes place | 22:50 |
clarkb | what if we ignore dib for a minute. And think about this from the perspective of needing the base image to be updated for a security update | 22:50 |
corvus | then we'd just update it | 22:50 |
clarkb | (I think there is a general class of problem here and I wonder if we are fixated on dib too much) | 22:50 |
clarkb | corvus: the base image itself would be called the same thing it would just need a rebuild | 22:50 |
clarkb | corvus: how do we express that? a noop change? | 22:51 |
corvus | i'd really like to just find out if the nodepool image only updating 3 times a week is really untenable for us | 22:51 |
corvus | i don't think it is | 22:51 |
ianw | proposed : Ian Wienand proposed zuul/nodepool master: Update dib dep to 2.35.0 https://review.opendev.org/716104 | 22:51 |
corvus | clarkb: sure, if it's important, go for it | 22:51 |
corvus | it's just that in the normal course of events, this happens every few days automatically | 22:52 |
clarkb | corvus: ok if we are on board with that I think it works for that general case of problem | 22:52 |
mordred | I agree - a noop change in such a situation would work fine | 22:52 |
ianw | proposed : Ian Wienand proposed zuul/nodepool master: Update dib dep to 2.35.0 https://review.opendev.org/716104 | 22:52 |
ianw | although, another option would be to manually do that in the Dockerfile | 22:52 |
mordred | manually do what? | 22:53 |
ianw | since nodepool doesn't *need* dib 2.35.0 ... infra production, or maybe we could say users of the container image do? | 22:53 |
mordred | I don't have any problem just bumping the min - I don't know that there's any compelling use case for avoiding the bump | 22:53 |
mordred | anybody installing via pip is getting 2.35 right now anyway :) | 22:54 |
ianw | yes true, only if they'd gone to some effort to pin backwards for some reason | 22:54 |
corvus | if it's worth the 3 of us talking about it for 30 minutes, it's worth a version bump | 22:54 |
mordred | yeah. and if that person exists, maybe this will flush them out and they'll tell us about why | 22:54 |
ianw | corvus: heh, well yeah, but brown-bag fixes patched up in dib are frequent enough that i'm glad we can figure out procedure now rather than when builds are dead everywhere :) | 22:56 |
corvus | maybe dib changes more than i think it does? | 22:56 |
ianw | i mean these days it's usually new distro release, but occasionally something async happens like a new point release of centos etc that breaks and requires fixes | 22:58 |
fungi | we often go a month or two of quiet and then there's a few weeks where some platform is broken and needs a series of fixes in dib and one or more new releases depending on how thoroughly it gets fixed on the first try | 22:59 |
mordred | yeah - so - I just looked | 23:00 |
fungi | but it's basically all elements | 23:00 |
mordred | over the past 6 months | 23:00 |
mordred | we've had 3 incidents of relevant change | 23:00 |
mordred | one sept 27, one feb 13 then 2 mar 18 and one mar 27 | 23:01 |
mordred | so by and large, while I agree there is a theoretical conceptual issue that could be solved | 23:01 |
mordred | in practice, this is not an issue for us | 23:01 |
mordred | sept 27 was adding centos-8 support - feb 13 was fixing a venv/glean thing - then we had a cluster of build-only-packages, python-stow and fedora-31 | 23:03 |
mordred | each of those, to me, is worthy of a nodepool min bump actually | 23:03 |
mordred | since they're all things one would want to make sure are in a sane nodepool-builder | 23:03 |
clarkb | ya I think bumps in those cases make sense | 23:03 |
ianw | basically it would be the exception that a dib release did *not* result in a production bump, rather than the rule, i would say even after looking over the recent releases | 23:11 |
ianw | i'm totally fine with manually bumping nodepool, since it seems we're agreed the nodepool requirements.txt isn't so much "the minimum required to actually do something" but more tied to what people want to build at that point in time | 23:11 |
corvus | i'm not sure i agree with that | 23:11 |
corvus | you're telling me we can't build f31 without the newest version. i think that warrants a bump. | 23:12 |
corvus | i don't think i would agree with "we should just encode whatever the current version is in requirements" | 23:12 |
corvus | (i don't think we should bump the version to get the stow element) | 23:13 |
ianw | well, i'm saying that practically, and you can read back through the changes in dib release history and maybe you'll agree, there are not many dib releases that would not have some effect on a production infra image somehow | 23:13 |
corvus | yeah, that may be true; i'm not sure what dib's release criteria are | 23:14 |
corvus | i don't know if stow got its own release or not | 23:14 |
ianw | well yes, i mean chicken-egg -- the release criteria is usually "we need this in some sort of production" and that production is *usually* opendev infra | 23:15 |
ianw | anyway, if people want to make the procedure concrete with a vote on https://review.opendev.org/#/c/716104/ then that would be great, and we can have 2.35.0 in production and i can try deploying f31 | 23:16 |
fungi | i do expect that most dib releases in modern history have been because opendev asked for a release so we could consume some fix | 23:16 |
corvus | mordred: the issue i'm seeing looks like an object in swift is 4 bytes shorter than i expect | 23:20 |
corvus | mordred: http://paste.openstack.org/show/791362/ | 23:20 |
corvus | mordred: the key line is #16: INFO registry.api: Finish Upload chunk zuul/zuul-executor 5d35161962bd40bebeb022fcc41686ae 28567940 | 23:20 |
corvus | mordred: last number is the length of the first chunk of the upload; that's really the only chunk, but the weird docker process involves uploading a second zero-byte chunk too (so you'll see a /2 in there) | 23:21 |
corvus | mordred: then we do a COPY on both of those chunks | 23:21 |
corvus | mordred: then we upload a multipart-manifest pointing to both of them to combine them | 23:22 |
corvus | then it's done | 23:22 |
corvus | but when i fetch that object, i get 28567936 bytes | 23:23 |
mordred | corvus: yesh | 23:24 |
mordred | corvus: I'm going to need to digest that - and it's evening walk time ... I've got it loaded up in my browser though | 23:24 |
corvus | mordred: yeah, i'm eoding too; hopefully that's a good enough stopping point we can pick it up tomorrow | 23:25 |
corvus | mordred: er, sorry, i think it's a 540672 byte difference: 28567940 vs 28027268 | 23:30 |
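The arithmetic behind the corrected figure can be checked with a few lines of Python. The chunk sizes and fetched length are the numbers quoted above; the function name `missing_bytes` is just a label for this sketch and is not part of zuul-registry or the SDK.

```python
def missing_bytes(chunk_sizes, fetched_size):
    """Bytes uploaded across all chunks minus bytes returned on fetch."""
    return sum(chunk_sizes) - fetched_size


# One real chunk plus the zero-byte second chunk from the upload.
uploaded_chunks = [28567940, 0]
fetched = 28027268

print(missing_bytes(uploaded_chunks, fetched))
```

With the first figure mentioned (28567936) the gap would be 4 bytes; with the corrected figure (28027268) it is 540672 bytes.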