*** TxGirlGeek has quit IRC | 00:04 | |
*** TxGirlGeek has joined #starlingx | 00:07 | |
*** mpeters-wrs has joined #starlingx | 00:27 | |
*** byang has joined #starlingx | 00:48 | |
*** TxGirlGeek has quit IRC | 01:28 | |
*** sgw has quit IRC | 01:50 | |
*** mpeters-wrs has quit IRC | 02:05 | |
*** wangyi4 has joined #starlingx | 02:06 | |
*** sgw has joined #starlingx | 02:54 | |
*** mpeters-wrs has joined #starlingx | 03:09 | |
*** mpeters-wrs has quit IRC | 03:43 | |
*** mpeters-wrs has joined #starlingx | 04:02 | |
*** wangyi41 has joined #starlingx | 04:16 | |
*** cyan_ has joined #starlingx | 04:25 | |
*** rchurch_ has quit IRC | 04:34 | |
*** rchurch has joined #starlingx | 04:35 | |
*** mpeters-wrs has quit IRC | 04:36 | |
*** TxGirlGeek has joined #starlingx | 04:52 | |
*** mpeters-wrs has joined #starlingx | 05:00 | |
*** mpeters-wrs has quit IRC | 05:05 | |
*** TxGirlGeek has quit IRC | 07:05 | |
*** anran has joined #starlingx | 07:54 | |
* wangyi41 sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/QyderTfmvjBlanwZaLRCIlPG > | 08:09 | |
*** wangyi4 has quit IRC | 08:10 | |
wangyi41 | This task force includes @anran, @yan_chen and me. More people are welcome to join in. | 08:13
*** anran has quit IRC | 08:55 | |
*** sgw has quit IRC | 09:45 | |
*** byang has quit IRC | 12:09 | |
*** mpeters-wrs has joined #starlingx | 12:14 | |
*** ijolliffe has quit IRC | 12:52 | |
*** ijolliffe has joined #starlingx | 13:18 | |
*** mpeters-wrs has quit IRC | 13:50 | |
*** sgw has joined #starlingx | 14:07 | |
*** mpeters-wrs has joined #starlingx | 14:07 | |
sgw | Morning all | 14:08 |
*** mpeters-wrs has quit IRC | 14:09 | |
*** mpeters has joined #starlingx | 14:09 | |
*** billzvonar has joined #starlingx | 14:41 | |
sgw | slittle1: Morning | 14:45 |
ijolliffe | morning - thanks wangyi41 and team - I see 4 reviews posted for the hack-a-thon | 14:46
*** billzvonar has quit IRC | 14:46 | |
* sgw back in 90 or so | 15:02 | |
slittle1 | CENGN had trouble setting up the build container via stx-tools/Dockerfile ... | 15:07
slittle1 | RUN pip install python-subunit junitxml --upgrade && \ | 15:08 |
slittle1 | pip install tox --upgrade | 15:08 |
slittle1 | failed .... anyone else observing this? | 15:08 |
dpenney_ | A new version of more-itertools was just released a couple days ago, so maybe we constrain it to the older version for now: https://pypi.org/project/more-itertools/#history | 15:09 |
dpenney_ | maybe python3-specific code was added and it can't build for python2.7? | 15:11 |
dpenney_ | looks like the previously successful build was 20200111T023000Z, which used more-itertools 8.0.2 | 15:20 |
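A minimal sketch of the pin being discussed, assuming it goes into the same RUN step of stx-tools/Dockerfile and reuses the version from the last successful build noted above; depending on the pip version's upgrade strategy a constraints file may be needed instead, and the actual fix is in the review linked below:

    # Pre-pin more-itertools to the last known-good version before the upgrades
    RUN pip install more-itertools==8.0.2 && \
        pip install python-subunit junitxml --upgrade && \
        pip install tox --upgrade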
*** TxGirlGeek has joined #starlingx | 16:02 | |
dpenney_ | I've posted a review to resolve the build failure: https://review.opendev.org/702471 | 16:27 |
*** abailey has quit IRC | 16:29 | |
*** abailey has joined #starlingx | 16:30 | |
slittle1 | looks good | 16:30 |
*** jrichard has quit IRC | 16:35 | |
sgw | slittle1: you around? I am working on the layer build testing and having some issues | 16:54 |
*** mpeters has quit IRC | 16:56 | |
slittle1 | what are you seeing? | 17:20 |
*** mpeters-wrs has joined #starlingx | 17:20 | |
sgw | slittle1: First I tried a basic build of the compiler layer, downloaded and built the pkgs OK, then I switched to flock | 17:27
slittle1 | sgw: yes .... | 17:28 |
sgw | Now, I know I am not perfect so I started with download_mirror, but had forgotten to reset the repo/manifest | 17:28 |
sgw | There was an issue with the next steps of generate-cgcs-centos-repo and populate-downloads, so there needs to be some kind of error checking to verify the layers are now set up properly. | 17:29
slittle1 | to be clear .... are you using the same workspace as the former compiler layer build ? | 17:29 |
slittle1 | I've been using separate workspaces for each layer | 17:30 |
sgw | Yes, using the same workspace | 17:31 |
sgw | Also, I think the initial dir used to find the centos-mirror-tools/config should be relative to MY_REPO, not to the location of the scripts | 17:31
sgw | that way the scripts get copied to /usr/local/bin (in the Dockerfile) and everything else is relative to MY_REPO | 17:32 |
slittle1 | k | 17:32 |
slittle1 | stepping out for a few mins to heat lunch | 17:33 |
sgw | Ok, ping when back | 17:34 |
slittle1 | sgw: back | 17:41 |
sgw | that was a fast lunch, did you even have a chance to chew ;-) | 17:41 |
dpenney_ | the cengn build I kicked off got further, but looks like it fails on downloading the new kata container RPMs | 17:43 |
dpenney_ | http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/20200114T172515Z/logs/ | 17:43 |
slittle1 | just heated.... still eating | 17:51 |
sgw | Got it. | 17:52 |
sgw | dpenney_: did Bart talk with you about the PBR stuff? The PRC folks are still doing the performance check | 17:53 |
bwensley | I did talk with him. | 17:54 |
bwensley | He seems OK with it, but I'll let him answer as well. :) | 17:54 |
slittle1 | meeting | 18:01 |
*** jrichard has joined #starlingx | 18:22 | |
*** mpeters-wrs has quit IRC | 18:52 | |
*** mpeters-wrs has joined #starlingx | 18:54 | |
*** mpeters-wrs has quit IRC | 18:58 | |
*** mpeters-wrs has joined #starlingx | 18:58 | |
dpenney_ | +1 :) | 19:21 |
slittle1 | I think we'll need to revert stx-tools:431885231ae41256188a7c32f0f5351c4455707b to fix the CENGN build. | 19:25 |
dpenney_ | that would require reverting maybe 8 commits, vs updating the versions of the rpms in question | 19:26 |
slittle1 | Looks like the kata repo was updated Dec 10, and the update that just merged was never updated to track the upstream change | 19:26
dpenney_ | they're all binary rpms, so we should be able to just update the LST file with the new versions, which I'm looking at now | 19:27 |
slittle1 | ok, I'll buy that | 19:27 |
dpenney_ | doing a test download now | 19:30 |
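For illustration only, the .lst update being described would look roughly like this, assuming the kata packages are listed by RPM filename in one of the centos-mirror-tools list files (the file name and version numbers here are hypothetical; the real change is in the review posted below):

    # centos-mirror-tools/rpms_centos3rdparties.lst (hypothetical entries)
    -kata-runtime-1.9.1-8.1.x86_64.rpm
    +kata-runtime-1.9.3-7.1.x86_64.rpm
    -kata-proxy-1.9.1-7.1.x86_64.rpm
    +kata-proxy-1.9.3-6.1.x86_64.rpm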
sgw | dpenney_: that +1 was to the PBR stuff? | 19:32 |
sgw | slittle1: I guess we assumed the kata folks had the right versions, clearly they were working with cached RPMs | 19:32
dpenney_ | yeah, I'm not concerned over the versioning impacting the update mechanism, as the versions would always be incrementing (or PBR and semver would be fundamentally broken) | 19:33 |
dpenney_ | I'll post a review shortly for the kata fix | 19:33 |
sgw | slittle1: so your working assumption for the layering is that each layer is a different workspace? Are you testing mirror download and generation based on that, or are you using the corporate mirror for your default /import location? In other words, is your mirror always fully populated? | 19:37
slittle1 | Trying to test both | 19:38 |
dpenney_ | review is posted: https://review.opendev.org/702506 | 19:38 |
sgw | I can start testing with the assumption of 1 workspace per layer, but will start with an empty mirror. (I actually pass the "output" directory for my mirror to avoid an extra copy) | 19:38
dpenney_ | once https://review.opendev.org/702506 is merged, I'll kick off a new CENGN build again | 19:42 |
sgw | dpenney_: so all you tested was that they exist and download, do we know if the functionality will change? | 20:00 |
dpenney_ | yeah, all I verified was that they could be downloaded to allow the build to proceed. Otherwise, we can revert all the kata updates and have them rebase | 20:01 |
sgw | dpenney_: back to the nvme follow-up: if I use the command you suggested, will that also address the storage config issues? I am not local to the machines right now; if I lock and delete the existing storage nodes, then I can't reboot them, so is there a suggested process? | 20:02
dpenney_ | I would expect storage config should be fine. Configure the rootfs/boot device, the node should install and discover resources, and populate the system database appropriately | 20:06 |
dpenney_ | and I think deleting the node triggers a wipedisk and reboot, but I could be mistaken | 20:07 |
sgw | dpenney_: system host-lock 4 | 20:16 |
sgw | Cannot lock a storage node when ceph pools are not empty and replication is lost. This may result in data loss. | 20:16 |
sgw | Is there a way to force the lock | 20:16 |
dpenney_ | rchurch, do you know the answer to sgw's question? | 20:20 |
rchurch | You should be able to force lock the storage host. If you don't care about the data in the cluster you can delete the pools. That will restore HEALTH_OK and you can then lock normally | 20:23 |
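A sketch of what that cleanup can look like from a controller shell, assuming the data really is disposable; the pool name is only an example, and deletion may also require mon_allow_pool_delete to be enabled:

    ceph health
    ceph osd pool ls
    # ceph requires the pool name twice plus the confirmation flag
    ceph osd pool delete kube-rbd kube-rbd --yes-i-really-really-mean-it
    # once the pools are cleared and health recovers, the normal lock should go through
    system host-lock 4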
dpenney_ | per Frank's email reply, I'll abandon my kata update and we can trigger a revert of the 8 kata updates | 20:24 |
sgw | rchurch: force lock? I don't see that in the help output | 20:26
rchurch | sgw: system host-lock [-f] <hostname or id> | 20:28
sgw | rchurch: where is that actually documented, it's not part of the system host-lock --help option | 20:31 |
dpenney_ | system help host-lock | 20:32 |
rchurch | Yep. That's what I did | 20:32 |
sgw | Ah, you guys know your tools; many tools use --help on the sub option (think git, although it works both ways) | 20:34
sgw | dpenney_ rchurch: so when I set up for personality=storage, do I have to define the storage disk also as nvme vs sd? | 20:35 |
sgw | the command that dpenney_ sent in email was just for rootfs and boot device | 20:36 |
dpenney_ | if your disks are nvme, yes | 20:36 |
dpenney_ | the disk configuration would be a post install step | 20:36 |
dpenney_ | for storage OSDs | 20:36 |
dpenney_ | the disks would get discovered and put in sysinv | 20:36 |
sgw | dpenney_: that's defined in the existing documentation when doing a standard install, I will try that again, I did change those OSD commands | 20:37
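For reference, a rough sketch of that flow on an NVMe-only node; the host id and device paths below are placeholders, not the exact command from the email:

    # point rootfs/boot at the nvme device before installing the node
    system host-update 4 personality=storage rootfs_device=/dev/nvme0n1 boot_device=/dev/nvme0n1
    # after install, the discovered disks should appear in sysinv
    system host-disk-list storage-0
    # post-install step: assign a discovered disk as an OSD
    system host-stor-add storage-0 <disk-uuid>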
*** rchurch has quit IRC | 20:39 | |
dpenney_ | Kata update reversions: https://review.opendev.org/#/q/status:open+projects:starlingx+branch:master+topic:kata | 20:40 |
*** rchurch has joined #starlingx | 20:40 | |
bwensley | Still need a core from tools and root. | 20:46 |
dpenney_ | sgw: can you review the two reversions for tools and root? | 20:46 |
*** mpeters-wrs has quit IRC | 20:47 | |
*** mpeters-wrs has joined #starlingx | 20:48 | |
sgw | dpenney_: sorry, stepped away to zap my lunch | 20:49
*** mpeters-wrs has quit IRC | 20:49 | |
sgw | looking | 20:49 |
*** mpeters-wrs has joined #starlingx | 20:49 | |
sgw | dpenney_: you're missing signed-off-by, but I will let it go this time | 20:50
dpenney_ | I just used the "revert" button in gerrit :) | 20:52 |
dpenney_ | didn't even notice the lack of signed-off-by ;) | 20:53 |
sgw | no worries | 20:55 |
sgw | guys, would there be any reason that host-disk-list would not list a second drive (short of it not existing)? The NUCs I ordered were all supposed to be the same with 2 disks. Is there a way to get into a compute or storage node via ssh before it's unlocked? | 21:37
dpenney_ | is it in a RAID config, maybe? | 21:43 |
dpenney_ | once it installs, you should be able to ssh in as sysadmin, using the original password - which will prompt for an immediate change | 21:43 |
*** mpeters-wrs has quit IRC | 21:46 | |
abailey | hackathon review for test_pvs uploaded: https://review.opendev.org/#/c/702537/ | 21:49 |
rchurch | The sysinv agent might not recognize the disk if the disk's major device number is not in the supported list. Check out VALID_MAJOR_LIST in sysinv/common/constants.py | 21:50 |
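A quick way to see the major numbers being referred to, assuming shell access on the node (NVMe block devices normally sit under major 259):

    # the MAJ:MIN column shows the major device number sysinv checks against VALID_MAJOR_LIST
    lsblk -d -o NAME,MAJ:MIN,SIZE,TYPE
    ls -l /dev/nvme*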
abailey | which failed zuul :( | 21:50 |
dpenney_ | kata reversions have merged and I've kicked a new cengn build | 21:51 |
sgw | dpenney_: Ah, I thought all nodes got the new admin password set up for the controllers; that worked, and apparently I got shorted on disks! I need to double-check the hardware when I am back to where they are | 21:53
sgw | dpenney_: 3 have 2 disks and 3 apparently only have 1 :-( | 21:54 |
dpenney_ | they do get the new password, but only after the first unlocking, when the puppet apply happens | 21:54 |
sgw | Ah, and those are locked because they are not fully provisioned yet, so will that screw with unlocking? | 21:55 |
dpenney_ | well that's unfortunate... I've seen cases where RAID config made two disks look like a single disk, I think | 21:55 |
dpenney_ | nope, you changing the password won't cause a problem | 21:55 |
sgw | no, I don't think it RAIDs the disks; like I said, I will double check | 21:55
* sgw BTW, to all, thanks for being here on IRC and helping out, this was way faster for me to diagnose and understand my issue | 21:56 | |
sgw | abailey: pylint gotcha! | 21:57 |
sgw | dpenney_: just to confirm, if I try to unlock only one storage node, it will fail, as it needs to talk to storage-1 before it will configure and provision properly, correct? | 21:59
rchurch | The unlock should succeed. The cluster will not be healthy until the other storage host is unlocked and any required data replication is done. | 22:01 |
sgw | Yes, it unlocked and is showing operational/disabled and availability/failed | 22:05 |
sgw | fm alarm-list shows that storage-0 experienced a configuration failure | 22:07 |
sgw | and a "service affecting failre" | 22:08 |
sgw | failure | 22:08 |
rchurch | Lock the storage host. Check the puppet logs on the storage host: sudo less -R /var/log/puppet/latest/puppet.log | 22:10 |
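If the puppet log is long, a generic filter can help surface the failure, e.g.:

    sudo grep -iE 'error|fail' /var/log/puppet/latest/puppet.log | less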
*** ijolliffe has quit IRC | 22:22 | |
sgw | More strangeness ensues: it decided to reboot itself as I was looking at the logs. It looks like it had /dev/disk/by-path/ nodes for a second nvme, but no /dev/nvme1n1 entry | 22:29
sgw | This might have to wait until Friday when I am back next to the hardware | 22:29 |
rchurch | FWIW, I've provisioned storage hosts with NVMe-only disks in the past, so I wouldn't expect a problem. Doesn't mean there isn't one, but it worked at one time. | 22:33
*** abailey has quit IRC | 22:37 | |
sgw | rchurch: thanks for that, helpful to know | 22:41 |
*** jrichard has quit IRC | 22:58 | |
*** jrichard has joined #starlingx | 23:08 | |
jrichard | hackathon: updated review for network api tests is up. https://review.opendev.org/702300 | 23:28 |
*** mpeters-wrs has joined #starlingx | 23:31 | |
*** mpeters-wrs has quit IRC | 23:48 |