| *** TxGirlGeek has quit IRC | 00:04 | |
| *** TxGirlGeek has joined #starlingx | 00:07 | |
| *** mpeters-wrs has joined #starlingx | 00:27 | |
| *** byang has joined #starlingx | 00:48 | |
| *** TxGirlGeek has quit IRC | 01:28 | |
| *** sgw has quit IRC | 01:50 | |
| *** mpeters-wrs has quit IRC | 02:05 | |
| *** wangyi4 has joined #starlingx | 02:06 | |
| *** sgw has joined #starlingx | 02:54 | |
| *** mpeters-wrs has joined #starlingx | 03:09 | |
| *** mpeters-wrs has quit IRC | 03:43 | |
| *** mpeters-wrs has joined #starlingx | 04:02 | |
| *** wangyi41 has joined #starlingx | 04:16 | |
| *** cyan_ has joined #starlingx | 04:25 | |
| *** rchurch_ has quit IRC | 04:34 | |
| *** rchurch has joined #starlingx | 04:35 | |
| *** mpeters-wrs has quit IRC | 04:36 | |
| *** TxGirlGeek has joined #starlingx | 04:52 | |
| *** mpeters-wrs has joined #starlingx | 05:00 | |
| *** mpeters-wrs has quit IRC | 05:05 | |
| *** TxGirlGeek has quit IRC | 07:05 | |
| *** anran has joined #starlingx | 07:54 | |
| * wangyi41 sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/QyderTfmvjBlanwZaLRCIlPG > | 08:09 | |
| *** wangyi4 has quit IRC | 08:10 | |
| wangyi41 | This task force includes @anran, @yan_chen and me. More people are welcome to join in. | 08:13 |
| *** anran has quit IRC | 08:55 | |
| *** sgw has quit IRC | 09:45 | |
| *** byang has quit IRC | 12:09 | |
| *** mpeters-wrs has joined #starlingx | 12:14 | |
| *** ijolliffe has quit IRC | 12:52 | |
| *** ijolliffe has joined #starlingx | 13:18 | |
| *** mpeters-wrs has quit IRC | 13:50 | |
| *** sgw has joined #starlingx | 14:07 | |
| *** mpeters-wrs has joined #starlingx | 14:07 | |
| sgw | Morning all | 14:08 |
| *** mpeters-wrs has quit IRC | 14:09 | |
| *** mpeters has joined #starlingx | 14:09 | |
| *** billzvonar has joined #starlingx | 14:41 | |
| sgw | slittle1: Morning | 14:45 |
| ijolliffe | morning - thanks wangyi41 and team - i see 4 reviews posted for the hack-a-thon | 14:46 |
| *** billzvonar has quit IRC | 14:46 | |
| * sgw back in 90 or so | 15:02 | |
| slittle1 | CENGN had trouble setting up the build container via stx-tools/Dockerfile ... | 15:07 |
| slittle1 | RUN pip install python-subunit junitxml --upgrade && \ | 15:08 |
| slittle1 | pip install tox --upgrade | 15:08 |
| slittle1 | failed .... anyone else observing this? | 15:08 |
| dpenney_ | A new version of more-itertools was just released a couple days ago, so maybe we constrain it to the older version for now: https://pypi.org/project/more-itertools/#history | 15:09 |
| dpenney_ | maybe python3-specific code was added and it can't build for python2.7? | 15:11 |
| dpenney_ | looks like the previously successful build was 20200111T023000Z, which used more-itertools 8.0.2 | 15:20 |
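If the culprit is a new more-itertools release that no longer installs on python2.7, one possible shape of the fix is to pin it in the Dockerfile before the general upgrades run. A minimal sketch, assuming 8.0.2 (the version from the last successful build noted above) is the last known-good pin; the actual fix is in the review linked below.

```
# Hypothetical pin in stx-tools/Dockerfile; the exact version and placement are assumptions.
RUN pip install more-itertools==8.0.2 && \
    pip install python-subunit junitxml --upgrade && \
    pip install tox --upgrade
```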
| *** TxGirlGeek has joined #starlingx | 16:02 | |
| dpenney_ | I've posted a review to resolve the build failure: https://review.opendev.org/702471 | 16:27 |
| *** abailey has quit IRC | 16:29 | |
| *** abailey has joined #starlingx | 16:30 | |
| slittle1 | looks good | 16:30 |
| *** jrichard has quit IRC | 16:35 | |
| sgw | slittle1: you around? I am working on the layer build testing and having some issues | 16:54 |
| *** mpeters has quit IRC | 16:56 | |
| slittle1 | what are you seeing? | 17:20 |
| *** mpeters-wrs has joined #starlingx | 17:20 | |
| sgw | slittle1: First I tried a basic build of the compiler layer, downloaded and built the pkgs OK, then I switched to flock | 17:27 |
| slittle1 | sgw: yes .... | 17:28 |
| sgw | Now, I know I am not perfect so I started with download_mirror, but had forgotten to reset the repo/manifest | 17:28 |
| sgw | There was an issue with the next steps of generate-cgcs-centos-repo and populate-downloads, so there needs to be some kind of error checking to confirm the layers are set up properly. | 17:29 |
| slittle1 | to be clear .... are you using the same workspace as the former compiler layer build ? | 17:29 |
| slittle1 | I've been using separate workspaces for each layer | 17:30 |
| sgw | Yes, using the same workspace | 17:31 |
| sgw | Also, I think the initial dir to find the centos-mirror-tools/config should be relative to MY_REPO, not to the location of the scripts | 17:31 |
| sgw | that way the scripts get copied to /usr/local/bin (in the Dockerfile) and everything else is relative to MY_REPO | 17:32 |
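A minimal sketch of the change sgw is describing, assuming the scripts read a MY_REPO environment variable; the relative path to centos-mirror-tools/config shown here is an assumption, not the actual repo layout.

```
# Illustrative only: resolve the config dir from $MY_REPO rather than from the
# script's own location, so the scripts can be copied to /usr/local/bin in the container.
config_dir="${MY_REPO:?MY_REPO must be set}/../stx-tools/centos-mirror-tools/config"
```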
| slittle1 | k | 17:32 |
| slittle1 | stepping out for a few mins to heat lunch | 17:33 |
| sgw | Ok, ping when back | 17:34 |
| slittle1 | sgw: back | 17:41 |
| sgw | that was a fast lunch, did you even have a chance to chew ;-) | 17:41 |
| dpenney_ | the cengn build I kicked off got further, but looks like it fails on downloading the new kata container RPMs | 17:43 |
| dpenney_ | http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/20200114T172515Z/logs/ | 17:43 |
| slittle1 | just heated.... still eating | 17:51 |
| sgw | Got it. | 17:52 |
| sgw | dpenney_: did Bart talk with you about the PBR stuff? The PRC folks are still doing the performance check | 17:53 |
| bwensley | I did talk with him. | 17:54 |
| bwensley | He seems OK with it, but I'll let him answer as well. :) | 17:54 |
| slittle1 | meeting | 18:01 |
| *** jrichard has joined #starlingx | 18:22 | |
| *** mpeters-wrs has quit IRC | 18:52 | |
| *** mpeters-wrs has joined #starlingx | 18:54 | |
| *** mpeters-wrs has quit IRC | 18:58 | |
| *** mpeters-wrs has joined #starlingx | 18:58 | |
| dpenney_ | +1 :) | 19:21 |
| slittle1 | I think we'll need to revert stx-tools:431885231ae41256188a7c32f0f5351c4455707b to fix the CENGN build. | 19:25 |
| dpenney_ | that would require reverting maybe 8 commits, vs updating the versions of the rpms in question | 19:26 |
| slittle1 | Looks like the kata repo was updated Dec 10, and the update that just merged was never updated to track the upstream change | 19:26 |
| dpenney_ | they're all binary rpms, so we should be able to just update the LST file with the new versions, which I'm looking at now | 19:27 |
| slittle1 | ok, I'll buy that | 19:27 |
| dpenney_ | doing a test download now | 19:30 |
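For illustration only, the kind of .lst edit being described might look like the following; the file name, package name, and versions are placeholders rather than the actual kata entries.

```
# Hypothetical version bump of a binary RPM entry in a mirror .lst file
# (file name, package name, and versions are placeholders):
sed -i 's/kata-runtime-1.9.0-7.1.x86_64.rpm/kata-runtime-1.9.3-8.1.x86_64.rpm/' \
    centos-mirror-tools/rpms_centos3rdparties.lst
```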
| sgw | dpenney_: that +1 was to the PBR stuff? | 19:32 |
| sgw | slittle1: I guess we assumed the kata folks had the right versions; clearly they were working with cached RPMs | 19:32 |
| dpenney_ | yeah, I'm not concerned over the versioning impacting the update mechanism, as the versions would always be incrementing (or PBR and semver would be fundamentally broken) | 19:33 |
| dpenney_ | I'll post a review shortly for the kata fix | 19:33 |
| sgw | slittle1: so your working assumption for the layering is that each layer is a different workspace? Are you testing mirror download and generation based on that, or are you using the corporate mirror for your default /import location? In other words, your mirror is always fully populated | 19:37 |
| slittle1 | Trying to test both | 19:38 |
| dpenney_ | review is posted: https://review.opendev.org/702506 | 19:38 |
| sgw | I can start testing with the assumption of 1 workspace/layer, but will start with an empty mirror. (I actually pass the "output" directory for my mirror to avoid an extra copy) | 19:38 |
| dpenney_ | once https://review.opendev.org/702506 is merged, I'll kick off a new CENGN build again | 19:42 |
| sgw | dpenney_: so all you tested was that they exist and download; do we know if the functionality will change? | 20:00 |
| dpenney_ | yeah, all I verified was that they could be downloaded to allow the build to proceed. Otherwise, we can revert all the kata updates and have them rebase | 20:01 |
| sgw | dpenney_: back to the nvme follow-up: if I use the command you suggested, will that also address the storage config issues? I am not local to the machines right now; if I lock and delete the existing storage nodes, then I can't reboot them. Is there a suggested process? | 20:02 |
| dpenney_ | I would expect storage config should be fine. Configure the rootfs/boot device, the node should install and discover resources, and populate the system database appropriately | 20:06 |
| dpenney_ | and I think deleting the node triggers a wipedisk and reboot, but I could be mistaken | 20:07 |
| sgw | dpenney_: system host-lock 4 | 20:16 |
| sgw | Cannot lock a storage node when ceph pools are not empty and replication is lost. This may result in data loss. | 20:16 |
| sgw | Is there a way to force the lock? | 20:16 |
| dpenney_ | rchurch, do you know the answer to sgw's question? | 20:20 |
| rchurch | You should be able to force lock the storage host. If you don't care about the data in the cluster you can delete the pools. That will restore HEALTH_OK and you can then lock normally | 20:23 |
| dpenney_ | per Frank's email reply, I'll abandon my kata update and we can trigger a revert of the 8 kata updates | 20:24 |
| sgw | rchurch: force lock? I don't see that in the help output | 20:26 |
| rchurch | sgw: system host-lock [-f] <hostname or id> | 20:28 |
| sgw | rchurch: where is that actually documented? It's not shown by the system host-lock --help option | 20:31 |
| dpenney_ | system help host-lock | 20:32 |
| rchurch | Yep. That's what I did | 20:32 |
| sgw | Ah, you guys know your tools; many tools use --help on the sub-option (think git, although it works both ways) | 20:34 |
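Pulling rchurch's suggestion together, a hedged sketch; pool names vary per deployment, and deleting pools destroys whatever data they hold.

```
system help host-lock                 # shows the -f/--force flag
system host-lock -f 4                 # force-lock the storage host
# Only if the cluster data is disposable: deleting the pools restores HEALTH_OK,
# after which a normal lock works.
ceph osd lspools
ceph osd pool delete <pool> <pool> --yes-i-really-really-mean-it
```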
| sgw | dpenney_ rchurch: so when I set up for personality=storage, do I have to define the storage disk also as nvme vs sd? | 20:35 |
| sgw | the command that dpenney_ sent in email was just for rootfs and boot device | 20:36 |
| dpenney_ | if your disks are nvme, yes | 20:36 |
| dpenney_ | the disk configuration would be a post install step | 20:36 |
| dpenney_ | for storage OSDs | 20:36 |
| dpenney_ | the disks would get discovered and put in sysinv | 20:36 |
| sgw | dpenney_: that's defined in the existing documentation for a standard install. I will try that again; I did change those OSD commands | 20:37 |
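A rough sketch of the NVMe flow being discussed; the device paths and host ID are examples, not the exact commands from dpenney_'s email.

```
# Point rootfs/boot at the NVMe device when provisioning the storage personality:
system host-update 4 personality=storage rootfs_device=/dev/nvme0n1 boot_device=/dev/nvme0n1
# After install, sysinv discovers the disks; the OSD is added as a post-install
# step using the UUID reported by host-disk-list:
system host-disk-list storage-0
system host-stor-add storage-0 <disk_uuid>
```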
| *** rchurch has quit IRC | 20:39 | |
| dpenney_ | Kata update reversions: https://review.opendev.org/#/q/status:open+projects:starlingx+branch:master+topic:kata | 20:40 |
| *** rchurch has joined #starlingx | 20:40 | |
| bwensley | Still need a core from tools and root. | 20:46 |
| dpenney_ | sgw: can you review the two reversions for tools and root? | 20:46 |
| *** mpeters-wrs has quit IRC | 20:47 | |
| *** mpeters-wrs has joined #starlingx | 20:48 | |
| sgw | dpenney_: sorry, stepped away to zap my lunch | 20:49 |
| *** mpeters-wrs has quit IRC | 20:49 | |
| sgw | looking | 20:49 |
| *** mpeters-wrs has joined #starlingx | 20:49 | |
| sgw | dpenney_: you're missing signed-off-by, but I will let it go this time | 20:50 |
| dpenney_ | I just used the "revert" button in gerrit :) | 20:52 |
| dpenney_ | didn't even notice the lack of signed-off-by ;) | 20:53 |
| sgw | no worries | 20:55 |
| sgw | guys, would there be any reason that host-disk-list would not list a second drive (short of it not existing)? The NUCs I ordered were supposed to all be the same with 2 disks. Is there a way to ssh into a compute or storage node while it is still locked? | 21:37 |
| dpenney_ | is it in a RAID config, maybe? | 21:43 |
| dpenney_ | once it installs, you should be able to ssh in as sysadmin, using the original password - which will prompt for an immediate change | 21:43 |
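A small hedged example of what dpenney_ describes; the hostname is illustrative.

```
# Assumes the node has finished installing and is reachable on the management network:
ssh sysadmin@storage-0    # log in with the original password; a change is forced immediately
```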
| *** mpeters-wrs has quit IRC | 21:46 | |
| abailey | hackathon review for test_pvs uploaded: https://review.opendev.org/#/c/702537/ | 21:49 |
| rchurch | The sysinv agent might not recognize the disk if the disk's major device number is not in the supported list. Check out VALID_MAJOR_LIST in sysinv/common/constants.py | 21:50 |
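If it is a major-number issue, a quick check on the node itself shows what to compare against VALID_MAJOR_LIST (NVMe namespaces normally sit on the blkext major, 259, but confirm on the host).

```
cat /sys/block/nvme1n1/dev    # prints major:minor for the device, e.g. 259:2
ls -l /dev/nvme1n1            # the major number appears before the minor in the listing
```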
| abailey | which failed zuul :( | 21:50 |
| dpenney_ | kata reversions have merged and I've kicked a new cengn build | 21:51 |
| sgw | dpenney_: Ah, I thought all nodes got the new admin password set up for the controllers; that worked, and apparently I got shorted on disks! I need to double-check the hardware when I am back where they are | 21:53 |
| sgw | dpenney_: 3 have 2 disks and 3 apparently only have 1 :-( | 21:54 |
| dpenney_ | they do get the new password, but only after the first unlocking, when the puppet apply happens | 21:54 |
| sgw | Ah, and those are locked because they are not fully provisioned yet, so will that screw with unlocking? | 21:55 |
| dpenney_ | well that's unfortunate... I've seen cases where RAID config made two disks look like a single disk, I think | 21:55 |
| dpenney_ | nope, you changing the password won't cause a problem | 21:55 |
| sgw | no, I don't think it RAIDs the disks; like I said, I will double-check | 21:55 |
| * sgw BTW, to all, thanks for being here on IRC and helping out; this made it way faster for me to diagnose and understand my issue | 21:56 | |
| sgw | abailey: pylint gotcha! | 21:57 |
| sgw | dpenney_: just to confirm, if I try to unlock only one storage node, it will fail, as it needs to talk to storage-1 before it will configure and provision properly, correct? | 21:59 |
| rchurch | The unlock should succeed. The cluster will not be healthy until the other storage host is unlocked and any required data replication is done. | 22:01 |
| sgw | Yes, it unlocked and is showing operational/disabled and availability/failed | 22:05 |
| sgw | fm alarm-list shows that storage-0 experienced a configuration failure | 22:07 |
| sgw | and a "service affecting failre" | 22:08 |
| sgw | failure | 22:08 |
| rchurch | Lock the storage host. Check the puppet logs on the storage host: sudo less -R /var/log/puppet/latest/puppet.log | 22:10 |
| *** ijolliffe has quit IRC | 22:22 | |
| sgw | More strangeness ensues: it decided to reboot itself as I was looking at the logs. It looks like it had /dev/disk/by-path/ nodes for a second nvme, but no /dev/nvme1n1 entry | 22:29 |
| sgw | This might have to wait until Friday when I am back next to the hardware | 22:29 |
| rchurch | FWIW, I've provisioned storage hosts with NVMe only disks in the past, so I wouldn't expect a problem. Doesn't mean there isn't one, but it worked at one time. | 22:33 |
| *** abailey has quit IRC | 22:37 | |
| sgw | rchurch: thanks for that, helpful to know | 22:41 |
| *** jrichard has quit IRC | 22:58 | |
| *** jrichard has joined #starlingx | 23:08 | |
| jrichard | hackathon: updated review for network api tests is up. https://review.opendev.org/702300 | 23:28 |
| *** mpeters-wrs has joined #starlingx | 23:31 | |
| *** mpeters-wrs has quit IRC | 23:48 | |