Setting Up Proxmox High Availability Cluster & Ceph

Setting up Proxmox HA Cluster and Ceph Storage from scratch.

○○○ LINKS ○○○

○○○ SHOP ○○○

○○○ TIMECODE ○○○
0:00 Intro
0:16 High Availability & Ceph
1:13 My Proxmox Setup
1:45 Setting Up Proxmox Cluster
3:59 Installing Ceph
6:14 Setup Ceph OSD
6:55 Ceph Monitors Setup
7:25 Ceph Pool
7:54 HA Group Settings
9:09 Setting Up Container
10:49 Testing Migration
12:51 Testing Failover
16:11 Conclusion
○○○ SUPPORT ○○○

○○○ SOCIAL ○○○

○○○ Send Me Stuff ○○○
Don Hui
PO BOX 765
Farmingville, NY 11738

○○○ Music ○○○
From Epidemic Sounds

DISCLAIMER: This video and description contain affiliate links, which means that if you click on one of the product links, I’ll receive a small commission.
Comments

I’ve been running a Proxmox cluster with a Ceph pool on three Dell 7060s in my environment for about six months now. It’s been working great and hasn’t had any failures yet. I highly recommend doing this if you have the resources.

romayojr

At 10:50 there is a cut, and for folks who may be following along with this video I want to clarify what happened, because it's a gotcha that people new to Proxmox really should understand (I'm not trying to undermine Don here; he did a fantastic job with this video demo). The test container is shown configured with the default storage location, which in this case is local storage, and that is INCORRECT for this configuration. You *must* choose the shared storage (in this case the Ceph pool that was created, called "proxpool") if you want to configure HA; Proxmox won't let you do it otherwise, because the guest isn't backed by shared storage. Do not despair, amigos: if you accidentally configure your VM or container on local storage, deploy your workload, and then decide you want this VM/CT to be part of your HA config, you can move the storage from local to shared storage by:
1. click on the ct you accidentally created on local storage
2. click on "resources"
3. click on "root disk"
4. click on the "volume action" pull-down at the top of the resources window
5. click on "move storage"
6. select the destination (shared) storage you want to move it to
Repeat this for every disk that belongs to the container, and HA will work once all disks attached to the container are on shared storage. The procedure is the same for VMs, except the storage configuration is under the "Hardware" tab instead of the "Resources" tab: click on each disk and use the same "Volume Action" > "Move Storage" as with CTs. (There is also a CLI equivalent; see the sketch below.)
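For reference, a minimal CLI sketch of the same move, assuming Proxmox VE 7/8 tooling; the guest IDs (CT 101, VM 100), the disk name scsi0, and the storage name "proxpool" are placeholders:

    # Move a container's root disk onto the shared Ceph storage
    pct move-volume 101 rootfs proxpool

    # Move a VM disk (older releases call this "qm move_disk")
    qm disk move 100 scsi0 proxpool

Once every disk of the guest lives on shared storage, it can be added as an HA resource.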

*Pro tip*: Proxmox, at the time of this writing, does NOT support changing the default storage location offered when you create new VMs and CTs. HOWEVER, the storage list is always (at the time of this writing) in alphabetical order, and it defaults to the first entry. If you want to control the default, name whichever storage you prefer so that it sorts first. For lots of people that would be something like "ceph pool", but for some strange reason Proxmox sorts capitalized storage IDs first, so I call my Ceph pool NVME (because it's built on my NVMe storage) and it shows up at the top of the list, and is thus the default when I create a new VM or CT.

Note: unfortunately you can't just change a storage ID, because your VMs won't know where their storage is. If you need to rename your storage, your best bet is to create a new Ceph pool with the new name (backed by the same storage; don't worry, Ceph pools are thin-provisioned), go to each VM/CT's storage, and move it to the new pool. When nothing is left on the old pool (you can verify this by looking at the pool in the storage section and making sure there aren't any files stored there), remove it from the cluster's storage section, then remove the pool from the Ceph options.
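A rough CLI sketch of that rename-by-migration approach, assuming the pveceph tooling (option names may differ slightly between versions); "NVME" (the new pool) and "oldpool" are placeholder names:

    # Create the new pool, plus a matching storage entry, with the name you want
    pveceph pool create NVME --add_storages

    # Move each guest's disks to NVME (pct move-volume / qm disk move, as above).
    # Once the old pool is empty, drop its storage entry and the pool itself:
    pvesm remove oldpool
    pveceph pool destroy oldpool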

Cpgeekorg

Excellent presentation. Didn't over- or under-explain anything, IMO. I appreciate it!

GreatWes

I added a USB NVMe enclosure with a 1TB SSD to each node in my 3-node N100 Proxmox cluster; the nodes have USB 3.0 ports. I created a Ceph pool using those three SSDs. Ceph and NFS (Synology DS1520+) use the second network adapter on each node and on the NAS, so the storage network traffic is isolated from regular traffic. I moved a Debian VM to NFS and timed a migration from node 2 to node 1, then repeated the migration with the same Debian VM on Ceph. Like Don, I was pinging the default router from that Debian VM during the migrations and never lost a ping. The 48G Debian machine's migration on NFS took 19 sec with 55 ms downtime; on Ceph it took 18 sec with 52 ms downtime. Migration speed for both was 187.7 MiB/s. The HP 1TB EX900 Plus NVMe SSD is Gen3, but the SSK SHE-C325 Pro NVMe enclosure is USB 3.2 Gen2.

Not much of a performance difference in my config for NFS vs Ceph. At least there's the benefit of not having the NAS as a single point of failure.

techadsr

I run an HA cluster at the moment with two identically named ZFS pools, and as long as I put the CT disks in that pool, it allows replication and full HA functionality. I don't see any need to add the extra complexity of Ceph just for HA. Ceph seems awesome, but it's an order of magnitude more complex than ZFS...
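A minimal sketch of that ZFS replication approach, assuming two nodes named prox1 and prox2 with identically named ZFS storage and a container with ID 101 (names, ID, and schedule are placeholders):

    # Replicate CT 101 to prox2 every 15 minutes so a failover has a recent copy
    pvesr create-local-job 101-0 prox2 --schedule "*/15"

    # Check replication state
    pvesr status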

mikebakkeyt

Just thinking about it and you’re posting this 😂🎉

dijkstw

Just to confirm, I think I missed something: each of 'prox1, prox2, prox3, prox.n' would be a different machine running Proxmox on the same network? I shall rewatch and maybe have a bit of a play around with the documentation. Thanks for all the recent Proxmox tutes, mate, they have been very helpful indeed!
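For context, a minimal sketch of how separate Proxmox installs on the same network become one cluster, using the node names from the video and a placeholder IP and cluster name:

    # On the first node (prox1): create the cluster
    pvecm create proxcluster

    # On each additional node (prox2, prox3): join via prox1's IP
    pvecm add 192.168.1.10

    # Verify membership from any node
    pvecm status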

CaptZenPetabyte

3:32 "as far as migrating through here, you cannot do that yet until you set up a ceph" - this is incorrect. in this state, you CAN migrate vm's from one node to another, they just have to be paused first. all that's required for moving vm's node to node is a basic cluster. HOWEVER, because the storage isn't shared between them, it does take longer to move vm's between nodes in this state because the entirety of the storage needs to move from one node to the next. the way it works if you have ceph (or another shared storage, it doesn't have to be ceph, it could be an external share or something, ceph is just a great way to set up shared service with node-level (or otherwise adaptable) redundancy), is that instead of moving full disk images when you migrate, the destination node accesses the shared storage volume (so the storage doesn't have to move at all). which means the only thing that needs to be transferred between nodes is the active memory image, and this is done in 2 passes to minimize latency in the final handoff (so it transfers all blocks of the active vm ram, then it suspends the vm on the source node, copies any memory blocks that have changed since the initial copy, and then the vm is suspended at the destination node and resumes. on a fast network connection this final handoff process can be done in under a couple miliseconds so to any users using the services of the vm being transferred, are none the wiser. - you can start a ping, migrate a vm mid-request and the vm will respond in time at it's destination (maybe adding 2-3ms to the response time). it's FANTASTIC!

Cpgeekorg

At 6:20, you say "you'll see the disk that I add." Where did this disk get defined? My config doesn't have an available disk.

LukasBradley

I got messed up when you created an OSD. I had no other disks available to use on any of the nodes.
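In case it helps anyone stuck at the same point, a hedged sketch of checking for and preparing a spare disk so it shows up as available for an OSD; /dev/sdb is a placeholder, and wiping it destroys everything on it:

    # List block devices - the OSD candidate must be an unused disk with no partitions or filesystem
    lsblk

    # Wipe any leftover partition/filesystem signatures (DESTROYS data on /dev/sdb)
    wipefs -a /dev/sdb

    # Create the OSD on the now-available disk
    pveceph osd create /dev/sdb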

remio

Hmmm, should you not have chosen the prox-pool storage rather than local-lvm?

GT-scsk

And migration mode: restart... which maybe explains your loss of ping. Did the LXC restart?
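For context: containers can't be live-migrated, so a running LXC is moved in restart mode, which stops it on the source node and starts it on the target (hence a few lost pings). A small sketch, with CT 101 and node prox2 as placeholders:

    # Running containers are migrated in restart mode - brief downtime is expected
    pct migrate 101 prox2 --restart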

GT-scsk

@4:35 - please dive more into the cluster networking part:
on VMware vSAN you have a storage network, VM network, failover network, etc...
What's the best way, networking-wise, to build a Ceph cluster with 3 or more hosts?
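Not an authoritative answer, but a common pattern is to give Ceph its own subnet(s), separate from the management/VM network. A minimal sketch, with placeholder subnets:

    # Initialize Ceph with a dedicated public (client) network,
    # optionally plus a separate cluster network for OSD replication traffic
    pveceph init --network 10.10.10.0/24 --cluster-network 10.10.20.0/24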

fbifido

Hey Don, are you RDP-ing in to manage your infrastructure? I noticed that there are two mouse cursors. If you're using some sort of jump box or bastion host, could you share how you're connecting to it and what your bastion host is?

karloa

I have 2 nodes running myself (2 Dell R620) and am waiting for some lower-end CPUs to arrive in the mail before I bring the third node online. It came with some BEEFIER CPUs, and that means jet-engine screams (1U server fans).

JohnWeland

@7:20 - it would be nice if you explained what each field is for, or what the best practices are.
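Not authoritative, but the main fields in that pool dialog correspond to Ceph settings like these; a sketch of a CLI equivalent with common defaults (the pool name is a placeholder, and option names may vary by version):

    # size = number of replicas kept, min_size = replicas required to keep serving I/O
    pveceph pool create proxpool --size 3 --min_size 2 --pg_autoscale_mode on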

fbifido

Hello dear,
Would it work with Pimox too??
Thanks for the great videos!

diegosantos

So does this work with LXC containers too? They don't start after migration, but HA isn't considered migration.
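For what it's worth, a small sketch of putting a container under HA, assuming CT 101 and an HA group named "ha-group" (both placeholders); HA recovery starts the CT on another node after a failure, which is handled differently from a manual migration:

    # Register the container as an HA resource so the cluster restarts it elsewhere on node failure
    ha-manager add ct:101 --state started --group ha-group

    # Check HA resource state
    ha-manager status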

dzmelinux

I’m going to talk only about my experience with Proxmox, which I have tried to use several times. I do not like it; I think there are better options. My setup is three i5 13th-gen nodes with Ceph and 32 GB of RAM each, with two NICs: one for Ceph traffic and the other for everything else. I think it’s very slow, and in my opinion there is a problem with stopping VMs when they get stuck or when you’ve made a mistake somehow with the VM. Templates can only be cloned from the node they are attached to, unlike VMware, though of course you can migrate templates. Installing a Linux VM the traditional way takes a long time, something like 4 hours or more. Ceph speed on SSD was around 13 MB/s. I ran a test by moving all 10 of my VMs from 3 nodes down to 2 so I could test speeds on the third node. Maybe it’s me and I’m just not used to this kind of solution, since I was a VCP on 5.5 and 6; I normally prefer Fedora KVM because of Cockpit, but that doesn’t provide any way to cluster 2-3 machines. In sum, I got tired of it and installed Harvester HCI, and now a VM installs in 5 minutes or a bit more, and Longhorn gives speeds around 80 MB/s.
This is just my latest experience, along with the previous ones. I hope this helps someone. Thank you.
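If anyone wants to reproduce that kind of throughput number, a quick sketch of benchmarking a Ceph pool from one of the nodes (the pool name is a placeholder; the test temporarily writes real data into the pool):

    # 10-second write benchmark against the pool (test objects are removed afterwards)
    rados bench -p proxpool 10 write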

dafx

This is one you needed to do, but please explore other network filesystems too. Also, what if you wanted to combine all the PCs and GPUs to look like one PC - can you explain or DIY that in a follow-up? #HA #40g #no switch #load balancing

shephusted