Capacity Planning
- How many nodes?
- The basic starting point is two nodes with 2 cores and 4GB of memory on each node
- From a fault-tolerance perspective, three nodes is a more appropriate minimum for any cluster
- What's better: more nodes or bigger nodes?
- More nodes means IO, memory, and GC (garbage collection) load is distributed across the cluster
- Common pitfall with distributed databases: stressing shared storage, e.g. a SAN (storage area network)
- Bigger nodes mean more processing can be performed on a single node, with fast access to in-memory data and faster local IO
- Resizing a node in production is likely more challenging than adding a new node to the cluster
- Elasticsearch is built for scaling out on commodity hardware, not up on a single massive machine
- How high can it go? Pretty high
- So which one is it going to be: more, smaller nodes or fewer, larger nodes?
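
Why three nodes rather than two for fault tolerance? A rough sketch of the arithmetic, assuming every node is master-eligible and the cluster needs a strict majority of them to elect a master. The function names are illustrative only and are not part of any Elasticsearch API.

```python
def majority_quorum(master_eligible_nodes: int) -> int:
    """Smallest node count that forms a strict majority (assumed election rule)."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return master_eligible_nodes - majority_quorum(master_eligible_nodes)


if __name__ == "__main__":
    # Compare small cluster sizes side by side.
    for nodes in range(1, 6):
        print(
            f"{nodes} node(s): quorum={majority_quorum(nodes)}, "
            f"tolerated failures={tolerated_failures(nodes)}"
        )
```

Under that assumption, two nodes tolerate no failures (losing either one loses the majority), while three nodes tolerate one. Going from three to four adds capacity but no extra failure tolerance.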