In the previous blog post, we saw how to start a Spark cluster on EC2 using the inbuilt launch scripts. This is good if you want get something up and running quickly, but it won’t allow fine-grained control over our cluster. A lot of times, we would want to customize the machines that we spin up. Let’s say that you want to use different types of machines to handle production level traffic in different regions. May be you are not on EC2 and you want to launch some machines in your cluster. How would you do it? This is the reason we have Spark Standalone mode. Using this method, we can manually launch any number of machines independently in our private cluster and make them listen to our master machine. It gives us a lot of flexibility! Let’s go ahead and see how to do it, shall we? Continue reading “Launching A Spark Standalone Cluster”
Tag: AWS
How To Launch A Spark Cluster On Amazon EC2
Apache Spark is marketed as “lightning fast cluster computing” and it stands true to its word! It can do amazing things really quickly using a cluster of machines. So how do we assemble that cluster? Let’s say you are using a cloud service provider like Amazon Web Services. Do we need to spin up a bunch of instances ourselves to form a “cluster”? Well, not really! Spark can launch a cluster by itself and you can control everything from one machine. You just need to log into your main instance and Spark will automatically launch all the instances in the cluster for you. It’s beautiful! Let’s go ahead and see how to launch a cluster, shall we? Continue reading “How To Launch A Spark Cluster On Amazon EC2”