What Is Monte Carlo Simulation

There are many phenomena in everyday life that are very difficult to model. There are so many variables and so many dependencies that any approximation or assumption can lead to huge errors in the outputs. This is usually a combination of uncertainty and variability. Even though we have access to all the historical information, we can’t accurately predict a future outcome because of inaccurate modeling. This becomes especially relevant when we are dealing with systems where the degrees of freedom depend on each other. An example would be the movement of fluids or the kinetic modeling of gases. How do we compute the possible outcomes? How can we assess the impact of all the free variables so that we can predict the outcome under uncertainty?

Understanding IoT Gateways

The Internet of Things (IoT) ecosystem is rapidly expanding. Some analysts predict that there will be around 50 billion connected devices by 2020. If you are new to IoT, it refers to the collective ecosystem of devices that are connected to the internet. These devices can be sensors, actuators, health monitors, meters, and so on. What did people do before IoT? Well, they had devices that weren’t connected to the internet, so it was difficult to monitor and analyze data in real time. This meant that people were leaving a lot of interesting data unused, which directly translates to billions of dollars in lost revenue. By connecting all the devices to the internet, we are enabling ourselves to take actions in real time. It’s obvious that device connectivity is a really important aspect of IoT. How do we ensure connectivity? How can we enable low-cost hardware devices to communicate with the cloud without expensive processors?

Understanding The Industrial IoT Technology Stack

The Internet of Things (IoT) has emerged as one of the hottest trends in the technology world. It has the potential to radically change the way we experience life. It will have a particularly big impact on the industrial world, where we have to deal with massive machines, buildings, and open fields. Industrial technologies have a direct impact on some of the most pressing problems facing humanity, like water shortage, energy consumption, infrastructure management, and so on. When we apply IoT methodologies to the industrial world, it is called Industrial IoT. There has been a lot of discussion as to what exactly it is. Is it a technology? Is it a collection of things? More importantly, there has been a lot of misinformation around it. Let’s go ahead and dissect it, shall we?

Performing Windowed Computations On Streaming Data Using Spark In Python

We deal with real-time data all the time. If you look at analytics dashboards, you can see how they perform computations and tell us what happened in the last 60 minutes or maybe the last 7 hours. They are dealing with terabytes of data and yet they can process all of that in real time. These insights are extremely valuable because you can take the right actions if you know what’s happening. If you have a shopping website, you need to know what happened in the last few hours so that you can boost your sales. Are there a lot of visitors from France? Can I organize a quick French-themed promotion to increase my sales during peak hours? The answers to all these questions lie deep within your data. Spark Streaming is amazing at these things! So how do we do windowed computations in Spark? How can we process this data in real time?

Analyzing Real-time Data With Spark Streaming In Python

There is a lot of data being generated in today’s digital world, so there is a high demand for real-time data analytics. This data usually comes in bits and pieces from many different sources. It can come in various forms like words, images, numbers, and so on. Twitter is a good example of words being generated in real time. We also have websites where statistics like number of visitors, page views, and so on are being generated in real time. There is so much data that it is not very useful in its raw form. We need to process it and extract insights from it so that it becomes useful. This is where Spark Streaming comes into the picture! It is exceptionally good at processing real-time data and it is highly scalable. It can process enormous amounts of data in real time without skipping a beat. So how exactly does Spark do it? How do we use it?

Launching A Spark Standalone Cluster

In the previous blog post, we saw how to start a Spark cluster on EC2 using the inbuilt launch scripts. This is good if you want to get something up and running quickly, but it won’t allow fine-grained control over our cluster. A lot of times, we would want to customize the machines that we spin up. Let’s say that you want to use different types of machines to handle production-level traffic in different regions. Maybe you are not on EC2 and you want to launch some machines in your cluster. How would you do it? This is the reason we have Spark Standalone mode. Using this method, we can manually launch any number of machines independently in our private cluster and make them listen to our master machine. It gives us a lot of flexibility! Let’s go ahead and see how to do it, shall we?

How To Launch A Spark Cluster On Amazon EC2

Apache Spark is marketed as “lightning fast cluster computing” and it stands true to its word! It can do amazing things really quickly using a cluster of machines. So how do we assemble that cluster? Let’s say you are using a cloud service provider like Amazon Web Services. Do we need to spin up a bunch of instances ourselves to form a “cluster”? Well, not really! Spark can launch a cluster by itself and you can control everything from one machine. You just need to log into your main instance and Spark will automatically launch all the instances in the cluster for you. It’s beautiful! Let’s go ahead and see how to launch a cluster, shall we?