How To Install Apache Spark On Ubuntu

November 3, 2015 (updated July 11, 2016) ~ Prateek Joshi ~ 3 Comments

There’s so much data being generated in today’s world that it’s mind-boggling, and we need platforms and frameworks that can handle it. This field of study is called Big Data Analysis. With so much data lying around, often running into petabytes and exabytes, we need very powerful systems to process it. Not only that, we need to do it with high efficiency. If you try to do it using your regular ways, you will never be able to finish anything in time, let alone do it in real time. This is where Apache Spark comes into the picture. It is an open source big data processing framework that can process massive amounts of data at high speed using cluster computing. Let’s see how we can install it on Ubuntu. Continue reading “How To Install Apache Spark On Ubuntu” →

How To Add Swap Space On Ubuntu

March 21, 2015 ~ Prateek Joshi ~ Leave a comment

Whenever you are building an application that’s memory intensive, you are bound to run into memory issues. Those out-of-memory errors are painful to deal with, especially when they happen in production. Before putting your code on your server, you need to make sure that the server can handle the application’s memory requirements. But even if you are careful, something might still go wrong and you might end up running out of memory. One of the easiest ways to deal with this is by adding some swap space. Now how will it help our case? How can we use it on Ubuntu? Continue reading “How To Add Swap Space On Ubuntu” →

How To Schedule Tasks On Linux

April 19, 2014 (updated April 20, 2014) ~ Prateek Joshi ~ Leave a comment

Let’s say you have a website that does some heavy lifting. This means that you have designed a backend and hosted it on your web server. Now, you might want to run some processes periodically, like generating thumbnails or enriching data in the background. The reason is that you don’t want these processes to interfere with the user interface while they run. They should run somewhere in the background, and they should run automatically. Unix-based systems have a great program for this called ‘cron’. It allows tasks to run automatically in the background at regular intervals. You could also use it to automatically create backups, synchronize files, schedule updates, and much more. So how do we set this up? Continue reading “How To Schedule Tasks On Linux” →
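
To make the idea concrete: cron reads its schedule from crontab lines, each made of five time fields (minute, hour, day of month, month, day of week) followed by the command to run. Here is a minimal Python sketch that assembles such a line. The helper name `crontab_entry` and the script path are made up for illustration and are not taken from the post.

```python
def crontab_entry(command, minute="*", hour="*", day_of_month="*",
                  month="*", day_of_week="*"):
    """Build one crontab line: five time fields followed by the command.

    A field left as "*" means "every" (every minute, every hour, ...).
    """
    fields = [minute, hour, day_of_month, month, day_of_week, command]
    return " ".join(fields)


# Hypothetical thumbnail job that should run every 15 minutes.
print(crontab_entry("/usr/bin/python3 /home/user/make_thumbnails.py",
                    minute="*/15"))
```

The printed line is what you would paste into the editor opened by `crontab -e`.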

How To Install PIL On Ubuntu

April 19, 2014 ~ Prateek Joshi ~ 2 Comments

Let’s say you want to play around with images in Python. To do that, you need a Python package that can handle all the image manipulation. Python Imaging Library (PIL) is one of the most popular libraries used to process image data. Actually, people use Pillow now, which is a modern, actively maintained fork of PIL. It has a lot of nice features and it works well. Let’s see how you can install PIL on 64-bit Ubuntu 12.04. Continue reading “How To Install PIL On Ubuntu” →

Using Multiple CPU Cores With Command Line Tools

March 7, 2014 ~ Prateek Joshi ~ Leave a comment

All of you must have heard that the processors in our laptops have multiple cores. It’s good that the technology is advancing in that direction. When people write programs, they can utilize these cores to speed up computation. But most of the built-in commands don’t use these cores unless told to explicitly. If you ever want to add up a very large list, say hundreds of megabytes, or just look through it to find some particular value, you would write a simple program to do it. But going through so much data takes a lot of time if you just use a single thread. The same is true for tools like grep, bzip2, wc, awk, sed, etc. They are single-threaded and will just use one CPU core. If those names looked like jibber-jabber, then you should probably google them before you proceed. So how do we use multiple cores in these situations? Continue reading “Using Multiple CPU Cores With Command Line Tools” →
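
The underlying idea, split the data into chunks, process each chunk on its own core, and then combine the partial results, can be sketched in a few lines of Python. This is only an illustration of that idea using the standard `multiprocessing` module, not the command-line approach the full post describes; the function names and the sample data are made up.

```python
from multiprocessing import Pool, cpu_count


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, workers=None):
    """Sum each chunk in a separate worker process, then add the partial sums."""
    workers = workers or cpu_count()
    chunks = split_into_chunks(values, workers)
    with Pool(processes=workers) as pool:
        partial_sums = pool.map(sum, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    numbers = list(range(1_000_000))  # made-up sample data
    # Check the parallel result against the ordinary single-process sum.
    print(parallel_sum(numbers) == sum(numbers))
```

For a list this small, the cost of starting processes and copying data will likely outweigh the gain; the pattern pays off only when each chunk carries enough work.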

CMake vs Make

February 1, 2014 (updated August 4, 2014) ~ Prateek Joshi ~ 40 Comments

Programmers have been using CMake and Make for a long time now. When you join a big company or start working on a project with a large codebase, there are all these builds that you need to take care of. You must have seen those “CMakeLists.txt” files floating around. You are supposed to run “cmake” and “make” commands on the terminal. A lot of people just follow the instructions blindly, not really caring about why we need to do things in a certain way. What is this whole build process and why is it structured this way? What are the differences between CMake and Make? Does it matter? Are they interchangeable? Continue reading “CMake vs Make” →

Operation Aurora

November 15, 2012 (updated November 16, 2012) ~ Prateek Joshi ~ 2 Comments

On January 14, 2010, McAfee Labs identified a zero-day vulnerability in Microsoft Internet Explorer that was used as an entry point for Operation Aurora to attack Google and at least 20 other companies. Microsoft quickly issued a security bulletin and a patch. Operation Aurora was a coordinated attack that included a piece of computer code exploiting the Internet Explorer vulnerability to gain access to computer systems. The exploit was then used to download and activate malware within those systems. The attack, which was initiated stealthily when targeted users accessed a malicious web page, ultimately connected those computer systems to a remote server. This connection was then used to steal company intellectual property and to gain access to user accounts. Why did the users visit the malicious web page? Likely because they believed it to be reputable. This attack became particularly famous because of its level of sophistication and the obfuscation methods used. Continue reading “Operation Aurora” →

What Is SSH?

August 28, 2012 (updated November 4, 2013) ~ Prateek Joshi ~ 4 Comments

Consider the following situation. You are at your friend’s place with your laptop and you want to access your home computer to do something. Maybe you want to start a download, or you want to run a program right away. What would you do in this situation? Would you go all the way to your house just to start a download? You already have a laptop at your disposal, so you should be able to use it somehow. You can just connect to your home computer over the internet. But what if someone hacks you while you do that? This is where SSH comes in. Continue reading “What Is SSH?” →

Inside The Shell of Shell Scripting

June 7, 2012 (updated September 5, 2012) ~ Prateek Joshi ~ 2 Comments

I’m pretty sure people working in the tech domain are aware of something called Shell Scripting. Were you ever working on a project and felt that some tasks were repeating over and over again? Did you feel that there should be an easy way to automate these things, so that you don’t have to worry about them every single time you want to run your project? If you have heard about shell scripting but don’t know what it is, you are missing out on a cool weapon in your arsenal. If you haven’t heard about shell scripting, well, it’s time to move you into the other category. So what exactly is shell scripting? Continue reading “Inside The Shell of Shell Scripting” →