How To Install Apache Spark On Ubuntu

There’s so much data being generated in today’s world that it’s mind-boggling, and we need platforms and frameworks that can handle it. This field of study is called Big Data Analysis. With so much data lying around, often ranging into petabytes and exabytes, we need very powerful systems to process it. Not only that, we need to process it with high efficiency. If you try to do it the regular way, you will never finish in time, let alone in real time. This is where Apache Spark comes into the picture. It is an open source big data processing framework that can process massive amounts of data at high speed using cluster computing. Let’s see how we can install it on Ubuntu.

How To Add Swap Space On Ubuntu

Whenever you build an application that’s memory intensive, you are bound to run into memory issues. Out-of-memory errors are painful to deal with, especially when they happen in production. Before putting your code on your server, you need to make sure that the server can handle the application’s memory requirements. But even if you are careful, something might still go wrong and you might end up running out of memory. One of the easiest ways to deal with this is to add some swap space. How does that help in our case? How can we use it on Ubuntu?

How To Schedule Tasks On Linux

Let’s say you have a website that does some heavy lifting. This means that you have designed a backend and hosted it on your web server. Now, you might want to run some processes periodically, like generating thumbnails or enriching data in the background. The reason is that you don’t want these processes to interfere with the user interface while they run. They should run somewhere in the background, and they should run automatically. Unix-based systems have a great program for this called ‘cron’. It allows tasks to run automatically in the background at regular intervals. You could also use it to automatically create backups, synchronize files, schedule updates, and much more. So how do we set this up?

How To Install PIL On Ubuntu

Let’s say you want to play around with images in Python. To do that, you need a Python package that can handle all the image manipulation. Python Imaging Library (PIL) is one of the most popular libraries used to process image data. Actually, people use Pillow now, which is a modern repackaged version of PIL. It has a lot of nice functionality and it works well. Let’s see how you can install PIL on 64-bit Ubuntu 12.04.

Using Multiple CPU Cores With Command Line Tools

All of you must have heard about how the processors in our laptops have multiple cores. It’s good that the technology is advancing in that direction. When people write programs, they can utilize these cores to increase the speed of computation. But most of the built-in commands don’t use these cores unless told to do so explicitly. If you ever want to add up a very large list, say hundreds of megabytes, or just look through it to find some particular value, you would write a simple program to do it. But going through so much data takes a lot of time if you just use a single thread. The same is true for tools like grep, bzip2, wc, awk, sed, etc. If the last sentence looked like jibber-jabber, then you should probably google those things before you proceed. They are single-threaded and will use just one CPU core. So how do we use multiple cores in these situations?
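As a rough illustration of the idea of adding up a very large list on more than one core, here is a minimal Python sketch. It is not the approach from the full post; the function names `chunked` and `parallel_sum` are made up for this example, and it assumes the list fits in memory. The list is split into one chunk per worker, each chunk is summed in a separate process, and the partial sums are combined at the end.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def chunked(values, n_chunks):
    """Split values into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, workers=None):
    """Sum values by summing chunks in separate processes, then combining."""
    workers = workers or os.cpu_count() or 1
    chunks = chunked(values, workers)
    if not chunks:
        return 0
    # Each chunk is sent to a worker process, which runs the built-in sum on it.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    # Compare the parallel result with the single-process result.
    print(parallel_sum(numbers) == sum(numbers))
```

For small lists, the cost of starting processes and copying the chunks to them can easily outweigh the gain, so a sketch like this only pays off when the per-chunk work is substantial.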

CMake vs Make

Programmers have been using CMake and Make for a long time now. When you join a big company or start working on a project with a large codebase, there are all these builds that you need to take care of. You must have seen those “CMakeLists.txt” files floating around. You are supposed to run the “cmake” and “make” commands in the terminal. A lot of people just follow the instructions blindly, without really caring why things need to be done in a certain way. What is this whole build process and why is it structured this way? What are the differences between CMake and Make? Does it matter? Are they interchangeable?

Operation Aurora

On January 14, 2010, McAfee Labs identified a zero-day vulnerability in Microsoft Internet Explorer that was used as an entry point for Operation Aurora to exploit Google and at least 20 other companies. Microsoft issued a security bulletin and patch immediately. Operation Aurora was a coordinated attack that included a piece of computer code exploiting the Microsoft Internet Explorer vulnerability to gain access to computer systems. The exploit was then extended to download and activate malware within those systems. The attack, which was initiated stealthily when targeted users accessed a malicious web page, ultimately connected those computer systems to a remote server. This connection was then used to steal company intellectual property and to gain access to user accounts. Why did the users visit the malicious web page? Likely because they believed it to be reputable. This attack became particularly famous because of its level of sophistication and the obfuscation methods used.