How To Train A Neural Network In Python – Part I

January 12, 2016 / January 13, 2016 ~ Prateek Joshi ~ 10 Comments

Deep learning uses neural networks to build sophisticated models. The basic building blocks of these neural networks are called “neurons”. When a neuron is trained to act like a simple classifier, we call it a “perceptron”. A neural network consists of many perceptrons interconnected with each other. Let’s say we have a bunch of inputs and the corresponding desired outputs. The goal of deep learning is to train this neural network so that the system outputs the right value for the given set of inputs. This process basically involves tuning each neuron in the network until it behaves a certain way. So what exactly is this perceptron? How do we train it in Python? Continue reading “How To Train A Neural Network In Python – Part I” →
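
To make the idea of “tuning a neuron” concrete, here is a minimal sketch of the classic perceptron learning rule in plain Python. It is not taken from the full post: the function names, the learning rate, the epoch count, and the choice of the logical AND as training data are all assumptions made for illustration.

```python
# Minimal perceptron sketch. All names and parameter values here are
# illustrative assumptions, not the code from the full post.

def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs exceeds zero, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=1, epochs=20):
    """Nudge the weights toward the desired output after every mistake."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or 1
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Toy data: the logical AND of two binary inputs (linearly separable).
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]

weights, bias = train_perceptron(inputs, and_labels)
predictions = [predict(weights, bias, x) for x in inputs]
print(predictions)
```

Because AND is linearly separable, a single perceptron can learn it; a problem like XOR would need more than one neuron.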

How To Install Caffe On Ubuntu

January 5, 2016 / December 4, 2015 ~ Prateek Joshi ~ 1 Comment

The concept of deep learning is becoming increasingly pervasive. It is a new area of research in machine learning that focuses on learning optimal representations of data. Now what does that mean? In the realm of classical machine learning, we have to build the features first, and then the machine learning algorithm learns how to classify the data based on these features. The problem is that feature-building is a trial-and-error process and we want to avoid manual intervention. This is where deep learning tends to shine! Instead of manually building the features ourselves, we can just let our deep neural networks learn the features and then build a system to classify that data. In this field, people work towards building a set of algorithms that can model abstractions in our data using multilayered neural networks. Caffe is one of the most popular libraries available out there for deep learning. Let’s go ahead and see how to get it up and running on Ubuntu, shall we? Continue reading “How To Install Caffe On Ubuntu” →

Performing Windowed Computations On Streaming Data Using Spark In Python

December 29, 2015 / November 14, 2015 ~ Prateek Joshi ~ Leave a comment

We deal with real-time data all the time. If you look at those analytics dashboards, you can see how they perform computations and tell us what happened in the last 60 minutes or maybe the last 7 hours. They are dealing with terabytes of data and yet they can process all of that in real time. These insights are extremely valuable because you can take the right actions if you know what’s happening. If you have a shopping website, you need to know what happened in the last few hours so that you can boost your sales. Are there a lot of visitors from France? Can I organize a quick French-themed promotion to increase my sales during peak hours? The answers to all these questions lie deep within your data. Spark Streaming is amazing at these things! So how do we do windowed computations in Spark? How can we process this data in real time? Continue reading “Performing Windowed Computations On Streaming Data Using Spark In Python” →
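
The idea of a windowed computation can be illustrated without Spark at all. The sketch below is plain Python, not the Spark Streaming API: it keeps only the events from the last `window_seconds` and counts visitors per country within that window. The class name, the event format, and the 60-second window are assumptions chosen for the example.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Conceptual sliding window (hypothetical helper, not a Spark class)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, country), oldest first

    def add(self, timestamp, country):
        """Record an event, then drop everything that fell out of the window."""
        self.events.append((timestamp, country))
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def counts(self):
        """Visitors per country among the events still inside the window."""
        return Counter(country for _, country in self.events)


window = SlidingWindowCounter(window_seconds=60)
for timestamp, country in [(0, "FR"), (30, "US"), (70, "FR"), (90, "FR")]:
    window.add(timestamp, country)

# Only the events at t=70 and t=90 are within 60 seconds of the latest event.
latest_counts = window.counts()
print(dict(latest_counts))
```

A streaming engine applies the same idea across a cluster, recomputing the result as the window slides forward over incoming batches.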

Analyzing Real-time Data With Spark Streaming In Python

December 22, 2015 / November 14, 2015 ~ Prateek Joshi ~ 1 Comment

There is a lot of data being generated in today’s digital world, so there is a high demand for real-time data analytics. This data usually comes in bits and pieces from many different sources. It can come in various forms like words, images, numbers, and so on. Twitter is a good example of words being generated in real time. We also have websites where statistics like number of visitors, page views, and so on are being generated in real time. There is so much data that it is not very useful in its raw form. We need to process it and extract insights from it so that it becomes useful. This is where Spark Streaming comes into the picture! It is exceptionally good at processing real-time data and it is highly scalable. It can process enormous amounts of data in real time without skipping a beat. So how exactly does Spark do it? How do we use it? Continue reading “Analyzing Real-time Data With Spark Streaming In Python” →

Getting Started With Apache Spark In Python

November 17, 2015 / November 18, 2015 ~ Prateek Joshi ~ Leave a comment

In one of the previous blog posts, we discussed how to get Apache Spark up and running on your Ubuntu box. In this post, we will start exploring it. One of the best things about Spark is that it comes with a Python API that works like a charm! The API is also available in Java, Scala, and R. That pretty much covers the entire world of programming and data science! Spark’s shell provides a great way to analyze our data and work with it interactively. We are going to see how to interact with the Spark Python API in this post. You should already have Spark downloaded on your machine. Let’s go into the “spark-1.5.1” directory in your terminal and get started, shall we? Continue reading “Getting Started With Apache Spark In Python” →

Understanding Filter, Map, And Reduce In Python

November 10, 2015 ~ Prateek Joshi ~ 3 Comments

Even though a lot of people use Python in an object-oriented style, it has several functions that enable functional programming. For those of you who don’t know, functional programming is a programming paradigm based on lambda calculus that treats computation as the evaluation of mathematical functions. Some of the prominent functional programming languages include Scala, Haskell, Clojure, and so on. You should go through this nice article on functional programming that explains it in layman’s terms. Coming back to the topic at hand, Python provides features like lambda, filter, map, and reduce that can basically cover most of what you would need to operate on data. Let’s go ahead and play with them to understand their awesomeness, shall we? Continue reading “Understanding Filter, Map, And Reduce In Python” →
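
As a quick taste of these four features, here is a small sketch written for Python 3, where `reduce` lives in the `functools` module and `filter` and `map` return lazy iterators (hence the `list(...)` calls). The sample list and variable names are made up for illustration.

```python
from functools import reduce  # in Python 3, reduce is no longer a builtin

numbers = [1, 2, 3, 4, 5, 6]

# filter keeps only the items for which the function returns True.
evens = list(filter(lambda n: n % 2 == 0, numbers))

# map applies the function to every item.
squares = list(map(lambda n: n * n, evens))

# reduce folds the sequence into a single value, two items at a time.
total = reduce(lambda acc, n: acc + n, squares)

print(evens, squares, total)
```

The three steps chain naturally: select the data, transform it, then collapse it into one result.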

Enabling Tab Autocomplete In Python Shell

October 27, 2015 / October 11, 2015 ~ Prateek Joshi ~ 3 Comments

It’s fun to play around with Python. One of its best features is the interactive shell where we can experiment all we want. Let’s say you open up a shell, declare a bunch of variables, and want to operate on them. You don’t want to type the full variable names over and over again, right? It’s also difficult to remember the full names of all the built-in methods and functions. Since we are playing around with the same variables and built-in functions, it would be nice to have an autocomplete feature that can complete the variable and function names for us. Fortunately, Python provides that nifty little feature! Let’s see how we can enable it here. Continue reading “Enabling Tab Autocomplete In Python Shell” →
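
One common way to get this behaviour is the standard library’s `rlcompleter` module together with `readline`; whether the full post uses exactly this approach is an assumption. The sketch below binds Tab to completion when `readline` is available (it is typically missing on standard Windows builds, and readline builds based on libedit may need a different binding string), then queries the completer directly to show what it would suggest. The variable names in `namespace` are invented for the example.

```python
import rlcompleter

try:
    import readline
except ImportError:  # readline is not available on every platform
    readline = None

if readline is not None:
    # Ask readline to trigger completion when Tab is pressed.
    readline.parse_and_bind("tab: complete")

# The completer can also be queried directly, which shows what Tab would offer.
namespace = {"learning_rate": 0.1, "learning_steps": 100}  # made-up variables
completer = rlcompleter.Completer(namespace)

matches = []
state = 0
while True:
    match = completer.complete("learning_", state)
    if match is None:
        break
    matches.append(match)
    state += 1

print(sorted(matches))
```

To apply the binding automatically in every session, the first few lines can go in a startup file referenced by the `PYTHONSTARTUP` environment variable. Recent Python 3 releases generally enable tab completion in the interactive shell by default.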

Installing OpenCV 3 With Python On Mac OS X

October 9, 2015 / January 19, 2016 ~ Prateek Joshi ~ 16 Comments

OpenCV is the world’s most popular computer vision library and it’s used extensively by researchers and developers around the world. OpenCV has been around for a while now, and its developers add something new and interesting with every release. One of the main additions in OpenCV 3 is “opencv_contrib”, which contains a lot of cutting-edge algorithms for feature descriptors, text detection, object tracking, shape matching, and so on. Python support has been greatly improved in this release as well. Since OpenCV is available on almost all the popular platforms, this version looks very promising. Let’s see how to install OpenCV 3 with Python support on Mac OS X. Continue reading “Installing OpenCV 3 With Python On Mac OS X” →

The Cost Of Abstraction In Python

November 23, 2014 ~ Prateek Joshi ~ Leave a comment

People who have been coding for years will tell you that abstraction is always a good thing. It makes the system more robust and tolerant of future changes. Recently, I’ve been working on some Python libraries that wrap C extensions. While working on those libraries, a couple of interesting things came up. This blog post is about one of those things and how it affects speed. When we talk about abstraction, we tend to think about hierarchies, encapsulation, and robustness in general. But what about the ways in which abstractions are implemented underneath? Different languages implement them in their own way, and those implementations are not always optimal. This is what we are going to talk about. Let’s dive in, shall we? Continue reading “The Cost Of Abstraction In Python” →

Why Is Python Slow?

November 8, 2014 ~ Prateek Joshi ~ 3 Comments

When people talk about speed in the world of programming languages, they usually center the discussion around compiled vs interpreted languages. In this post, we will discuss two of the most famous languages on this planet, Python and C. I was recently playing around with this and I made a few pleasant discoveries, so I thought I should share them here. One of the reasons most often given for Python being slower than C is its dynamic typing. While it may be true that dynamically typed programming languages are slower than statically typed languages, dynamic typing may not be the major factor slowing down your Python code. The dynamic typing feature of programming languages like Python makes the interpreters harder to optimize. I guess this is the cost of having an extremely beginner-friendly language! But one thing to note is that there is a big difference between the interpreter being harder to optimize and your code being slow. We’ve had years of research focusing on the best way to perform type checking at runtime in these languages, which has made this overhead negligible. So how do we understand why Python code is slower than C code? How do we write Python code that’s not slow? Continue reading “Why Is Python Slow?” →