How To Build A Web Crawler?

I was reading an article the other day and I came across the term “web crawler”. The context in which it was used got me a little curious about the design of a web crawler. A web crawler is a simple program that scans or “crawls” through web pages to create an index of the data it’s looking for. There are several uses for such a program, perhaps the most popular being search engines, which use it to provide web surfers with relevant websites. Google has perfected the art of crawling over the years! A web crawler can be used by pretty much anyone who is trying to search for information on the Internet in an organized manner. It also goes by other names, such as web spider, bot, or indexer. Anyway, that article got me thinking about building a web crawler. I just wanted to fiddle with it and see how much time it would take to get something working on my machine. It turned out to be quite easy!
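To make the "scan pages and build an index" idea concrete, here is a minimal sketch in Python. It is only an illustration, not the crawler from the full post: the names `LinkParser`, `crawl`, `fetch`, and `max_pages` are made up for this example, and the "index" is simply a map from each visited URL to the links found on it. The `fetch` argument is assumed to be any function that takes a URL and returns the page's HTML as a string, or `None` if the page cannot be loaded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl from start_url.

    fetch(url) is an assumed callable returning HTML text or None.
    Returns a dict mapping each visited URL to the absolute links on it.
    """
    index = {}
    seen = {start_url}            # URLs already queued, so none is fetched twice
    queue = deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # page could not be loaded; skip it
            continue
        parser = LinkParser()
        parser.feed(html)
        # Resolve relative links against the page they were found on.
        links = [urljoin(url, href) for href in parser.links]
        index[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

Passing `fetch` in as a parameter keeps the sketch easy to try without touching the network: a dictionary's `get` method over a few hand-written pages works as a stand-in. For real pages, one option would be a small wrapper around `urllib.request.urlopen`, ideally with politeness rules such as rate limiting and respect for robots.txt, which this sketch leaves out.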

The Power Of A/B

Designing a website is more of an art than a science. There are a million different ways to design a website and achieve a particular goal. We want our websites to eventually become popular and make money. Once the site is designed, it cannot stay stagnant for long either. But how do we know if the users will like the new design? The user base is critical, and losing it is very risky. Once the users lose trust, it’s very difficult to earn it back. We want to take the guesswork out of website optimization and make decisions based on real data. By measuring the impact of each change, we can check whether it actually produces positive results. So how do we do it?

Automatic Downloading

Let’s say you are surfing the web and you come across a cool website with a great collection of pictures. Some of them are located on that page and many more can be reached through various links on that page. There are hundreds of pictures and you want to download all of them. How would you do it? Would you click and save each image separately? Let’s say you really like the design of that site and you want to download the whole thing along with the source code. How would you do it automatically without wasting time?