Am; YourMove: How is a 280-year-old algorithm used in a year-old application? | tech talk

The post will be updated soon…

This post is a complete write-up of the video posted on 15th May 2017 on AM; YourMove.

If you are a computer science student, or if you have ever looked at various algorithms, you know that devising an algorithm can be really hard, but a well-optimized algorithm is always appreciated. Classic algorithms never go out of fashion; in fact, most systems in computer science still run just fine on them. Algorithms are answers or solutions to real-world problems stated as interesting problem statements. I personally get excited when I see a great algorithm and always want to share that knowledge and excitement.

Nowadays, with the boom in application development, algorithm engineering is not a topic engineering students really focus on. But we never know when an oldie-but-goodie algorithm might come in handy. This post focuses on one of Google’s applications, Google Trips, which uses a 280-year-old algorithm!

If you do not know it, Google Trips is an application where we can plan our vacation. One of its features lets us pre-plan which places we are interested in visiting, and Google creates an optimal route with minimum travel time. Google also takes care of your interests, timings and other important parameters. These routes are termed “itineraries”.

The problem the application solves is very similar to the travelling salesman problem, which states:

Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?

But the travelling salesman problem is NP-hard. For NP-hard problems, no algorithm is known that solves them exactly in polynomial time. So an exact TSP solver cannot be used here, and even if it were, we could not expect a great user experience! Which is, in fact, a real deal for a company like Google.
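
To get a feel for why exact solutions do not scale, here is a tiny brute-force sketch. The four-city distance matrix is made up purely for illustration; the point is that the loop has to check (n-1)! routes.

from itertools import permutations

# Toy symmetric distance matrix for 4 cities (illustrative values only).
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]

def brute_force_tsp(dist):
    n = len(dist)
    best_route, best_cost = None, float("inf")
    for perm in permutations(range(1, n)):      # fix city 0 as the start
        route = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if cost < best_cost:
            best_route, best_cost = route, cost
    return best_route, best_cost

print(brute_force_tsp(dist))    # ((0, 1, 3, 2, 0), 18) for this toy matrix

Already at 20 destinations this would mean roughly 1.2 × 10^17 routes to evaluate, which is why an approximation is the only practical option.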

But the TSP can be tackled by making some assumptions and approximations. The rest of the post talks about how the problem of creating itineraries is solved using the discoveries of some great scientists.

In 1736, Leonhard Euler studied the following question about the seven bridges of Königsberg: is it possible to walk through the city crossing each bridge exactly once?


As it turns out, for the city of Königsberg, the answer is no.

Euler noticed that if all the nodes in the graph have an even number of edges (such graphs are called “Eulerian” in his honour) then, and only then, a cycle can be found that visits every edge exactly once. Keep this in mind, as we’ll rely on this fact later!
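
As a quick sanity check of Euler’s observation, here is a small sketch that builds the Königsberg bridge multigraph and confirms it is not Eulerian. It assumes the networkx library; the node names A to D for the four land masses are just labels.

import networkx as nx

# The four land masses of Königsberg and the seven bridges between them.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]
G = nx.MultiGraph(bridges)

print(dict(G.degree()))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}, all odd
print(nx.is_eulerian(G))  # False: no walk crosses every bridge exactly once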

One of the approximation algorithms for the TSP is Christofides’ algorithm. It assumes that the distances form a metric space (they are symmetric and obey the triangle inequality), and it guarantees that its solutions will be within a factor of 3/2 of the optimal tour length.

This is an important part of the solution and builds on Euler’s work. Here is a quick four-step rundown of how it works (a short Python sketch follows the list):

  1. We start with all our destinations separate and repeatedly connect together the closest two that aren’t yet connected. This doesn’t yet give us an itinerary, but it does connect all the destinations via a minimum spanning tree of the graph.
  2. We take all the destinations that have an odd number of connections in this tree (Euler proved there must be an even number of these), and carefully pair them up.
  3. Because all the destinations now have an even number of edges, we’ve created a Eulerian graph, so we create a route that crosses each edge exactly once.
  4. We now have a great route, but it might visit some places more than once. No problem, we find any double visits and simply bypass them, going directly from the predecessor to the successor.
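
For the curious, here is a rough Python sketch of those four steps. It assumes the networkx library and a complete weighted graph G of destinations; it illustrates the idea rather than Google’s actual implementation.

import networkx as nx
from networkx.algorithms.matching import min_weight_matching

def christofides_route(G):
    # 1. Connect all destinations with a minimum spanning tree.
    mst = nx.minimum_spanning_tree(G, weight="weight")
    # 2. Pair up the odd-degree destinations with a cheap perfect matching.
    odd = [v for v, d in mst.degree() if d % 2 == 1]
    matching = min_weight_matching(G.subgraph(odd), weight="weight")
    # 3. MST + matching is Eulerian, so walk every edge exactly once.
    multigraph = nx.MultiGraph(mst)
    multigraph.add_edges_from(matching)
    circuit = nx.eulerian_circuit(multigraph)
    # 4. Shortcut repeated visits so each destination appears only once.
    route, seen = [], set()
    for u, _ in circuit:
        if u not in seen:
            route.append(u)
            seen.add(u)
    route.append(route[0])   # return to the starting point
    return route

The shortcutting in step 4 never makes the tour longer precisely because the distances obey the triangle inequality, which is where the metric-space assumption of Christofides’ algorithm comes in.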

This is how a couple of algorithms and the discoveries behind them are cleverly used to solve a real problem. I hope this post inspires and motivates you to start looking at algorithms from a different perspective to solve real-world problems.

And I’ll see you in the next one.

Sources: https://research.googleblog.com/2016/09/the-280-year-old-algorithm-inside.html

 

 

 

Am; YourMove : What is Load Balancer and Consistent Hashing #2

This is a continuation of Am; YourMove : What is Load Balancer and Consistent Hashing #1.

In the last post, we were introduced to concepts like hashing. In this post, we will learn in detail why “just” hashing will not work and why there is a need for an algorithm like consistent hashing. In particular, we will try to learn more about the algorithm developed by Google.

We now know that we have to load balance the requests across multiple data centers. Consider an example of why the classic hashing technique is not sufficient. If you have a collection of n cache machines, then a common way of load balancing across them is to put object o in cache machine number hash(o) mod n. This works well until you add or remove cache machines (for whatever reason); then n changes and every object is hashed to a new location.
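
A tiny sketch makes the problem concrete (the key set is illustrative and md5 simply stands in for a stable hash): adding just one cache machine forces most objects to move.

import hashlib

def bucket(key, n):
    # hash(o) mod n, using md5 so the result is stable across runs
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = ["object-%d" % i for i in range(1000)]
before = {k: bucket(k, 5) for k in keys}   # 5 cache machines
after = {k: bucket(k, 6) for k in keys}    # one machine added
moved = sum(before[k] != after[k] for k in keys)
print("%d of %d objects moved" % (moved, len(keys)))   # roughly 80% move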

This is where consistent hashing comes into the picture. It is interesting to note that it is only the client that needs to implement the consistent hashing algorithm – the memcached server is unchanged. Other systems that employ consistent hashing include Chord, which is a distributed hash table implementation, and Amazon’s Dynamo, which is a key-value store (not available outside Amazon).

The algorithm we will be discussing in this post is called “consistent hashing with bounded loads”. Its main aim is to achieve both uniformity and consistency in the resulting allocations.

We can think of the servers as bins and the clients as balls.

 

The uniformity objective encourages all bins to have a load roughly equal to the average density (the number of balls divided by the number of bins). For some parameter ε, we set the capacity of each bin to either the floor or the ceiling of the average load times (1+ε). This extra capacity allows us to design an allocation algorithm that meets the consistency objective in addition to the uniformity property.

Imagine a given range of numbers overlaid on a circle. We apply a hash function to balls and a separate hash function to bins to obtain numbers in that range that correspond to positions on that circle. We then start allocating balls in a specific order independent of their hash values (let’s say based on their ID). Then each ball is moved clockwise and is assigned to the first bin with spare capacity.

Consider an example where 6 balls and 3 bins are assigned using two separate hash functions to random locations on the circle. For the sake of this instance, assume the capacity of each bin is set to 2. We start allocating balls in increasing order of their ID values. Ball number 1 moves clockwise and goes to bin C. Ball number 2 goes to A. Balls 3 and 4 go to bin B. Ball number 5 goes to bin C. Then ball number 6 moves clockwise and hits bin B first. However, bin B has capacity 2 and already contains balls 3 and 4, so ball 6 keeps moving until it reaches bin C, but that bin is also full. Finally, ball 6 ends up in bin A, which has a spare slot for it.
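
Here is a minimal Python sketch of this balls-into-bins procedure. The hash function, the ε value and the names are illustrative; the real algorithm and its analysis are in the paper.

import bisect
import hashlib
import math

def ring_position(name):
    # Map a name to a position in [0, 1) on the circle.
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return (h % 10**8) / 10**8

def allocate(balls, bins, eps=0.5):
    # Capacity is the ceiling of the average load times (1 + eps).
    capacity = math.ceil(len(balls) / len(bins) * (1 + eps))
    positions = sorted((ring_position(b), b) for b in bins)
    load = {b: [] for b in bins}
    for ball in sorted(balls):   # allocate in a fixed order, e.g. by ID
        start = bisect.bisect_left(positions, (ring_position(ball), ""))
        for step in range(len(positions)):
            _, b = positions[(start + step) % len(positions)]
            if len(load[b]) < capacity:   # first clockwise bin with room
                load[b].append(ball)
                break
    return load

print(allocate(["ball-%d" % i for i in range(1, 7)], ["A", "B", "C"]))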

 


 

Upon any update in the system (ball or bin insertion/deletion), the allocation is recomputed to keep the uniformity objective. The art of the analysis is to show that a small update (a small number of insertions and deletions) results in only minor changes in the state of the allocation, and therefore the consistency objective is met. The paper also shows that every ball removal or insertion in the system results in O(1/ε²) movements of other balls.

This algorithm is not just theoretical! It is, in fact, used in production at a well-known company. Andrew Rodland from Vimeo found the paper and used it for their load-balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck. He recently summarized this story in a blog post detailing his use case.

To check out a simple implementation of the consistent hashing algorithm, see –link

The algorithm is open source, so anyone can use it. More insights, such as improvement statistics and performance analysis, can be found in the published paper: paper

This was one of the most interesting topics yet on AM; YourMove. Check out the channel here: SUBSCRIBE NOW!

And I’ll see you in the next one!

Beginning ML: Sentiment Analysis Using Textblob : KnowDev

In the last post we built a neural network for sentiment analysis. We used our own dataset, which was not really big enough; even so, we were able to achieve an accuracy of 54%. Today we shall be using a Python module for sentiment analysis. We shall be building a Twitter sentiment analyzer! Believe me, you’ll be amazed by how easily we can achieve it!

First we need to install two modules. The first is tweepy, which allows us to make API calls to Twitter; we have to create an app on the Twitter developer site to actually authenticate ourselves. Next, we need textblob, which can perform sentiment analysis. TextBlob can actually perform many more operations apart from sentiment analysis; if you are interested, you can check them out here.

Let’s import our dependencies

import tweepy
from textblob import TextBlob

We have to declare four variables: consumer_key, consumer_secret, access_token and access_token_secret. All of these can be found after we create an app on the Twitter developer site.

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

The above two lines authenticate us; we are almost done with the authentication.

api = tweepy.API(auth)

Through the api variable we can use the search operation to find public tweets.

public_tweet = api.search('search')

'search' is the keyword we will be searching for. Now we can iterate through public_tweet and use TextBlob to perform sentiment analysis on each tweet.

for tweet in public_tweet:
    text = tweet.text
    analysis = TextBlob(text)                 # run TextBlob over the tweet text
    sentiment = analysis.sentiment.polarity   # polarity score in [-1.0, 1.0]
    print(text, sentiment)

And that’s it! We have successfully used the tweepy and textblob modules to build a Twitter sentiment analyzer in fewer than 25 lines. In fact, there are many more sources whose APIs we could use in the same way.

This is a relatively small post, and you know why! Now you can use the sentiment analyzer for a wide range of use cases, and I’ll see you in the next one!

Complete source code

Beginning ML – Sentiment Analysis Using Deep Neural Network: KnowDev

In the last post we implemented our first neural network, which can classify a set of images. In fact, that experiment can be considered a HELLO WORLD program. There is a lot more we have to consider while implementing our model, mainly the data! A lot of the time data is very raw, and we are required to perform some kind of preprocessing so that the data is in a format that TensorFlow objects can accept. A sentiment analyzer is a program that can tell whether a given sentence is positive or negative. We will be using the same neural network model that we built in the last post. For better understanding, the whole process of building the sentiment analyzer is divided into parts.

In this post, we shall be looking at how to get raw data and convert it into the required format. Both the positive dataset and the negative dataset are available at the GIT link. First we’ll download the datasets into our directory. The datasets contain 5000 sentences each. Yes, the data we have is not really enough for practical purposes.

Once we have our datasets ready in our directory and have imported TensorFlow (duh!), we shall create feature sets from the data.

First of all, we have to create our vocabulary of words. The model we will use is bag of words, and we will call this collection of words the lexicon. We will be using the nltk library to extract the words that are most relevant; the techniques we are using are stemming and lemmatizing.

lemmatizer = WordNetLemmatizer()   # from nltk.stem import WordNetLemmatizer
lexicon = [lemmatizer.lemmatize(i) for i in lexicon]

lexicon on the LHS initially contains all the words from pos.txt and neg.txt (our datasets). We could also employ other techniques such as removing stop words (like “the”, “an”, “a”), which have no particular effect on the sentiment of a sentence. We effectively remove those words by keeping only words that occur more than 50 times but fewer than 1000 times, as in the snippet below.

for w in w_counts:
    if 1000 > w_counts[w] > 50:
        l2.append(w) # l2- final lexicon list

Now that we have created our vocabulary, we can create our features. Here our lexicon size is 423. A tensor accepts an array of floats, but our sentences are strings. Hence we have to use the lexicon we created earlier to make a vector that contains the frequency of each lexicon word in the sentence.

For example, if lexicon = [‘dog’, ‘cat’, ‘eat’, ‘fight’, ‘food’] and the given sentence is “dog fights with cat for food”, then the feature vector is [1, 1, 0, 1, 1].
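
A small sketch of that conversion, assuming nltk (with its punkt tokenizer and WordNet data downloaded) and numpy are available; the actual script builds these count vectors for every sentence in the datasets.

import numpy as np
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def featurize(sentence, lexicon):
    words = [lemmatizer.lemmatize(w.lower()) for w in word_tokenize(sentence)]
    features = np.zeros(len(lexicon))
    for w in words:
        if w in lexicon:
            features[lexicon.index(w)] += 1   # count each lexicon word
    return features

print(featurize("dog fights with cat for food",
                ['dog', 'cat', 'eat', 'fight', 'food']))   # [1. 1. 0. 1. 1.]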

We create a list of [features, classification] pairs. Positive is denoted as [1, 0] and negative as [0, 1].

features = list(features)
featureset.append([features, classification])

Finally, we’ll create our collection of feature sets from both the positive and the negative file. The list is shuffled so that the neural network can converge.

features += sample_handling('pos.txt', lexicon, [1, 0])
features += sample_handling('neg.txt', lexicon, [0, 1])
random.shuffle(features)

Now the whole set is divided into training data and testing data.

features = np.array(features)   # requires import numpy as np; enables column slicing

train_x = list(features[:, 0][:-testing_size])
train_y = list(features[:, 1][:-testing_size])

test_x = list(features[:, 0][-testing_size:])
test_y = list(features[:, 1][-testing_size:])

train_x and test_x are the features, and train_y and test_y are the labels. We will use the pickle module for permanent storage of these values so that they can be used later to train our neural network.
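
A minimal sketch of that storage step (the file name sentiment_set.pickle is just an assumption for illustration):

import pickle

# Persist the prepared splits so the training script can load them later.
with open('sentiment_set.pickle', 'wb') as f:
    pickle.dump([train_x, train_y, test_x, test_y], f)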

In this post we downloaded our own data, cleaned it to our requirements and divided the cleaned data into training data and testing data.

In the next post we will use this data to train our model, test it to find its accuracy, and also run the model against our own inputs! Awesome, right? I’m excited too…

See you in the next one!

link to complete source code :  https://github.com/makaravind/SentimentAnalyzer-54

next post : next

 

Decision.

Why is taking decisions so hard? Is it because you are not sure of what is going to happen, or are you just not really passionate about it?

There are a few decisions in life that change our lives forever, sometimes in good ways and sometimes in bad ways. Before thinking about the consequences, let’s first get this clear: who is making the decisions? Do we have complete control, or is it just life forcing you to do things? When I think about these questions I always wonder: what if I had taken another path? But the person I am right now is a consequence of the decisions I’ve taken, whether it was riding my bicycle down the road and carrying the scar for a lifetime, or the few words I couldn’t say to my dearest ones. But right now I’m happy! There are many paths and so many possibilities.

Sometimes I even wonder whether I’m overthinking the decisions I made in the past. But the whole point of this argument in my head is the decisions I have to take today to shape my future. So many people might be affected by your decision. In that case, should you really consider only yourself, or ask others for suggestions? If you are asking for suggestions, then the decision you are making is not truly your own. My simple question is: can you really have complete control over the situation, or is it just an illusion?

There are so many answers and certainly so many unanswered questions. But now I know how to take a decision even in a chaotic situation. There are two secrets: 1) time and 2) gut feeling.

Time gives you possibilities, and gut feeling gives you the courage to choose a possibility and go ahead in life. With time you are introduced to many truths that you weren’t aware of. I have understood what gut feeling truly is! I surely cannot explain how it feels, but when it just feels right no matter what, that is gut feeling. Think positively, keep your mind fresh, and every decision you (or you think you) make is right. It is right because, after all, it was meant to be this way. So many paths, and yet they lead to the same destination.

Take time and listen to your gut. A few decisions are made that last forever; right or wrong, a decision is made. In the future you may be glad or sorry that you made it, but nevertheless you made a decision. Face it!

 

Challenge 1: Day 1

Today is the first day of my challenge, in which I will try to create a Pinterest-clone web app using Django and Backbone.

Today I dealt with the server side, wrote the model for a pin and created multiple views for various queries.

My model for Pin looks something like this:

{ interest (nothing but a small video/image), tagline, updated_on, created_on, likes }
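
In Django terms, a rough sketch of that model might look like the following; the field types and options are assumptions, and only the field names come from the outline above.

from django.db import models

class Pin(models.Model):
    interest = models.FileField(upload_to='pins/')    # the small video/image
    tagline = models.CharField(max_length=200)
    created_on = models.DateTimeField(auto_now_add=True)
    updated_on = models.DateTimeField(auto_now=True)
    likes = models.PositiveIntegerField(default=0)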

Various views:

  1. to display all pins
  2. to update a particular pin
  3. to delete a pin
  4. to display top trending pins

So, tomorrow I’ll start on the client side/front end, and as requirements come up I will also update the server side at the same time.

Follow my progress here : https://github.com/makaravind/Pininterest-Clone

 

Challenge: Pinterest clone in 15 days using Backbone.js and Django

I’ve taken up a challenge to create a Pinterest-clone app using Backbone.js and Django. I have mentioned in my previous posts that I will learn Backbone.js and try to apply it one way or the other. So, to make things more interesting and challenging, I’ve committed to taking up this challenge. It is my first of this kind. The challenge was inspired by a YouTuber who created 12 apps in 12 weeks, which is in fact a great feat. I wanted to take things a bit slower by starting with a two-week challenge, with the last day reserved for deployment.

Work strategy:

  1. Set up server side using Django framework.
  2. Build front end using Backbone.js and make good UI.
  3. Integrate front-end and back-end simultaneously.

Expected challenges:

  1. Creating a good UI.
  2. User authentication.
  3. Is a separate blob database needed or not? Not sure.

The challenge will start in 2 days! All my progress will be shared as posts. Tune in for more action.

[update] For various reasons the challenge is postponed by a week or so. Details will be updated soon.

[update] Challenge starts today

[update] Challenge discontinued because I got stuck with newer, more important things. The project is complete up to all the basic functionalities.