Am; YourMove: How a 280-year-old algorithm is used in a year-old application | tech talk

This post will be updated soon…

This post is a complete description of the video posted on 15th May 2017 on AM; YourMove.

If you are a computer science student, or if you have ever looked at various algorithms, you know that devising an algorithm can be really hard, but a well-optimized algorithm is always appreciated. Classic algorithms never go out of fashion; in fact, most systems in computer science still run fine on those very algorithms. Algorithms are answers, or solutions, to real-world problems stated as interesting problem statements. I personally get excited when I see a great algorithm, and I always want to share that knowledge and excitement.

But nowadays, due to the boom in application development, algorithm engineering is not a topic engineering students really focus on. Yet we never know when an oldie-but-goodie algorithm might come in handy. This post focuses on one of Google's applications, Google Trips, which uses a 280-year-old algorithm!

If you do not know what Google Trips is: it is an application where we can plan our vacations. One of its features lets us pre-select the places we are interested in visiting, and Google creates an optimal route with minimum travel time. Google also takes care of your interests, timings, and other important parameters. These plans are termed "itineraries".

The problem statement of the application is very similar to that of the travelling salesman problem. It states that,

Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?

But the travelling salesman problem belongs to the class of NP-hard problems: no polynomial-time algorithm for it is known, and none exists unless P = NP. So an exact TSP solver is impractical here, and even if we tried, we could not expect a great user experience, which is a real deal-breaker for a company like Google.

But the TSP can be tackled by making some assumptions and approximations. The rest of the post talks about how the problem of creating itineraries is solved using the discoveries of some great scientists.

In 1736, Leonhard Euler studied the following question about the city of Königsberg and its seven bridges: is it possible to walk through the city crossing each bridge exactly once?


As it turns out, for the city of Königsberg, the answer is no.

Euler noticed that if all the nodes in the graph have an even number of edges (such graphs are called “Eulerian” in his honour) then, and only then, a cycle can be found that visits every edge exactly once. Keep this in mind, as we’ll rely on this fact later!

One of the approximation algorithms for the TSP is an algorithm by Christofides. The assumption is that the distances form a metric space (they are symmetric and obey the triangle inequality). It is an approximation algorithm that guarantees its solutions will be within a factor of 3/2 of the optimal solution length.

This is an important part of the solution, and it builds on Euler's paper. Here is a quick four-step rundown of how it works, followed by a code sketch:

  1. We start with all our destinations separate and repeatedly connect together the closest two that aren’t yet connected. This doesn’t yet give us an itinerary, but it does connect all the destinations via a minimum spanning tree of the graph.
  2. We take all the destinations that have an odd number of connections in this tree (Euler proved there must be an even number of these), and carefully pair them up.
  3. Because all the destinations now have an even number of edges, we’ve created a Eulerian graph, so we create a route that crosses each edge exactly once.
  4. We now have a great route, but it might visit some places more than once. No problem, we find any double visits and simply bypass them, going directly from the predecessor to the successor.
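To make the four steps concrete, here is a minimal C++ sketch of this style of algorithm on a complete distance matrix. One simplification to note: step 2 pairs up the odd-degree nodes greedily rather than computing the true minimum-weight perfect matching that Christofides uses, so this sketch loses the 3/2 guarantee. All names here are my own, not Google's code.

#include <vector>
#include <stack>
#include <algorithm>
#include <limits>
using namespace std;

// Approximate TSP tour over a complete, symmetric distance matrix d.
vector<int> approxTour(const vector<vector<double>> &d) {
    int n = d.size();
    // Step 1: build a minimum spanning tree (Prim's algorithm, O(n^2)).
    vector<int> parent(n, -1);
    vector<double> best(n, numeric_limits<double>::max());
    vector<bool> inTree(n, false);
    vector<vector<int>> adj(n); // adjacency lists of the growing multigraph
    best[0] = 0;
    for (int it = 0; it < n; ++it) {
        int u = -1;
        for (int v = 0; v < n; ++v)
            if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
        inTree[u] = true;
        if (parent[u] != -1) {
            adj[u].push_back(parent[u]);
            adj[parent[u]].push_back(u);
        }
        for (int v = 0; v < n; ++v)
            if (!inTree[v] && d[u][v] < best[v]) { best[v] = d[u][v]; parent[v] = u; }
    }
    // Step 2: pair up the odd-degree nodes (greedy nearest pairing here,
    // instead of the minimum-weight perfect matching Christofides uses).
    vector<int> odd;
    for (int v = 0; v < n; ++v)
        if (adj[v].size() % 2 == 1) odd.push_back(v);
    vector<bool> used(n, false);
    for (size_t i = 0; i < odd.size(); ++i) {
        if (used[odd[i]]) continue;
        int bestJ = -1;
        for (size_t j = i + 1; j < odd.size(); ++j)
            if (!used[odd[j]] && (bestJ == -1 || d[odd[i]][odd[j]] < d[odd[i]][bestJ]))
                bestJ = odd[j];
        used[odd[i]] = used[bestJ] = true;
        adj[odd[i]].push_back(bestJ);
        adj[bestJ].push_back(odd[i]);
    }
    // Step 3: every node now has even degree, so an Eulerian circuit exists;
    // find one with Hierholzer's algorithm.
    vector<int> euler;
    stack<int> st;
    st.push(0);
    while (!st.empty()) {
        int u = st.top();
        if (adj[u].empty()) { euler.push_back(u); st.pop(); }
        else {
            int v = adj[u].back();
            adj[u].pop_back();
            adj[v].erase(find(adj[v].begin(), adj[v].end(), u)); // remove twin edge
            st.push(v);
        }
    }
    // Step 4: shortcut repeated visits to get the final route.
    vector<bool> seen(n, false);
    vector<int> tour;
    for (int v : euler)
        if (!seen[v]) { seen[v] = true; tour.push_back(v); }
    return tour;
}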

This is how a couple of algorithms and the discoveries behind them are cleverly used to solve this problem statement. I hope this post inspires you and motivates you to start looking at algorithms from a different perspective to solve real-world problems.

And I’ll see you in the next one.

Sources: https://research.googleblog.com/2016/09/the-280-year-old-algorithm-inside.html


Am; YourMove: Visualize Sorting a Stack!

This blog post is a summary of a video from the YouTube channel AM; YourMove. This is the link to the video.

In this video, the presenter discusses how to sort a stack using just basic stack operations like push, pop, and peek. The other constraint on the problem is that the only data structure we may use is a stack, but we do have the flexibility to use more than one stack if we need to.

It is one of the simpler interview questions out there, but the presenter makes it more interesting by using the Processing IDE as a visualization tool to show viewers how the elements in the stack are pushed and popped. The video is attached at the end of the post.

The presenter notes that the algorithm's complexity is O(n²), which by no means sounds optimal, but under these constraints it is the best we can get. The algorithm is discussed below.

We need 2 stacks. The idea is to pull an item from the original stack and push it onto the other stack. If pushing this item would violate the sort order of the new stack, we remove enough items from it so that it is possible to push the new item. Since the items we removed go back onto the original stack, we're back where we started.

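A minimal C++ sketch of the idea described above might look like this (the names are mine, not necessarily the presenter's):

#include <stack>
using namespace std;

// Sorts a stack using one auxiliary stack; the largest element ends up on top.
stack<int> sortStack(stack<int> input) {
    stack<int> sorted; // kept in ascending order from bottom to top
    while (!input.empty()) {
        int item = input.top();
        input.pop();
        // Items larger than `item` go back to the input so `item` can sit below them.
        while (!sorted.empty() && sorted.top() > item) {
            input.push(sorted.top());
            sorted.pop();
        }
        sorted.push(item);
    }
    return sorted;
}

Each element may travel between the two stacks O(n) times, which is where the O(n²) bound mentioned above comes from.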

Link to the visualization sample video (CLICK HERE). You can check out the code HERE.

This was one of the most interesting topics yet on AM; YourMove. Check out the channel here: SUBSCRIBE NOW!

Am; YourMove: What is a Load Balancer and Consistent Hashing #1

link to the actual video

On 10th April 2017, a new video was released on the channel AM;YourMove.

In this video, the presenter took up an interesting topic: "load balancing and consistent hashing". The presenter started off the video with a question: how do large companies like Google, Amazon, or Facebook manage to process such a huge number of requests? To give you a perspective on how large this number is, Google Search alone gets around 3.2 billion requests in a single day. Just imagine how many requests YouTube (the largest video streaming platform in the world) gets, or Facebook!

The presenter then explains that this is possible because the load is balanced across multiple data centers situated in multiple places. That is why we so often hear that Facebook is installing a new data center. Let me explain what a data center is: a data center is a large group of networked servers used for the remote storage, processing, or distribution of large amounts of data. The presenter then explains why there is a need to set up new data centers. The obvious reason is that an existing data center is getting overloaded, or, if not overloaded, that its efficiency has decreased. For many reasons, a data center can only process a certain number n of requests at any instant of time.

So we know why there is a requirement for new data centers. Multiple data centers should be kept well in sync so that none gets overloaded. The presenter then explains that this distribution of requests across multiple data centers is done by a load balancer. A load balancer can be software, hardware, or both.

 

[image: load-balancing-1]

 

Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overloading any single resource. Any addition or removal of these resources should not affect the user experience; this is usually achieved with a shared database or an in-memory session database such as Memcached. So basically, a load balancer distributes clients uniformly across multiple servers such that none gets overloaded. Further, it is desirable to find an allocation that does not change very much over time in a dynamic environment in which both clients and servers can be added or removed at any time. In other words, we need the allocation of clients to servers to be consistent over time.

But how is this allocation of clients possible? Certainly, some algorithm is used, right? The algorithm used is called consistent hashing.

But let us first understand what hashing is. Hashing is a technique for storing data so that it can be accessed quickly: the data is placed at an index computed directly from the key by a hash function.
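As a tiny illustration (my own toy example, not from the video), here is the idea of placing data at an index computed by a hash function:

#include <iostream>
#include <string>
#include <vector>
#include <functional>
using namespace std;

int main() {
    // A toy hash table: each value lives at index hash(key) % size.
    // (Real hash tables must also handle collisions; see below.)
    const size_t SIZE = 8;
    vector<string> table(SIZE);
    hash<string> h;
    table[h("alice") % SIZE] = "alice's record"; // store
    cout << table[h("alice") % SIZE] << "\n";    // retrieve via the same index
}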

[image: static_hash]

The hashing concept is used in some famous databases, like Amazon's Dynamo. The hash function is used to index the original value or key, and it is used again each time the data associated with that value or key is to be retrieved. Hashing is a one-way operation: there is no need to "reverse engineer" the hash function by analyzing the hashed values, and in fact an ideal hash function can't be reversed by such analysis. A good hash function should also avoid producing the same hash value from two different inputs. If it does, this is known as a collision. A hash function that offers an extremely low risk of collision may be considered acceptable.

The presenter then asks the viewers what the difference between hashing and consistent hashing could be. First, let's understand why we need a different algorithm. The systems we are dealing with are dynamic, i.e., a server or client can go offline at any point in time. So if any server goes down, all the clients allocated to it should be re-allocated to other servers to preserve a user-friendly experience.

Consider this, If you have a collection of n cache machines then a common way of load balancing across them is to put object o in cache machine number hash(o) mod n. This works well until you add or remove cache machines (for whatever reason), for then n changes and every object is hashed to a new location. This can be catastrophic since the originating content servers are swamped with requests from the cache machines. It’s as if the cache suddenly disappeared. It would be nice if, when a cache machine was added, it took its fair share of objects from all the other cache machines. Equally, when a cache machine was removed, it would be nice if its objects were shared among the remaining machines. This is exactly what consistent hashing does – consistently maps objects to the same cache machine, as far as is possible, at least.
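Here is a minimal sketch of that idea (my own illustration, not code from the video; real systems also add "virtual nodes" so the load spreads more evenly):

#include <iostream>
#include <map>
#include <string>
#include <functional>
using namespace std;

// A minimal consistent-hash ring: machines and objects are hashed onto the
// same circle, and an object belongs to the first machine clockwise from it.
class HashRing {
    map<size_t, string> ring; // hash point on the circle -> machine name
    hash<string> hasher;
public:
    void addMachine(const string &name)    { ring[hasher(name)] = name; }
    void removeMachine(const string &name) { ring.erase(hasher(name)); }
    string lookup(const string &object) const {
        auto it = ring.lower_bound(hasher(object));
        if (it == ring.end()) it = ring.begin(); // wrap around the circle
        return it->second;
    }
};

int main() {
    HashRing r;
    r.addMachine("cache-a");
    r.addMachine("cache-b");
    r.addMachine("cache-c");
    cout << r.lookup("some-object") << "\n";
    // Adding a machine only moves the objects between it and its
    // predecessor on the ring; everything else stays put.
    r.addMachine("cache-d");
    cout << r.lookup("some-object") << "\n";
}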

Seems too technical or difficult? The presenter promises to make the concepts simple enough to be understood by a non-technical person. The presenter then mentions that this is just part 1 of the video. In the next video, the actual algorithm of load balancing and how consistent hashing works will be explained. Stay tuned to the channel AM;YourMove.

If you haven't already subscribed to the channel, subscribe NOW! It is a great channel for the latest information related to computer science.

Subscribe here -> CLICK HERE!

AM; YourMove: Check if a Tree is a Subtree or not!

link to the actual video

This is the very first blog post: a description of the YouTube video released on 3rd April 2017.

The topic of the video is checking whether a binary tree is a subtree of another tree. This video was released as part of a video series on the tree data structure. The video starts off by describing the problem statement. The definition of a subtree can sometimes be misunderstood, therefore the presenter takes the help of Wikipedia to find the actual definition of a subtree. According to Wikipedia,

    A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T.

Afterwards, the presenter right away starts finding a possible solution for the problem. To solve it, the presenter introduces the concept of tree traversals. Tree traversals are algorithms which visit the nodes of a tree in some fashion to read the node data. There are 3 very popular tree traversals.

  1. Pre-order
  2. Post-order
  3. In-order

The presenter goes on to explain these traversals; a C++ sketch of all three follows the descriptions below.

Pre-Order :

  1. Check if the current node is empty/null.
  2. Display the data part of the root (or current node).
  3. Traverse the left subtree by recursively calling the pre-order function.
  4. Traverse the right subtree by recursively calling the pre-order function.

In-Order :

  1. Check if the current node is empty/null.
  2. Traverse the left subtree by recursively calling the in-order function.
  3. Display the data part of the root (or current node).
  4. Traverse the right subtree by recursively calling the in-order function.

Post-Order :

  1. Check if the current node is empty/null.
  2. Traverse the left subtree by recursively calling the post-order function.
  3. Traverse the right subtree by recursively calling the post-order function.
  4. Display the data part of the root (or current node).
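Here is how these three traversals might look in C++, collecting node values into a vector. The node structure and the function names are my assumptions, chosen to match the isSubtree code later in the post:

#include <vector>
using namespace std;

struct node {
    int data;
    node *left, *right;
};

// Pre-order: node, then left subtree, then right subtree.
void preOrderTraversal(node *root, vector<int> &collector) {
    if (root == nullptr) return;        // step 1: empty/null check
    collector.push_back(root->data);    // step 2: visit the node
    preOrderTraversal(root->left, collector);
    preOrderTraversal(root->right, collector);
}

// In-order: left subtree, then node, then right subtree.
void inOrderTraversal(node *root, vector<int> &collector) {
    if (root == nullptr) return;
    inOrderTraversal(root->left, collector);
    collector.push_back(root->data);
    inOrderTraversal(root->right, collector);
}

// Post-order: left subtree, then right subtree, then node.
void postOrderTraversal(node *root, vector<int> &collector) {
    if (root == nullptr) return;
    postOrderTraversal(root->left, collector);
    postOrderTraversal(root->right, collector);
    collector.push_back(root->data);
}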

 

After that, the presenter reveals that traversals expose a few interesting characteristics of a binary tree.

For example,

  1. If the binary tree is a BST, then the in-order traversal gives us the values of the nodes in ascending order.
  2. For any binary tree, the pre-order (or post-order) traversal together with the in-order traversal uniquely identifies the tree.

The 2nd point is interesting and can be applied directly to solve our problem, i.e., to find whether one tree is a subtree of another or not. But how? The presenter explains that if the smaller tree's pre-order and in-order elements are subarrays of the bigger tree's pre-order and in-order elements respectively, then we can confirm that it is a subtree. The presenter then starts implementing the actual algorithm.

 

// Method to find whether `small` is a subtree of `big`.
bool BinaryTree::isSubtree(node *big, node *small) {
    // Collectors for the values of the nodes encountered while traversing each tree.
    vector<int> preOrderBigCollector;
    vector<int> inOrderBigCollector;
    vector<int> preOrderSmallCollector;
    vector<int> inOrderSmallCollector;

    preOrderTraversal(big, preOrderBigCollector);
    preOrderTraversal(small, preOrderSmallCollector);

    inOrderTraversal(big, inOrderBigCollector);
    inOrderTraversal(small, inOrderSmallCollector);

    // `small` is a subtree of `big` iff both of its traversal sequences
    // appear as contiguous subarrays of `big`'s traversal sequences.
    return isSubArray(preOrderBigCollector, preOrderSmallCollector)
        && isSubArray(inOrderBigCollector, inOrderSmallCollector);
}

 

This is one of the more efficient solutions for this problem; the ideal worst-case time complexity is O(n). But in the video, the presenter used a naive approach to check whether one array is a subarray of another, which worsens the complexity to O(n²). If we instead use the KMP pattern-matching algorithm for the subarray check, the ideal time complexity can be achieved.
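For completeness, here is a sketch of what a KMP-based isSubArray could look like (my own addition, not the code from the video):

#include <vector>
using namespace std;

// Returns true if `pattern` occurs as a contiguous subarray of `text`.
// Runs in O(text.size() + pattern.size()) time using KMP.
bool isSubArray(const vector<int> &text, const vector<int> &pattern) {
    if (pattern.empty()) return true;
    // Build the KMP failure table: fail[i] is the length of the longest
    // proper prefix of the pattern that is also a suffix of pattern[0..i].
    vector<size_t> fail(pattern.size(), 0);
    for (size_t i = 1, k = 0; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    // Scan the text, reusing already-matched prefixes on a mismatch.
    for (size_t i = 0, k = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == pattern.size()) return true;
    }
    return false;
}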

author’s note:

This is my first post. As you may have noticed, I used a third-person style of writing; it lets me criticize my own work in the video from time to time (of course, for positive personal growth). Hope you liked this post!

LIKE + SHARE + SUBSCRIBE to AM;YourMove

https://www.youtube.com/channel/UCkAou6vl72JU7DMFYs84YYg