Recent Updates

  • maravindblog 12:07 pm on November 26, 2016 Permalink | Reply
    Tags: data analysis   

    Beginning ML: What next? 

    Machine learning is a branch of artificial intelligence and deals with lots and lots of data. In our neural network models we have used the MNIST dataset, which was pre-processed and can be used directly by TensorFlow. We have also used a raw dataset that we had to pre-process before our model could use it. Clearly, one needs to understand and analyse data (in fact, large amounts of data).

    Data analysis is the branch that deals with data, helps us understand it, and lets us draw conclusions from it. Data is therefore a critical part of machine learning, and we have to learn to analyse it.

    In the next posts we will look at how we can use Python to perform data analysis. I'm looking up a few online courses on data analysis.

    Let’s catch up in next!

  • maravindblog 4:48 pm on November 23, 2016 Permalink | Reply

    Beginning ML: Movie Review Sentiment Analyser cont. :KnowDev 

    In the last post we used pandas to extract raw data from .csv files and used the bag-of-words model to pre-process our data into feature sets.

    In this post we will train the model. It's the simplest part. We will use a random forest to predict; a random forest is a collection of decision trees.

    First we initialize the forest with 100 decision trees.

    from sklearn.ensemble import RandomForestClassifier
    forest = RandomForestClassifier(n_estimators=100)

    We will use the fit function on the forest variable to build a forest of trees from the training set.

    forest =, train['sentiment'])

    trained_cleaned_data is the pre-processed data from our last post, and train['sentiment'] holds the labels corresponding to each training example. And we are done with training our model.

    Now, we can test and predict using our model.

    To test, we first have to transform the raw test data into the required format. We use transform rather than fit_transform at test time so that the vectorizer keeps the vocabulary learned from the training data.

        test_data_features = vectorizer.transform(clean_test_reviews)

    Then we will simply predict using predict function of forest variable.

        result = forest.predict(test_data_features)

    We finish off our testing by simply writing all the predictions to a file for permanent storage. And that's it: we have used a new model and a new technique to build a sentiment analyzer. This model is not ready for commercial use because, one, we did not use a large dataset, and two, we did not use a more sophisticated model. In upcoming posts we will see what those "sophisticated" techniques and models are. I'm sure those concepts will be much more interesting. With that, I'll see you soon!

    Complete source code here

  • maravindblog 3:57 pm on November 22, 2016 Permalink | Reply

    Beginning ML – Movie Review Analysis: KnowDev 

    Up to the last post we have seen how to build a sentiment analyzer using a multi-layer feedforward neural network. In this post too we will build a sentiment analyzer, one that can predict the positiveness or negativeness of a movie review. We can consider this a use case of what we have learned so far.

    This particular concept is divided into two parts: one, pre-processing our data; two, using the random forest technique to predict.

    Pre-Processing :

    We will use the pandas module to extract data from a .csv file. As we did before, we will use the bag-of-words model to create feature sets. But first we have to clear a little dirt: HTML tags (using the BeautifulSoup module), punctuation, and stopwords. Stopwords are words like the, and, an, is, etc., which do not add any specific emotion to a sentence. We remove punctuation as well just to reduce complexity; once we are familiar with what we are doing we can add more complexity to our model. We implement all this functionality in the function clean_text.

    Now we have to apply these modifications to all the reviews in our file. We call that function create_clean_train. This function might take a couple of minutes because there are almost 25,000 reviews altogether.

    We will create feature sets using CountVectorizer from scikit-learn.
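    A minimal sketch of what clean_text might look like (the real implementation is in the linked source; here the HTML stripping is done with a plain regex instead of BeautifulSoup, and the stopword list is a tiny hand-rolled stand-in for nltk's):

```python
import re

# Tiny stand-in stopword list; the post uses nltk's English stopwords.
STOPWORDS = {"the", "and", "an", "a", "is", "it", "this", "of", "to"}

def clean_text(raw_review):
    # 1. Strip HTML tags (the post uses BeautifulSoup for this step)
    text = re.sub(r"<[^>]+>", " ", raw_review)
    # 2. Keep letters only, i.e. drop punctuation and digits
    text = re.sub(r"[^a-zA-Z]", " ", text)
    # 3. Lowercase and split into words
    words = text.lower().split()
    # 4. Drop stopwords
    words = [w for w in words if w not in STOPWORDS]
    return " ".join(words)

print(clean_text("<br />This movie is GREAT, really!"))
# → movie great really
```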

    Next, we will complete building our movie review sentiment analyser. See you in the next!

    Complete source code: here

  • maravindblog 2:44 pm on November 20, 2016 Permalink | Reply  

    Beginning ML: Sentiment Analysis Using Textblob : KnowDev 

    In the last post we built a neural network for sentiment analysis. We used our own dataset, which was not very big; even so, we were able to achieve an accuracy of 54%. Today we shall use a Python module for sentiment analysis. We shall build a Twitter sentiment analyzer! Believe me, you'll be amazed by how easily we can achieve it!

    First we need to install 2 modules: tweepy, which allows us to make API calls to Twitter (we have to create an app on the Twitter developer site to authenticate ourselves), and textblob, which can perform sentiment analysis. TextBlob can actually perform many more operations apart from sentiment analysis; if you are interested you can check it out here.

    Let’s import our dependencies

    import tweepy
    from textblob import TextBlob

    We have to declare 4 variables: consumer_key, consumer_secret, access_token, and access_token_secret. All of these can be found after we create an app on the Twitter developer site.

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    We can authenticate ourselves with the above 2 lines. We are almost done with the authentication.

    api = tweepy.API(auth)

    Through api variable we can use search operation to find public tweets.

    public_tweets ='search')

    search is the keyword we will be searching for. Now we can iterate through public_tweets and use TextBlob to perform sentiment analysis on each tweet.

    for tweet in public_tweets:
        T = tweet.text
        analysis = TextBlob(tweet.text)
        sentiment = analysis.sentiment.polarity
        print T, sentiment

    And that's it! We have successfully used the tweepy and textblob modules to build a Twitter sentiment analyzer in less than 25 lines. In fact there are many more sources whose APIs we can use in the same way.
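    TextBlob's polarity is a float between -1.0 and 1.0. If you want readable labels instead of raw numbers, a small helper like the following can be dropped into the loop (the cutoff at 0 and the label names are my own choices, not anything TextBlob prescribes):

```python
def label_sentiment(polarity):
    """Map a TextBlob-style polarity score in [-1, 1] to a text label."""
    if polarity > 0:
        return "positive"
    elif polarity < 0:
        return "negative"
    return "neutral"

print(label_sentiment(0.8))   # → positive
print(label_sentiment(-0.3))  # → negative
print(label_sentiment(0.0))   # → neutral
```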

    This is a relatively small post, and you know why! Now you can use the sentiment analyzer for a wide range of use cases, and I'll see you in the next!

    Complete source code

  • maravindblog 4:39 pm on November 19, 2016 Permalink | Reply

    Beginning ML – Sentiment Analysis Using Neural Network cont. : KnowDev 

    This post is a continuation of this one.

    I hope you have a good understanding of why we have to pre-process. In this post we shall train our model and also feed it our own sentences.

    First of all, we shall get the feature sets that we created, either from pickle or by calling the function and storing the result in variables.

    from create_sentiment_featuresets import create_feature_sets_and_labels
    train_x, train_y, test_x, test_y = create_feature_sets_and_labels('pos.txt', 'neg.txt')

    We will use the same neural network model that we used here. First we have to define placeholders for the features and labels.

    x = tf.placeholder('float', [None, len(train_x[0])])
    y = tf.placeholder('float')

    len(train_x[0]) returns the length of a feature vector.

    The neural network model is defined in the neural_network_model function. After the neural network is defined, it's time to train our model.

    First we’ll capture the prediction / output of neural network using

    prediction = neural_network_model(x)

    Then we have to find the cross entropy of the prediction made by our model. We are using softmax regression.

    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                                                          prediction, y))
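    To see what softmax_cross_entropy_with_logits actually computes, here is the same quantity worked out by hand in plain Python for a single example (the logits and label below are made-up numbers, just for illustration):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot_label):
    probs = softmax(logits)
    # cross entropy = -sum(label_i * log(prob_i))
    return -sum(l * math.log(p) for l, p in zip(one_hot_label, probs))

logits = [2.0, 0.5]   # raw network output for one sentence
label = [1, 0]        # [1, 0] = positive, as in these posts
print(round(cross_entropy(logits, label), 4))  # → 0.2014
```

    A confident, correct prediction gives a loss near zero; the optimizer below pushes the loss in that direction.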

    After finding the cross entropy, it's time to back-propagate and try to reduce the difference.

    optimizer = tf.train.AdadeltaOptimizer(0.5).minimize(cross_entropy)

    The cross-entropy (#1) and the optimizer (#2) together make up the training step. We'll start a session, using 10 as the number of epochs.

    The accuracy we could achieve was 55.44%.


    The trained model is saved to 'sentiment_model.ckpt'; later we can use that to restore our variables (i.e. the weights and biases).

    Making predictions :

    To make predictions using the model we have just trained, we have to pre-process the input sentence so that it can be passed as features to our model. After we pre-process the input sentence, we predict.

    result = (, 1),
                             feed_dict={x: [features[:423]]}))

    We print out whether the output is positive or negative using

    if result[0] == 0:
        print('Positive:', input_data)
    elif result[0] == 1:
        print('Negative:', input_data)


    As you can see, our model makes pretty good predictions even though the accuracy is 54%.

    In this post we have seen how to train on our own data as well as how to use the trained model. In less than a week we were able to make a machine which can predict the sentiment of any sentence. Pretty interesting, right? In the next post I will introduce you to a more sophisticated version of sentiment analysis. See you in the next!

    link to complete source code :  here

  • maravindblog 5:18 pm on November 18, 2016 Permalink | Reply  

    Beginning ML – Sentiment Analysis Using Deep Neural Network: KnowDev 

    In the last post we implemented our first neural network, which can classify a set of images. That experiment can be considered the HELLO WORLD program. There is a lot more to consider while implementing our model, mainly data! A lot of the time data is very raw, and we are required to perform some kind of pre-processing so that the data is in a format that TensorFlow objects can accept. A sentiment analyser is a program which can tell whether a given sentence is positive or negative. We will use the same neural network model that we built in the last post. For better understanding, the whole process of building the sentiment analyzer is divided into parts.

    In this post, we shall look at how to get raw data and convert it into the required format. Both the positive dataset and the negative dataset are available at the GIT link. First we'll download the datasets into our directory. Each dataset contains 5000 sentences. Yes, the data we have is not really enough for practical purposes.

    Once we have our datasets ready in our directory (and have imported tensorflow, duh!), we shall create feature sets from the data.

    First of all we have to create our vocabulary of words. The model we will use is bag of words. We will call this collection of words a lexicon. We will use the nltk library to extract the words that are most relevant; the techniques we are using are stemming and lemmatizing.

    lexicon = [lemmatizer.lemmatize(i) for i in lexicon]

    lexicon on the LHS just contains all the words from pos.txt and neg.txt (our datasets). In fact we could employ other techniques, such as removing stopwords (like the, an, a...) which have no particular effect on the sentiment of a sentence. We are effectively removing such words by considering only words with a frequency between 50 and 1000.

    for w in w_counts:
        if 1000 > w_counts[w] > 50:
            l2.append(w) # l2- final lexicon list

    Now that we have created our vocabulary, we can create our features. Here our lexicon size is 423. A tensor accepts an object of floats, but our sentences are strings. Hence we have to use the lexicon we created earlier to make a vector which contains the frequency of each lexicon word in the sentence.

    For example, if lexicon = ['dog', 'cat', 'eat', 'fight', 'food'] and the given sentence is "dog fights with cat for food", then the feature vector is [1, 1, 0, 1, 1].
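    The example above can be sketched directly in plain Python (this sketch skips the lemmatization step, so the sentence below uses the lexicon's exact word forms):

```python
def sentence_to_features(sentence, lexicon):
    """Count how often each lexicon word occurs in the sentence."""
    words = sentence.lower().split()
    features = [0] * len(lexicon)
    for word in words:
        if word in lexicon:
            features[lexicon.index(word)] += 1
    return features

lexicon = ['dog', 'cat', 'eat', 'fight', 'food']
print(sentence_to_features("dog fight with cat for food", lexicon))
# → [1, 1, 0, 1, 1]
```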

    We create a list of [features, classification] pairs. Positive is denoted as [1, 0] and negative as [0, 1].

    features = list(features)
    featureset.append([features, classification])

    Finally we'll create our collection of feature sets for both positive and negative examples. The list is shuffled so that the neural network can converge.

    features += sample_handling('pos.txt', lexicon, [1, 0])
    features += sample_handling('neg.txt', lexicon, [0, 1])

    Now the whole set is divided into training data and testing data.

    train_x = list(features[:, 0][:-testing_size])
    train_y = list(features[:, 1][:-testing_size])
    test_x = list(features[:, 0][-testing_size:])
    test_y = list(features[:, 1][-testing_size:])

    train_x and test_x are the features, and train_y and test_y are the labels. We will use the pickle module for permanent storage of these values so that they can be used later for training our neural network.

    In this post we downloaded our own data, cleaned it to our requirements, and divided the cleaned data into training data and testing data.

    In next post we will be using this data to train our model and test to find accuracy and also run the model against our own inputs ! Awesome right ? I’m excited too…

    See you in next !

    link to complete source code :

    next post : next


  • maravindblog 4:54 pm on November 17, 2016 Permalink | Reply

    Beginning ML – First Neural Net : KnowDev 

    We have gone through some of the important topics in TensorFlow, and believe me, there are tons of others! But no worries, we'll catch up! I have always believed in project-based learning, so we'll do the same this time as well. We shall build a feed-forward deep neural network which can classify handwritten digits. Sounds interesting, right? Let's get into it. Open up any text editor or IDE; I personally prefer coding in the PyCharm IDE, a wonderful piece of software for writing Python scripts. So, what is the most critical part of any neural network? Data! Right! Neural networks shine when there is lots and lots of data to train on. We will use the MNIST dataset provided in the tutorials. We can get the data by simply importing it and loading it into a Python variable.

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    In MNIST, every data point contains 2 parts: 1) an image and 2) a label. Every image is of size 28 x 28.
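    The one_hot=True flag above means each label arrives as a 10-element vector with a 1 at the digit's index. A quick plain-Python illustration of that encoding:

```python
def one_hot(digit, num_classes=10):
    """Encode a digit 0-9 as a one-hot vector of length num_classes."""
    vec = [0] * num_classes
    vec[digit] = 1
    return vec

print(one_hot(3))  # → [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```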

    Let’s start building our graph by creating a placeholder variable.

    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.float32) # for labels

    x isn't a specific value; it's a placeholder we'll feed when we ask TensorFlow to run computations. Here x represents a MNIST image, flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers.

    Now we need variables for weights and biases. These we represent with tensorflow variable as these variables can be modified by operational computations.

    W = tf.Variable(tf.random_normal([784, 10]))
    b = tf.Variable(tf.random_normal([10]))

    Notice that W and b are initialized with random values, but the exact initial values don't actually matter much. W is a tensor of shape [784, 10] because we want to generate classifications for 10 classes, i.e. 0, 1, 2, ..., 9. As we are building a deep net, we need one or more hidden layers.

    hidden_1_layer = {'weights': tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}

    n_nodes_hl1 is declared as 500; it is the number of nodes in the single hidden layer. We can tweak this number to see how the accuracy changes. Moving on..

    Now we can complete the neural network model by completing the implementation of the layers.

    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)  # activation function

    There are a few more steps before we start training our model: actually defining the classification cost and employing an optimizer for back-propagation. We are using the softmax_cross_entropy_with_logits function.

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction, y))

    # comparing the difference between predicted vs original

    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(cost)

    There are many kinds of optimizers available in TensorFlow; each one is optimal for some particular use-case. The 0.5 mentioned is the learning rate of the neural network.

    Let's train our neural network. Each cycle of feed-forward and back-propagation through the whole training set is called an epoch. I have tried setting the number of epochs to both 5 and 10. We have to start a session and initialize all variables. We can then run both the optimizer and the cost epoch number of times; here this is called the training step.

    _, c =[optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})

    We feed the values of x and y in batches. c is the variable which holds the loss, which we accumulate into the epoch loss for each epoch.
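    Feeding in batches just means walking through the training set a slice at a time. Here is a minimal plain-Python sketch of that idea (the TensorFlow mnist object actually provides this via mnist.train.next_batch; the batch size of 4 below is made up for illustration):

```python
def batches(data, batch_size):
    """Yield successive fixed-size batches from a list of examples."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

samples = list(range(10))  # stand-in for 10 training examples
for batch in batches(samples, 4):
    print(batch)
# → [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
```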

    It's time to test the neural network and check the accuracy of our model.

    correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

    print('Accuracy:', accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

    The accuracy I could achieve was 97.999%.

    This is an implementation of a simple deep neural network. This code is inspired from and also .

    complete source code :

    We shall be using this model for more couple of use-cases before we move on to another model. I’ll catch you up in next post.

  • maravindblog 4:33 pm on November 16, 2016 Permalink | Reply

    Beginning ML – Variables : KnowDev 

    In the previous post we saw how TensorFlow works. Now let's understand one of its integral components: the TensorFlow variable. TensorFlow variables are used to represent the weights in a neural network!

    # create a tensorflow variable

    var = tf.Variable(0)

    cons = tf.constant(1)

    We shall update the value of the variable var, and thereby understand the working of a TensorFlow variable.

    sum = tf.add(var, cons)

    assign = tf.assign(var, sum)

    The construction phase is completed. Let’s start the session.

    with tf.Session() as sess:
        print 'var:',, 'cons:',
        for _ in range(3):
            print 'updated var:',

    All variables in the graph are initialised by running, where the initialisation op is created with tf.initialize_all_variables() in this version of TensorFlow.

    output :

    var: 0 cons: 1
    updated var: 1
    updated var: 2
    updated var: 3
    This is a small and very specific post. As TensorFlow variables are very important, I wanted to give a clear explanation of how they work. Now we can actually start building our own neural network! We will use the mnist dataset provided by The data is in a format that TensorFlow accepts, therefore we need not pre-process any data! We shall train and test our first neural network in the next!


  • maravindblog 4:04 pm on November 16, 2016 Permalink | Reply

    Beginning ML – basics : KnowDev 

    Let’s get our hands dirty !

    First. Some basics

    First things first, Import tensorflow library

    import tensorflow as tf

    Let’s first understand how tensorflow works by taking 2 tensorflow constants.

    x1 = tf.constant(5)

    x2 = tf.constant(6)

    Multiply x1 and x2.

    result = tf.mul(x1, x2)

    Now, if you try to print result and run the program, you won't get 30, because so far we have only constructed the computational graph. To actually multiply the constants and get the result of the multiplication, you must launch the graph in a session.

    with tf.Session() as sess:
        out =
        print out

    The actual computation takes place when is called! print out then prints the computed value, 30. If you print the tensor result directly instead, the output seen in the terminal is:

    Tensor("Mul:0", shape=(), dtype=int32)


    As you can see, everything in TensorFlow is represented as a tensor. A tensor can be thought of as a multi-dimensional matrix. Each node in a TensorFlow computational graph is called an 'op'. An op takes zero or more tensors and produces zero or more tensors, and tensors are what get passed between operations in the graph.

    I hope this post gives you a brief introduction of tensorflow and its core working in a nutshell. In coming posts we shall be looking into more complicated and yet interesting concepts.


  • maravindblog 12:23 pm on November 16, 2016 Permalink | Reply
    Tags: tensorflow   

    Beginning ML – Intro & Installation: KnowDev 

    It's been a long time since I've posted, but I'm back with a most interesting and trending concept: machine learning. I have just started learning ML and I want to share my whole experience, so I will be posting regularly about all the new things I'm learning, various resources, etc. Tune in, because as I'm a noob to ML, the experience and resources I post here will be of great use to anyone who wants to start learning ML.

    I'm using TensorFlow on Python 2. TensorFlow is a library maintained by Google. It contains various utility functions which can perform complex calculations; after all, ML is the result of complex math models and computations. In TensorFlow we can model a neural network, train it, and test it with great ease. So let's get into it, but before we actually start working we need to install the TensorFlow library! TensorFlow can be installed very easily on Linux or Mac OS. We can work with TensorFlow on Windows only through a virtual machine or Docker. I'm using Ubuntu, so TensorFlow can be installed just like any other Python library, i.e. using pip.

    Following are the commands to install tensorflow. Note that I’m using python2.

    #ubuntu/Linux 64-bit 

    $ sudo apt-get install python-pip python-dev

    The above is just for installing pip, if it is not already present.

    Tensorflow is available for both CPU and GPU versions. As of now I’m using CPU version.

    # Ubuntu/Linux 64-bit, CPU only, Python 2.7


    The TensorFlow installation is completed with the following command. You can test it by importing tensorflow in a Python file and running it; if you don't get any errors, you are good to go.

    $ sudo pip install --upgrade $TF_BINARY_URL
