Beginning ML: Movie Review Sentiment Analyser cont. :KnowDev

In the last post we used pandas to extract raw data from .csv files and the bag-of-words model to pre-process our data into feature sets.

In this post we will train the model, which is the simplest part. We will use a random forest to make predictions. A random forest is a collection of decision trees.

First we initialize the forest with 100 decision trees.

from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators=100)

We will use the fit function of the forest variable to build a forest of trees from the training set.

forest = forest.fit(train_cleaned_data, train['sentiment'])

train_cleaned_data is the pre-processed data from our last post (the X argument of fit), and train['sentiment'] holds the label for each row of X. And we are done with training our model.
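To make “a collection of decision trees” concrete, here is a minimal sketch in plain Python of how a forest could combine its trees by majority vote. The three hand-written trees and the words they look at are made up for illustration; the real trees are learned by fit, and scikit-learn’s RandomForestClassifier combines them by averaging their predicted class probabilities rather than counting hard votes.

```python
from collections import Counter

# Hypothetical hand-written "trees": each one looks at the word counts of a
# review and votes 1 (positive) or 0 (negative). Real trees are learned by fit.
def tree_a(counts):
    return 1 if counts.get("great", 0) > 0 else 0

def tree_b(counts):
    return 0 if counts.get("boring", 0) > 0 else 1

def tree_c(counts):
    return 1 if counts.get("great", 0) > counts.get("bad", 0) else 0

def forest_predict(trees, counts):
    """Return the label that most trees voted for."""
    votes = Counter(tree(counts) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [tree_a, tree_b, tree_c]
mixed_review = {"great": 1, "boring": 1}   # two trees vote positive, one negative
bad_review = {"bad": 2, "boring": 1}       # all three trees vote negative

mixed_label = forest_predict(trees, mixed_review)
bad_label = forest_predict(trees, bad_review)
```

With an odd number of trees and two classes there can be no tie, which keeps the sketch simple.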

Now, we can test and predict using our model.

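To test, we first have to transform the raw test data into the required format. We use transform here, rather than fitting the vectorizer again, so that the test reviews are mapped onto the vocabulary learned from the training set. Fitting again on the test data would change the feature columns the forest was trained on and let test information leak into the model.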

    test_data_features = vectorizer.transform(clean_test_reviews)

Then we simply predict using the predict function of the forest variable.

    result = forest.predict(test_data_features)
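The difference between fitting and transforming can be sketched in plain Python. This is only an illustration of the idea, not CountVectorizer itself: the function names fit_vocabulary and transform and the tiny reviews are assumptions, and the real vectorizer orders its vocabulary alphabetically and offers many more options.

```python
def fit_vocabulary(documents):
    """Learn a word -> column index mapping from the training documents only."""
    vocabulary = {}
    for doc in documents:
        for word in doc.split():
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary

def transform(documents, vocabulary):
    """Count words using a fixed vocabulary; unseen words are ignored."""
    rows = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for word in doc.split():
            index = vocabulary.get(word)
            if index is not None:
                row[index] += 1
        rows.append(row)
    return rows

train_reviews = ["good movie", "bad movie"]
test_reviews = ["good good film"]          # "film" never appeared in training

vocabulary = fit_vocabulary(train_reviews)             # learned once, from training data
train_features = transform(train_reviews, vocabulary)
test_features = transform(test_reviews, vocabulary)    # same columns as training
```

Because the test rows reuse the training vocabulary, they have exactly as many columns as the rows the model was trained on.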

We finish off our testing by writing all the predictions to a file for permanent storage. And that’s it: we have used a new model and a new technique to build a sentiment analyzer. This model is not ready for commercial use because, one, we did not use a large dataset and, two, we did not use a more sophisticated model. In upcoming posts we will see what those “sophisticated” techniques and models are. I’m sure those concepts will be much more interesting. With that, I’ll see you soon!
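Storing the predictions could look roughly like the sketch below, which uses only the standard csv module. The column names, the review ids and the file name are assumptions made for the example; the post itself does not say which format the linked source code uses.

```python
import csv
import os
import tempfile

def save_predictions(path, review_ids, predictions):
    """Write one (id, sentiment) row per review, with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "sentiment"])
        writer.writerows(zip(review_ids, predictions))

# Hypothetical ids and predictions, written to a temporary directory.
output_path = os.path.join(tempfile.mkdtemp(), "predictions.csv")
save_predictions(output_path, ["review_1", "review_2"], [1, 0])
```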

Complete source code here

Beginning ML – Movie Review Analysis: KnowDev

Up to the last post we have seen how to build a sentiment analyzer using a multi-layer feedforward neural network. In this post we will again build a sentiment analyzer, one that can predict whether a movie review is positive or negative. We can consider this one of the use cases of what we have learned so far.

This particular concept is divided into 2 parts: one, pre-processing our data; two, using the random forest technique to predict.

Pre-Processing :

We will use the pandas module to extract data from a csv file. As we did before, we will use the bag-of-words model to create feature sets. But first we have to clean up a little dirt: stripping html tags (using the beautifulsoup module), removing punctuation, and removing stopwords. Stopwords are words like the, and, an, is etc., which do not add any specific emotion to the sentence. We remove punctuation as well just to reduce complexity; once we get familiar with what we are doing we can add more complexity to our model. We will implement all this functionality in the function clean_text.
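The cleaning steps described above can be sketched with the standard library alone. This is not the clean_text from the linked source code: here a regular expression stands in for beautifulsoup, and a tiny hand-picked STOPWORDS set stands in for a full stopword list, both chosen only to keep the example self-contained.

```python
import re

# A tiny stand-in stopword set; a real list is much longer.
STOPWORDS = {"the", "and", "an", "a", "is", "it", "this", "of"}

def clean_text(raw_review):
    """Strip html tags, drop punctuation, lowercase, and remove stopwords."""
    without_tags = re.sub(r"<[^>]+>", " ", raw_review)       # crude tag removal
    letters_only = re.sub(r"[^a-zA-Z]", " ", without_tags)   # punctuation and digits out
    words = letters_only.lower().split()
    return " ".join(word for word in words if word not in STOPWORDS)

cleaned = clean_text("<br />This movie is GREAT, and the cast is superb!")
```

The function returns a single space-separated string because that is the form a bag-of-words vectorizer expects as input.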

Now we have to apply these modifications to all the reviews in our file. We call that function create_clean_train. It might take a couple of minutes because there are almost 25000 reviews altogether.

We will create feature sets using CountVectorizer from scikit-learn.

In the next post, we will complete building our movie review sentiment analyser. See you next time!

Complete source code: here

Beginning ML – Sentiment Analysis Using Neural Network cont. : KnowDev

This post is a continuation of this one.

I hope you now have a good understanding of why we have to pre-process. In this post we shall train our model and also feed it our own sentences.

First of all we shall get the feature sets that we created, either by loading them from a pickle or by calling the function and storing the result in variables.

from create_sentiment_featuresets import create_feature_sets_and_labels
train_x, train_y, test_x, test_y = create_feature_sets_and_labels('pos.txt', 'neg.txt')

We will be using the same neural network model that we used here. First we have to define our placeholders for the features and labels.

x = tf.placeholder('float', [None, len(train_x[0])])
y = tf.placeholder('float')

len(train_x[0]) returns the length of a feature vector.

The neural network model is defined in the neural_network_model function. After the neural network is defined it’s time to train our model.

First we’ll capture the prediction / output of neural network using

prediction = neural_network_model(x)

Then, we have to find the cross entropy of the prediction made by our model. We are using softmax regression.

#1
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                                                      prediction, y))
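What softmax cross entropy computes for a single example can be written out in a few lines of plain Python. This is a sketch of the formula, loss = -sum(label_i * log(softmax(logits)_i)), and not TensorFlow’s implementation; the function names and the example logits are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]   # shift for stability
    total = sum(exps)
    return [value / total for value in exps]

def cross_entropy(logits, one_hot_label):
    """Loss is small when the true class gets a high probability."""
    probabilities = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probabilities))

undecided_loss = cross_entropy([0.0, 0.0], [1, 0])   # model has no preference
good_loss = cross_entropy([5.0, 0.0], [1, 0])        # confident and right
bad_loss = cross_entropy([5.0, 0.0], [0, 1])         # confident and wrong
```

tf.reduce_mean then averages this per-example loss over the whole batch.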

After finding the cross entropy it is time to back propagate and try to reduce the difference.

#2
optimizer = tf.train.AdadeltaOptimizer(0.5).minimize(cross_entropy)

Together, #1 and #2 make up the training step. We’ll start a session and use 10 epochs.

The accuracy we could achieve was 55.44

sentiment-54-accuracy

The trained model is saved to ‘sentiment_model.ckpt’; later we can use it to restore our variables (i.e. weights and biases).

Making predictions :

To make predictions using the model we have just trained, we have to preprocess our input sentence so that it can be passed as features to our model. After we preprocess the input sentence, we predict.

result = (sess.run(tf.argmax(prediction.eval(
feed_dict={x: [features[:423]]}), 1)))

We print out whether the output is positive or negative using

if result[0] == 0:
    print('Positive:', input_data)
elif result[0] == 1:
    print('Negative:', input_data)

sentiment_output

As you can see, our model makes pretty good predictions even though the accuracy is 54%.

In this post we have seen how we can train a model on our own data as well as use it. In less than a week we were able to build a machine that can predict the sentiment of any sentence. Pretty interesting, right? In the next post I will introduce you to a more sophisticated version of sentiment analysis. See you in the next one!

link to complete source code :  here

Beginning ML – First Neural Net : KnowDev

We have gone through some of the important topics in tensorflow and, believe me, there are a ton of others! But no worries, we’ll catch up! I have always believed in project-based learning, so we’ll do the same this time as well. We shall be building a feed-forward deep neural network which can classify handwritten digits. Sounds interesting, right? Let’s get into it. Open up any text editor or IDE. I personally prefer coding in the PyCharm IDE. It’s a wonderful piece of software for writing your python scripts.

So, what is the most critical part of any neural network? Data! Right! Neural networks shine when there is lots and lots of data to train them. We will be using the mnist dataset provided in the tensorflow.org tutorials. We can get the data by simply importing it and loading it into a python variable.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

In MNIST, every data point has 2 parts: 1) an image and 2) a label. Every image is 28 x 28 pixels.

Let’s start building our graph by creating a placeholder variable.

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32) # for labels

x isn’t a specific value; we’ll supply it when we ask tensorflow to run computations. Here x represents a MNIST image, flattened into a 784-dimensional vector (28 x 28 = 784). We represent this as a 2-D tensor of floating point numbers.

Now we need variables for weights and biases. We represent these with tensorflow variables, as they can be modified by the computations in the graph.

W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.random_normal([10]))

Notice that W and b are initialized with random values; the exact starting values don’t matter here because training adjusts them. W is a tensor of shape [784, 10] because we want to classify into 10 classes, i.e. 0, 1, 2 .. 9. As we are building a deep net we need to have one or more hidden layers.

hidden_1_layer = {'weights': tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                  'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}

n_nodes_hl1 is declared as 500. It is the number of nodes in the single hidden layer. We can tweak this number to see how the accuracy changes. Moving on..

Now we can complete the neural network model by completing the implementation of the layers.

l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
l1 = tf.nn.relu(l1)  # activation function
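To see what these two lines compute, here is the same forward pass for one hidden layer in plain Python, shrunk to 3 inputs and 2 hidden nodes instead of 784 and 500. The helper functions and all the numbers are made up for illustration; tensorflow does the same arithmetic on tensors.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(matrix, bias):
    """Add the bias vector to every row."""
    return [[value + b for value, b in zip(row, bias)] for row in matrix]

def relu(matrix):
    """Replace negative values with zero."""
    return [[max(0.0, value) for value in row] for row in matrix]

data = [[1.0, 2.0, 3.0]]        # one example with 3 features, shape [1, 3]
weights = [[0.5, -1.0],
           [0.0, 1.0],
           [0.5, -1.0]]         # shape [3, 2]: 3 inputs, 2 hidden nodes
biases = [0.5, 0.5]             # one bias per hidden node

l1 = relu(add_bias(matmul(data, weights), biases))   # shape [1, 2]
```

The first hidden node ends up at 2.5, while the second would be -1.5 before the activation and is clipped to 0.0 by relu.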

There are a few more steps before we start training our model, i.e. actually defining the cost function and employing an optimizer for back propagation. We are using the softmax_cross_entropy_with_logits function.

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction, y))

# comparing the difference between predicted and original labels

optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(cost)

There are many kinds of optimizers available in tensorflow, each suited to particular use cases. The 0.5 is the learning rate of the optimizer.
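What a gradient descent optimizer does with its learning rate can be shown on a toy problem. The sketch below minimises the made-up cost (w - 3)^2 for a single weight w; the cost function, the starting point and the learning rate of 0.1 are assumptions for the example, not values from the network above.

```python
def cost(w):
    """A toy cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate, steps):
    """Repeatedly move w a small step against the gradient."""
    history = [cost(w)]
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
        history.append(cost(w))
    return w, history

final_w, cost_history = gradient_descent(w=0.0, learning_rate=0.1, steps=50)
```

A larger learning rate takes bigger steps and can converge faster, but if it is too large the updates overshoot the minimum.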

Let’s train our neural network. Each full pass over the training data, with a feed forward and back propagation for every batch, is called an epoch. I have tried 5 and also 10 epochs. We have to start a session and initialize all variables. Then, in each epoch, we run both the optimizer and the cost on every batch; each such run is called a training step.

_, c = sess.run( [optimizer, cost], feed_dict = { x:epoch_x, y:epoch_y } )

We feed the values of both x and y in batches. c holds the cost of the batch just processed; summing it over all batches gives the loss for the epoch.

It’s time to test the neural network and check the accuracy of our model.
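The structure of the training loop, epochs on the outside and batches on the inside, can be sketched without tensorflow. The helper iterate_batches, the toy data and the batch size of 4 are assumptions for illustration; in the real loop each inner iteration would be one sess.run([optimizer, cost], ...) call.

```python
def iterate_batches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices of at most batch_size items."""
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]

features = list(range(10))                      # 10 toy examples
labels = [value % 2 for value in features]      # toy labels

n_epochs = 2
steps_taken = 0
batch_sizes = []
for epoch in range(n_epochs):
    for epoch_x, epoch_y in iterate_batches(features, labels, batch_size=4):
        # A real training step would feed epoch_x and epoch_y to the session here.
        steps_taken += 1
        if epoch == 0:
            batch_sizes.append(len(epoch_x))
```

The last batch of an epoch is simply smaller when the data does not divide evenly by the batch size.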

correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

print('Accuracy: ', accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
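The accuracy calculation boils down to “pick the highest-scoring class, compare with the true class, average the hits”. A plain Python sketch with four made-up two-class predictions shows the same three steps; the function names and numbers are illustrative only.

```python
def argmax(values):
    """Index of the largest value, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda index: values[index])

def accuracy(predictions, one_hot_labels):
    """Fraction of rows where the predicted class equals the true class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, one_hot_labels)]
    return sum(correct) / float(len(correct))

predictions = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
one_hot_labels = [[0, 1], [1, 0], [1, 0], [1, 0]]

score = accuracy(predictions, one_hot_labels)   # third row is the only miss
```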

The accuracy I could achieve was 97.999.

This is an implementation of a simple deep neural network. The code is inspired by pythonprogramming.net and tensorflow.org.

complete source code : https://github.com/makaravind/ImageClassifier

We shall be using this model for a couple more use cases before we move on to another model. I’ll catch you in the next post.

Beginning ML – Variables : KnowDev

In the previous post we saw how tensorflow works. Now let’s understand one of its integral components, i.e. the tensorflow variable. Tensorflow variables are used to represent the weights in a neural network!

# create a tensorflow variable
var = tf.Variable(0)
cons = tf.constant(1)

We shall update the value of the variable var and thereby understand how a tensorflow variable works.

sum = tf.add(var, cons)
assign = tf.assign(var, sum)

The construction phase is completed. Let’s start the session.

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print 'var:', sess.run(var), 'cons:', sess.run(cons)
    for _ in range(3):
        sess.run(assign)
        print 'updated var:', sess.run(var)

All variables in the graph are initialised by running sess.run(tf.initialize_all_variables()).

output :

var: 0 cons: 1
updated var: 1
updated var: 2
updated var: 3
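If the split between the construction phase and the session feels abstract, a rough plain-Python analogy may help. It is only an analogy and not how tensorflow works internally: the dictionary state plays the role of the variable, and calling the function assign plays the role of sess.run(assign).

```python
state = {"var": 0}      # plays the role of tf.Variable(0)
cons = 1                # plays the role of tf.constant(1)

def assign():
    """Defining this function runs nothing; calling it performs the update."""
    state["var"] = state["var"] + cons
    return state["var"]

value_before = state["var"]                 # still 0: nothing has been run yet
history = [assign() for _ in range(3)]      # three "sess.run(assign)" calls
```

As in the tensorflow version, the variable keeps its value between runs, so each call builds on the previous one.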

This is a small and very specific post. As tensorflow variables are very important, I wanted to give a clear explanation of how they work. Now we can actually start building our own neural network! We will be using the mnist dataset provided by tensorflow.org. The data is already in the format accepted by tensorflow, so we need not preprocess any data! We shall train and test our first neural network in the next post!

 

Beginning ML – Intro & Installation: KnowDev

It’s been a long time since I’ve posted. But I’m back with the most interesting and trending concept, i.e. machine learning. I have just started learning ML and I want to share my whole experience. Hence I will be posting regularly about all the new things I’m learning, various sources etc. Tune in: as I’m a noob to ML, the experience and resources I post here will be of great use to someone who wants to start learning ML.

I’m using tensorflow on python2. Tensorflow is a library maintained by Google. It contains various utility functions which can perform complex calculations. After all, ML is the result of complex math models and computations. In tensorflow, we can model a neural network, train it and also test it with great ease.

So let’s get into it, but before we actually start working we need to install the tensorflow library! Tensorflow can be installed very easily on Linux or Mac OS. On windows we can work with tensorflow only through a virtual machine or docker. I’m using ubuntu, hence tensorflow can be installed just like any other python library, i.e. using pip.

Following are the commands to install tensorflow. Note that I’m using python2.

#ubuntu/Linux 64-bit 

$ sudo apt-get install python-pip python-dev

Above is just for installing pip if not already present.

Tensorflow is available in both CPU and GPU versions. As of now I’m using the CPU version.

# Ubuntu/Linux 64-bit, CPU only, Python 2.7

$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl

Tensorflow installation is completed with the following command. You can test it by just importing tensorflow in a python file and running it. If you don’t get any errors you are good to go.

$ sudo pip install --upgrade $TF_BINARY_URL
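The import test mentioned above can be as small as the following script. The variable name and the messages are my own choices; the only thing it relies on is that a failed import raises ImportError.

```python
# Try to import tensorflow and report the result instead of crashing.
try:
    import tensorflow as tf
    tensorflow_version = tf.__version__
except ImportError:
    tensorflow_version = None

if tensorflow_version is None:
    print("tensorflow is not installed")
else:
    print("tensorflow " + tensorflow_version + " imported without errors")
```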

ChatBot: KnowDev

Chat bots are a big thing nowadays! Every other company, big or small, wants to create its own chatbot. Chat bots owe their craze to the arrival of modern methods in machine learning. There are many ways one can create a chat bot. Some are very simple and straightforward and some are a bit tough to understand.

As chat bots are becoming so popular, I wanted to get my hands dirty making one for myself. There are many platforms onto which one can launch a chatbot. I have chosen facebook messenger, mainly because of its popularity. Then I came across Api.ai, a platform powered by machine learning, mainly for creating conversational apps. I wanted to start simple by creating a simple chat bot without any actual logic behind the scenes. Hence I thought of an app called “KnowDev“. The main aim of the project was to create a chatbot capable of answering all the questions related to the developer of the facebook page. But to make things more interesting I wanted to complete this project in a day, technically <12hrs.

Following is the log of all the events from start to deployment of this chatbot.

11:30 a.m – Start

I had no knowledge of what to do or how to proceed with this project. All I had was an idea about resources. I started reading through the documentation, learnt some vocabulary and started building a test app along the way.

2:30 p.m – break

I got most of the basics done, and I have to say the documentation is so good that any novice programmer can understand it. I was implementing every topic I read about in the documentation in the test app.

3:55 p.m – project start

I started the project by first writing down the major conversations with my chat bot. After getting a good idea of how I should proceed, I started the actual development. I experimented with a few things and made a few mistakes during the process. Finally my chat bot looked pretty decent and was able to answer correctly about a few topics. Then I moved to the next phase of my development, i.e. testing.

I made my sister test my chat bot and found a few loopholes. I fixed those and the chat bot was working fine for those particular topics. My next phase was deployment.

I deployed my chat bot using heroku. The process of deployment was really easy and not at all tough (almost ! :P). I tested the chatbot directly on my facebook messenger and it was working like an angel. But later I realised that the app was not public and I had to submit it to facebook for review. I created a privacy policy using www.iubenda.com. After adding the link to the privacy policy in the facebook developer dashboard, I submitted my app for review.

10:46 p.m – Done !

Submitted the app for review. I have to wait another 5 days for the review process to complete. As I had time left before my 12 hrs were up, I decided to write this blog post in time.

It was an awesome, productive day! Next I will be looking into using API.ai in my backend services to generate dynamic responses. I haven’t decided on which platform I should develop my next chat bot, but it will be either android or python.

knowdev-api-ai

my work space after 12hrs of project

Happiness inspires productivity