
QVSource: The QlikView & Qlik Sense API Connector (Docs)


In this article we show how to use QlikView and QVSource to build a sentiment prediction model with the Google Prediction Connector.

Google also give a number of other suggestions for how their Prediction API could be used, including:
  • Document and Email Classification
  • Recommendation Systems
  • Spam Detection
  • Language Detection
  • Upsell Opportunity Analysis
  • Diagnostics
  • Suspicious Activity Detection
  • Churn Analysis
  • + Many More

Prerequisites

Ensure that you have set up your Google API project, OAuth 2.0 Client and Google Cloud Storage, as per these instructions, before attempting to use the Google Prediction Connector.

Extracting Some Training Data

We first need a set of input data to train the prediction model. This can consist of any columns you choose and can be used to predict any set of classifications you like.

Google give a few examples:

  • Given a user's past viewing habits, predict what other movies or products a user might like.
  • Categorize emails as spam or non-spam.
  • Analyze posted comments about your product to determine whether they have a positive or negative tone.
  • Guess how much a user might spend on a given day, given their spending history.

In our example we are going to use a Tweet's text, hashtag count and url count to categorise the Tweet.

First we use the QVSource Twitter Connector to download a sample of Tweets which we will score manually. In this example we use the hashtag of a popular airline.

Image: Collect Tweets.png

We can now grab the load script and build the following simple QlikView application:


// Download a sample of Tweets matching the #klm hashtag
// via the QVSource Twitter Connector.
Tweets:
LOAD
	text as Search_text,
	hashtag_count as Search_hashtag_count,
	url_count as Search_url_count
FROM
[http://localhost:5555/QVSource/TwitterConnectorV2/?table=Search&appID=&searchTerm=%23klm&maxNoPages=1&lang=en]
(qvx);

// Write the three columns out to a CSV file ready for manual scoring.
STORE Search_text, Search_hashtag_count, Search_url_count FROM Tweets INTO training.csv (txt);

Manually Categorise The Training Data

Once the above script has run we will have a CSV file which we can open in Excel as follows:

Image: Excel 1.png

We then insert a new column named category - it is important that this is the first column - and manually classify all the Tweets.

Note that the first column represents the value (numeric or string) for that example: if the first column is numeric, the model is a regression model; if it is a string, it is a categorization model. More information about the format and structure of the training data can be found here.
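For illustration, a categorization training file built from our three input columns might contain rows like the following (these example rows are invented for illustration, not taken from the real data set):

"positive","Great crew on my #klm flight today",1,0
"negative","Two hours delayed and still no update #klm",1,0
"neutral","Does anyone know the #klm baggage allowance? http://t.co/xyz",1,1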

Image: Excel 2.png

NOTE:
  • We can create any classifications we like. In this example we simply choose 'positive', 'negative' and 'neutral' (sentiment). You could, however, create different classifications, for example 'complaint' and 'noncomplaint'.
  • In a real production scenario you would want to classify a much larger set than the 100 data points used here. You might also want to have a number of people classify them independently and then arrive at a final classification using some sort of voting system.
  • We have included the hashtag and url counts as inputs to the prediction engine (although we are not actually taking them into account when manually categorising the data) simply to illustrate that you can use a number of different text and numerical inputs to train the prediction model.


Upload The Training Data

Once you have finalised your manually categorised data set:
  • Remove the header row in Excel (this is important as the Google training input expects the first line to be data).
  • Save the file as a CSV file somewhere on your computer.

You should now go to the Google Cloud Storage account for the project you created to use the QlikView Google Prediction Connector (you should find your projects listed here).

Browse to the bucket you wish to use (or create one) and click the 'Upload' button:

Image: upload file.png

Once you have done this, you should see your training data file available. In our example the path (bucket_name/file_name) would be: tweet-sentiment/training_manually_scored.csv

Training The Model

Next we open up the Google Prediction Connector and on the CreateModel table ensure that all the required parameters are filled out.

Note that the storage data location is the path identified above. We can choose a suitable model id ourselves. The other inputs should be configured as per the docs.

Once we have filled out the input parameters, run the table:

Image: Create Model.png
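As with the other connector tables, the model creation step can also be scripted. The sketch below is hypothetical: the CreateModel parameter names (in particular storageDataLocation) are assumed by analogy with the Predict table URL shown later, so treat the load script generated by the connector itself as the authoritative version:

// Hypothetical sketch - trigger model training from the load script.
// The storageDataLocation parameter name is an assumption; copy the
// real URL from the script the connector generates for you.
CreateModel:
LOAD
	*
FROM
[http://localhost:5555/QVSource/GooglePredictionConnector/?table=CreateModel&appID=&projectNumber={PROJECT_NUMBER}&modelId=tweet_sentiment_v1&storageDataLocation=tweet-sentiment%2ftraining_manually_scored.csv]
(qvx);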

We can now keep checking the CheckModelStatus table until we see that the model is DONE:

Image: Check Status.png
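The status check can be scripted in the same way; again a hedged sketch, with the exact table and field names to be confirmed from the connector's generated script:

// Hypothetical sketch - poll the training status from the load script.
ModelStatus:
LOAD
	*
FROM
[http://localhost:5555/QVSource/GooglePredictionConnector/?table=CheckModelStatus&appID=&projectNumber={PROJECT_NUMBER}&modelId=tweet_sentiment_v1]
(qvx);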

At this point we have a prediction model tuned and trained to our input data. We can now use the model to categorise new, previously unseen data (in this case, Tweets).

Categorising New Data

Let's change our load script so that we are loading Tweets for a different well-known airline:



// Download Tweets for a second airline; this time we also keep the
// Tweet id, renamed to rowKey, which the Predict table requires as
// the first column of its input file.
Tweets:
LOAD
	id as rowKey,
	text as Search_text,
	hashtag_count as Search_hashtag_count,
	url_count as Search_url_count
FROM
[http://localhost:5555/QVSource/TwitterConnectorV2/?table=Search&appID=&searchTerm=%23lufthansa&maxNoPages=1&lang=en]
(qvx);

// Write the data out to the CSV file we will hand to the connector.
STORE rowKey, Search_text, Search_hashtag_count, Search_url_count FROM Tweets INTO new_data.csv (txt);

When this script has run we will have a new text file named new_data.csv containing the text, hashtag count and url count columns which the prediction engine requires to classify the Tweets.

Note that we have also introduced a new rowKey column, which the Connector requires as the first column and uses to link the results back to the source rows (see below).
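For example (again with invented rows), new_data.csv might look like this, with rowKey first followed by the three input columns:

"729123456789012345","Smooth check-in this morning #lufthansa",1,0
"729123456789054321","Delayed again and no announcements #lufthansa",1,0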

We can now bring up the QVSource Google Prediction Connector for QlikView and look at the inputs for the Predict table. In addition to the expected project number and model id, we can enter the path to this CSV file and run the table.

In the background, QVSource will run each of the prediction requests on parallel threads in order to get the results as quickly as possible:

Image: Predict Table.png

We can now grab the load script and paste it into our QlikView application.


// Load the predictions; rowKey links each result back to the
// Tweets table loaded above.
GooglePredictionConnector_Predict:
LOAD
	rowKey,
	status as Predict_status,
	label as Predict_label,
	score as Predict_score
FROM
[http://localhost:5555/QVSource/GooglePredictionConnector/?table=Predict&appID=&projectNumber={PROJECT_NUMBER}&modelId=tweet_sentiment_v1&csvFile=C%3a%5cfolder%5cnew_data.csv]
(qvx);

That's it - we can now reload our application to build the following data model in QlikView - one table containing the list of Tweets matching our search criteria, the other containing the categories predicted by our custom trained Google Prediction model. The two tables associate automatically on the shared rowKey field.

Image: Data Model.png
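If you would prefer a single flat table instead of two associated ones, the predictions could be merged into the Tweets table; a minimal sketch, assuming both loads above have already run:

// Merge each prediction into the Tweets table, matching on the
// shared rowKey field, then drop the now-redundant results table.
LEFT JOIN (Tweets)
LOAD
	rowKey,
	Predict_label,
	Predict_score
RESIDENT GooglePredictionConnector_Predict;

DROP TABLE GooglePredictionConnector_Predict;

Either way, the predicted label is available to chart expressions - for example, Count({<Predict_label={'positive'}>} rowKey) would count the Tweets classified as positive.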


(QVSource works with Qlik Sense as well as QlikView - See this page for notes.)