SAP HANA Sentiment Analysis

Post by Chandan Kalita, Sep 08, 2016.

In this exercise, we will find out how frequently the word “Fiori” is used and which words are related to it. We will also find the sentiments related to the category “Fiori”.
For this purpose, we are going to use the Twitter API to get the tweets on the category “Fiori”, save the tweets into an SAP HANA system using a JDBC connection and run sentiment analysis on top of the tweets. After the exercise, you will be familiar with:
- SAP HANA integration with Twitter
- SAP HANA Sentiment Analysis

Prerequisite:

Register an Application at Twitter Developers

As we are going to use the Twitter API to extract data from Twitter, we need to create an application at Twitter Developers. We will need the authentication information of that application to invoke the APIs later. If you have not used Twitter before, you first need to create a Twitter account. You can then register an application and create your OAuth tokens at Twitter Developers.

Click the button “Create New app”

Follow the form instructions to complete the registration. You need to enter the application name, a description and your website, and leave the callback URL blank.

Accept the Developer Agreement and click on “Create your Twitter Application”.
Now your application is created.
- Go to the “Keys and Access Tokens” tab and click the button “Create my access token” to generate the token.
- After that, you will see the OAuth settings. Save the values of the Consumer Key, Consumer Secret, Access Token and Access Token Secret. We need them later when calling the APIs.

Download Twitter API Java library – Twitter4J:

Twitter4J is an unofficial open source Java library for the Twitter API. With Twitter4J, you can easily integrate your Java application with the Twitter services.

Download the Twitter4J zip file from the Twitter4J website.

Extract the downloaded zip file and go to the subfolder lib. There you will find the file twitter4j-core-4.0.3.jar, the library we need in the Java project. It must be added as a library or to the class path of the Java runtime.
There are also some useful examples that you can check to get familiar with the Twitter APIs.

Prepare the HANA JDBC library:

To access SAP HANA from Java, we need the JDBC library. In a default installation you can find it at C:\Program Files\SAP\hdbclient\ngdbc.jar on Windows and /usr/sap/hdbclient/ngdbc.jar on Linux.

Note:

You have to download and include the Twitter4J jar. After the download, add twitter4j-core-4.0.3.jar to the project in the same way as the ngdbc.jar file.
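
Once both jars are on the class path, a quick check like the one below can confirm that they are actually visible to the Java runtime. This is only a minimal sketch and not part of the exercise project: the class name DriverCheckSketch and the helper isOnClassPath are made up for illustration. It looks up com.sap.db.jdbc.Driver (the driver class in ngdbc.jar) and twitter4j.Twitter (the core interface in the Twitter4J jar).

```java
class DriverCheckSketch {

    // Returns true if the named class can be loaded from the current class path.
    static boolean isOnClassPath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver class shipped in ngdbc.jar.
        System.out.println("HANA JDBC driver found: " + isOnClassPath("com.sap.db.jdbc.Driver"));
        // Core interface shipped in the twitter4j-core jar.
        System.out.println("Twitter4J found: " + isOnClassPath("twitter4j.Twitter"));
    }
}
```

If either line prints false, the corresponding jar is probably missing from the build path.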

Exercise:

Now we are ready to go. By the end of the exercise, you will understand the source code of the project and know how to connect to HANA from Java and how to use the Twitter services in Java. Most impressive of all is how simple it is to run sentiment analysis in HANA, which combines unstructured data from various sources such as Twitter and documents with the structured data in the RDBMS.

Import the Java Project in Eclipse:

To save time, we ask you to import the existing Java project instead of starting from scratch. Do not worry; we will explain all the components of the project in detail below.
Extract the project file Code.TextAnalysis.Standard.Rev90.zip to a local folder; the file should have been sent to you before the session. Open your HANA Studio and follow the steps below:
1. In the File menu, choose Import
2. Select the import source General > Existing Projects into Workspace and choose Next. You should have created the workspace in the XS exercise. Otherwise, you may need to have your workspace created first.
3. Select the root directory where your project files are located, select the project Code.TextAnalysis.Standard.Rev90 and click Finish to complete the import.
4. You will now see the project and its structure in your workspace.

Understand the Java Project:

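The major files in the project, which we will explain in detail later in the exercise, are Configurations.java, TwitterConnection.java, HDBConnection.java, SearchTweets.java, the data access object TweetDAO, and the SQL scripts CreateTable.sql and CreateFullTextIndex.sql.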

Ngdbc jar import:

To connect to the HANA system, we need the ngdbc.jar file.
1. Right-click on your project and choose Build Path > Configure Build Path.
2. Go to the Libraries tab and click Add External JARs.
3. Choose ngdbc.jar from the hdbclient folder.
4. Similarly, import twitter4j-core-4.0.2.jar from the path where you saved the jar file. Make sure you choose the jar in the twitter4j-4.0.2 > lib folder. (If you downloaded a different Twitter4J release, such as 4.0.3, the version in the file name differs accordingly.)
5. Click OK.

Note:

Also copy ngdbc.jar and twitter4j-core-4.0.2.jar to the lib folder.

Create a column table in HANA:

First, we need to create a table in HANA to store the tweets fetched from the Twitter services.
1. Open HANA Studio, copy the SQL statement from CreateTable.sql and execute it in the SQL Console. You need to replace schema_name with your own schema.
2. Expand the Catalog folder in HANA Studio. You should find the table TWEETS in your schema and can open its definition there.
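
As a rough illustration of step 1, the sketch below shows how the schema_name placeholder could be substituted programmatically before the statement is executed. The contents of CreateTable.sql are not reproduced in this post, so the column list (ID, CREATED_AT, USER_NAME, TEXT), the class name CreateTableSketch and the helper forSchema are all assumptions made for this example, not the actual definition of TWEETS.

```java
class CreateTableSketch {

    // Hypothetical stand-in for CreateTable.sql; the real column list may differ.
    static final String TEMPLATE =
        "CREATE COLUMN TABLE \"schema_name\".\"TWEETS\" ("
      + "\"ID\" BIGINT PRIMARY KEY, "
      + "\"CREATED_AT\" TIMESTAMP, "
      + "\"USER_NAME\" NVARCHAR(100), "
      + "\"TEXT\" NVARCHAR(500))";

    // Replaces the schema_name placeholder with the caller's schema.
    static String forSchema(String schema) {
        // Accept only simple identifiers so the name cannot break out of the quotes.
        if (!schema.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected schema name: " + schema);
        }
        return TEMPLATE.replace("schema_name", schema);
    }

    public static void main(String[] args) {
        System.out.println(forSchema("MYSCHEMA"));
    }
}
```

The resulting string can be pasted into the SQL Console or passed to a JDBC Statement.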

Update the configurations:

To make the configuration easy to maintain, all required settings are kept in a single interface. You must update it with your own account and settings before you can connect to either HANA or Twitter.
1. Open the file Configurations.java in your project. There are four categories of settings you can overwrite:
- Network Proxy Settings: the proxy host and port. Set HAS_PROXY to false if you do not need a proxy.
- HANA Connection Settings: replace the HANA URL with your own HANA host and port, and set the username, password and the schema where you created your table.
- Twitter Authentication Settings: replace these with the authentication information of your own Twitter application, as described in the prerequisites.
- Search Term: we search Twitter for the term “Fiori” to find out what people are saying about the product category “Fiori”. You can replace it with your own term if you are interested in other topics.
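
To give an idea of what such an interface might look like, here is a minimal sketch of the four categories. Only HAS_PROXY and the search term “Fiori” come from the text above. The interface name ConfigurationsSketch, every other constant name and all values (host names, port numbers, credentials) are placeholders chosen for illustration, and the real Configurations.java may be organized differently.

```java
interface ConfigurationsSketch {

    // Network proxy settings (set HAS_PROXY to false if no proxy is needed).
    boolean HAS_PROXY = false;
    String PROXY_HOST = "proxy.example.com";   // placeholder
    int PROXY_PORT = 8080;                     // placeholder

    // HANA connection settings (all values are placeholders).
    String HANA_URL = "jdbc:sap://hana.example.com:30015";
    String HANA_USER = "YOUR_USER";
    String HANA_PASSWORD = "YOUR_PASSWORD";
    String HANA_SCHEMA = "YOUR_SCHEMA";

    // Twitter authentication settings from the "Keys and Access Tokens" tab.
    String CONSUMER_KEY = "YOUR_CONSUMER_KEY";
    String CONSUMER_SECRET = "YOUR_CONSUMER_SECRET";
    String ACCESS_TOKEN = "YOUR_ACCESS_TOKEN";
    String ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET";

    // Search term used when querying Twitter.
    String SEARCH_TERM = "Fiori";
}
```

Fields declared in a Java interface are implicitly public, static and final, which is why such an interface can serve as a simple central place for settings.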

Test Connection to Twitter:

Once you have maintained the Twitter authentication settings correctly in the previous step, open TwitterConnection.java and execute it. You will see the message “Connection to Twitter Successfully!” followed by your Twitter user ID in the console.

Test Connection to SAP HANA:

Now open the file HDBConnection.java and execute it. You will see the message “Connection to HANA Successful!” in the console. Check Configurations.java if you encounter any issues.
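
For readers without the project at hand, a connection test of this kind can be sketched roughly as follows. This is not the source of HDBConnection.java: the class HanaConnectionSketch, its helpers and the host, port, user and password values are placeholders. The sketch assumes ngdbc.jar is on the class path and uses the jdbc:sap:// URL scheme of the HANA JDBC driver with the currentschema connection property.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class HanaConnectionSketch {

    // Builds a URL of the form jdbc:sap://<host>:<port>/?currentschema=<schema>.
    static String jdbcUrl(String host, int port, String schema) {
        return "jdbc:sap://" + host + ":" + port + "/?currentschema=" + schema;
    }

    // Opens a connection; this only works when ngdbc.jar is on the class path.
    static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        // Placeholder host, port, schema and credentials.
        String url = jdbcUrl("hana.example.com", 30015, "MYSCHEMA");
        try (Connection conn = connect(url, "YOUR_USER", "YOUR_PASSWORD")) {
            System.out.println("Connection to HANA Successful!");
        } catch (SQLException e) {
            System.out.println("Connection to HANA failed: " + e.getMessage());
        }
    }
}
```

The try-with-resources block closes the connection automatically, even if a later statement throws.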

The data access object TweetDAO is the single point of communication with HANA from Java. Take a look at its source code to see the SQL statements and how the JDBC library is used.
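
The following is a minimal sketch of what an insert method in such a data access object could look like; it is not the actual TweetDAO source. The class name TweetDaoSketch, the method names and the columns ID, CREATED_AT, USER_NAME and TEXT are assumptions, since the real table definition is not shown in this post.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

class TweetDaoSketch {

    // Builds a parameterized INSERT for the TWEETS table (hypothetical columns).
    static String insertSql(String schema) {
        return "INSERT INTO \"" + schema + "\".\"TWEETS\" "
             + "(\"ID\", \"CREATED_AT\", \"USER_NAME\", \"TEXT\") VALUES (?, ?, ?, ?)";
    }

    // Saves one tweet; the values are bound as parameters instead of being concatenated.
    static void save(Connection conn, String schema, long id, Timestamp createdAt,
                     String userName, String text) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(schema))) {
            ps.setLong(1, id);
            ps.setTimestamp(2, createdAt);
            ps.setString(3, userName);
            ps.setString(4, text);
            ps.executeUpdate();
        }
    }
}
```

Binding the tweet text as a parameter matters here because tweets routinely contain quotes and other characters that would break a concatenated SQL string.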

Invoke Twitter API and save the tweets into HANA:

Now it’s time to do the real work. Open the file SearchTweets.java and execute it. It searches for tweets matching the search term specified in Configurations.java and saves everything it finds to the HANA table TWEETS. Messages in the console indicate that the tweets have been inserted into HANA successfully.
After that, you can open the data preview in HANA Studio to see the contents of the table TWEETS in your schema.

Run text analysis in HANA:

Now that the tweets are stored in the HANA table, the next step is to run the text analysis to see what people are saying about the term “Fiori” on Twitter.
To run the text analysis, the only thing we need to do is create a full-text index on the column of the table we want to analyze. HANA then performs the linguistic analysis, entity extraction, stemming and sentiment analysis for us and saves the results in a generated table $TA_YOUR_INDEX_NAME in the same schema. After that, you can build views on top of that table and use the existing analysis tools around HANA for visualization or even predictive analysis.
1. Copy the SQL statement from CreateFullTextIndex.sql and execute it in the SQL Console.
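
The exact content of CreateFullTextIndex.sql is not reproduced here, but a statement of this kind can be sketched as below. The index name TWEETS_FTI is inferred from the generated table $TA_TWEETS_FTI, and the configuration is the voice-of-customer one named in this post. The column name TEXT, the class FullTextIndexSketch and the helper createIndexSql are assumptions for illustration, and the real script may set additional index options.

```java
class FullTextIndexSketch {

    // Builds a CREATE FULLTEXT INDEX statement with text analysis switched on.
    static String createIndexSql(String schema, String indexName, String column) {
        return "CREATE FULLTEXT INDEX \"" + indexName + "\" "
             + "ON \"" + schema + "\".\"TWEETS\" (\"" + column + "\") "
             + "CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' "
             + "TEXT ANALYSIS ON";
    }

    public static void main(String[] args) {
        // "TEXT" is a hypothetical name for the column holding the tweet text.
        System.out.println(createIndexSql("MYSCHEMA", "TWEETS_FTI", "TEXT"));
    }
}
```

With an index named TWEETS_FTI, the results land in the table $TA_TWEETS_FTI, following the $TA_YOUR_INDEX_NAME pattern described above.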

NOTE:

When we run text analysis in HANA, sentiment analysis is performed automatically, provided we use the configuration “EXTRACTION_CORE_VOICEOFCUSTOMER”.
Do you believe the text analysis has already been done by HANA? Yes, it has. Now you know how simple it is! You will find the generated tables $TA_TWEETS_FTI and $TA_TWEETS_HASHTAGS_FTI in your schema.
The data preview of the $TA_TWEETS_FTI table shows the tokens extracted from the tweets and the entity type of each token. The rows that carry a sentiment entity type are the output of the sentiment analysis.
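
To get a quick overall picture from the generated table, the entity types can be counted and grouped into coarse polarities. The sketch below is one possible way to do this, not part of the exercise project. It assumes the standard TA_TYPE column of the $TA table and the sentiment type names produced by the voice-of-customer configuration (for example StrongPositiveSentiment or WeakNegativeSentiment), so please verify them against your own data preview. The class SentimentTallySketch and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SentimentTallySketch {

    // Query that counts the sentiment rows per entity type in $TA_TWEETS_FTI.
    static String countByTypeSql(String schema) {
        return "SELECT \"TA_TYPE\", COUNT(*) FROM \"" + schema + "\".\"$TA_TWEETS_FTI\" "
             + "WHERE \"TA_TYPE\" LIKE '%Sentiment' "
             + "GROUP BY \"TA_TYPE\" ORDER BY COUNT(*) DESC";
    }

    // Maps an entity type to a coarse polarity (type names are assumptions).
    static String polarity(String taType) {
        if (taType.endsWith("PositiveSentiment")) return "positive";
        if (taType.endsWith("NegativeSentiment")) return "negative";
        if (taType.equals("NeutralSentiment")) return "neutral";
        return "other";
    }

    // Counts how many of the given entity types fall into each polarity.
    static Map<String, Integer> tally(List<String> taTypes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String type : taTypes) {
            counts.merge(polarity(type), 1, Integer::sum);
        }
        return counts;
    }
}
```

The query string can be run in the SQL Console as it is, while tally can be applied to TA_TYPE values read through JDBC.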

Switch to the Analysis tab and use SAP HANA Studio’s embedded visualization tools to render the text analysis data.

See the following links for further reference:
- http://www.saphana.com/community/hana-academy/#Text_Analysis
- http://help.sap.com/hana/SAP_HANA_Text_Analysis_Extraction_Customization_Guide_en.pdf
- http://help.sap.com/hana/SAP_HANA_Text_Analysis_Language_Reference_Guide_en.pdf



Chandan Kalita
Developer, Logic Scale

Chandan Kalita is an ambitious software developer for SAP applications from India. He specializes in SAPUI5 and custom-tailored SAP Fiori. In addition to his everyday job he loves any kind of technical challenge, be it hardware or software related, and devotes his free time to upcoming and trending IT topics.