Review Analytics - ThanoSQL

Introduction

Understanding customer sentiment through product reviews is crucial for businesses aiming to enhance their product offerings and customer satisfaction. This use case will guide you through the process of analyzing product reviews, classifying them into various emotional categories using natural language processing (NLP), and linking these reviews with sales data. By the end of this use case, you will be able to uncover insights about how different emotional categories of reviews correlate with product sales, helping you make data-driven decisions to improve your products and marketing strategies.

This tutorial can be executed both within ThanoSQL Lab and in a local Python/Jupyter environment. Whether you prefer to work directly within ThanoSQL Lab’s integrated environment or set up a local development environment on your machine, the instructions provided will guide you through the necessary steps.

If you want to try running the code from this use case, you can download the complete Jupyter notebook using this link: Download Jupyter Notebook. Alternatively, you can download it directly to your machine using the wget command below:

wget https://raw.githubusercontent.com/smartmind-team/mintlify-docs/main/examples/Review_Analytics.ipynb

To run the models in this tutorial, you will need the following tokens:

OpenAI Token: Required to access all the OpenAI-related tasks when using OpenAI as an engine. This token enables the use of OpenAI’s language models for various natural language processing tasks.
Huggingface Token: Required only to access gated models such as Mistral on the Huggingface platform. Gated models are those that have restricted access due to licensing or usage policies, and a token is necessary to authenticate and use these models. For more information, check this Huggingface documentation. Make sure to have these tokens ready before proceeding with the tutorial to ensure a smooth and uninterrupted workflow.

Dataset

We will be working with the following datasets:

Review Comments Table (review_comments): Contains textual reviews of products.
- ProductId: Unique identifier for each product.
- UserId: Unique identifier for each user.
- Text: Text of the product review.
Review Comments Sentiment Table (review_comments_sentiment): Contains product reviews along with their sentiment classification.
- ProductId: Unique identifier for each product.
- UserId: Unique identifier for each user.
- Text: Text of the product review.
- Sentiment: Sentiment classification of the review.
Sales Data Table (review_sales): Contains sales data for each product.
- ProductId: Unique identifier for each product.
- Score: Sales score for the product.

Goals

Classify product reviews into emotional categories.
Link classified reviews with corresponding sales data.
Generate actionable insights based on the analysis of linked data.

Displaying ThanoSQL Query Results in Jupyter Notebooks

The check_result function is designed to handle and display the results of a database query executed via the ThanoSQL client. It ensures that any errors are reported, and successful query results are displayed in a user-friendly format. Note: This function is specifically designed to work in Jupyter notebook environments.

from IPython.display import display

def check_query_result(query_result):
    if query_result.error_result:
        print(query_result.error_result)
    else:
        if query_result.records is not None and len(query_result.records.data) > 0:
            df = query_result.records.to_df()
            display(df)
        else:
            print("Query executed successfully")

Procedure

Import ThanoSQL Library:
- Import the ThanoSQL library and create a client instance. This client will be used to interact with the ThanoSQL engine.
```
from thanosql import ThanoSQL
client = ThanoSQL(api_token="your_api_token", engine_url="engine_url")
```
Upload Data to Tables:
- Upload the review_sales table which contains sales data for each product.
```
table = client.table.upload("review_sales", "sales_data.csv", if_exists='replace')
table.get_records(limit=10).to_df()
```
On execution, we get:
- This step uploads the review_sales data to ThanoSQL and retrieves the first 10 records to confirm the upload.
- Upload the review_comments table which contains textual reviews of products.
```
table = client.table.upload("review_comments", "comments_data.csv", if_exists='replace')
table.get_records(limit=10).to_df()
```
On execution, we get:
- This step uploads the review_comments data to ThanoSQL and retrieves the first 10 records to confirm the upload.
- Upload the review_comments_sentiment table which contains product reviews along with their sentiment classification.
```
table = client.table.upload("review_comments_sentiment", "comments_data_sentiment.csv", if_exists='replace')
table.get_records(limit=10).to_df()
```
On execution, we get:
- This step uploads the review_comments_sentiment data to ThanoSQL and retrieves the first 10 records to confirm the upload.

Classify Reviews and Aggregate Sales Data:

Predict sentiment for reviews using a pre-trained model and aggregate the data to link reviews with sales.

Predict Sentiment for Reviews:

query_result = client.query.execute("""
    -- This query predicts the sentiment of product review comments using a specified model
    -- and retrieves the sentiment along with the product ID and text of the review.
    SELECT
        "ProductId",                   
        "Text",                        
        (
            jsonb_array_elements_text(
                (
                    -- Step 1: Predict sentiment for the review comment
                    thanosql.predict(
                        input := "Text",                      
                        model := 'bhadresh-savani/distilbert-base-uncased-emotion', 
                        model_args := '{"top_k": 1, "device_map": "cpu"}'      
                    )::jsonb                                 
                    -- Cast the result to JSONB
                )
            )::jsonb ->> 'label'                            
            -- Extract the 'label' field from the JSONB array
        ) AS "Sentiment"                                     
        -- Alias the result as 'Sentiment'
    FROM
        review_comments                                      
    LIMIT 10;         
""")
check_query_result(query_result)

On execution, we get: Predicted Sentiment

Link Reviews with Sales Data:

Group reviews by ProductId and Sentiment to show the count of each sentiment and the corresponding sales data for each product.

query_result = client.query.execute("""
    -- This query calculates the total quantity sold for each product and the count of each sentiment from reviews.
    -- It uses a model to predict the sentiment of review comments and joins this information with sales data.
    SELECT 
        s."ProductId",                             
        SUM(s."Score") AS total_quantity_sold,     
        r."Sentiment",                             
        COUNT(r."Sentiment") AS sentiment_count    
    FROM 
        review_sales s                             
    JOIN
        (
            -- Subquery to predict sentiment for review comments
            SELECT
                "ProductId",                       
                "Text",                            
                (
                    jsonb_array_elements_text(
                        (
                            thanosql.predict(
                                input := "Text",    
                                model := 'bhadresh-savani/distilbert-base-uncased-emotion', 
                                model_args := '{"top_k": 1, "device_map": "cpu"}'
                            )::jsonb                
                            -- Cast the result to JSONB
                        )
                    )::jsonb ->> 'label'            
                    -- Extract the 'label' field from the JSONB array
                ) AS "Sentiment"                  
            FROM
                review_comments                   
            LIMIT 100                             
        ) r
    ON 
        s."ProductId" = r."ProductId"             
            -- Join condition: match product IDs in sales and review comments
    GROUP BY 
        s."ProductId", r."Sentiment"                
        -- Group the results by product ID and sentiment
    ORDER BY 
        s."ProductId", sentiment_count DESC;        
        -- Order the results by product ID and sentiment count in descending order
    """)
check_query_result(query_result)

On execution, we get: Linked Reviews and Sales Data

Analyze Positive Reviews and Sales Performance:

Show the relationship between sales and positive reviews (joy, love, surprise, neutral) for each product.

query_result = client.query.execute("""
    -- This query identifies products with positive reviews and compares their sales performance against the average sales.
    WITH positive_reviews AS (
        -- Step 1: Calculate the count of positive reviews for each product
        SELECT 
            "ProductId",                              
            COUNT(*) AS positive_count                
        FROM 
            (
                -- Subquery to predict sentiment for review comments
                SELECT
                    "ProductId",                      
                    "Text",                           
                    (
                        jsonb_array_elements_text(
                            (
                                thanosql.predict(
                                    input := "Text", 
                                    model := 'bhadresh-savani/distilbert-base-uncased-emotion',
                                    model_args := '{"top_k": 1, "device_map": "cpu"}' 
                                )::jsonb             
                                    -- Cast the result to JSONB
                            )
                        )::jsonb ->> 'label'          
                            -- Extract the 'label' field from the JSONB array
                    ) AS "Sentiment"                  
                FROM
                    review_comments                   
                LIMIT 100                             
            ) AS subquery
        WHERE 
            "Sentiment" IN ('joy', 'love', 'surprise', 'neutral') 
                -- Filter for positive sentiments
        GROUP BY 
            "ProductId"                               
                -- Group the results by product ID
    ),
    total_sales AS (
        -- Step 2: Calculate the total quantity sold for each product
        SELECT 
            "ProductId",                              
            SUM("Score") AS total_quantity_sold       
        FROM 
            review_sales                              
        GROUP BY 
            "ProductId"                               
    )
    SELECT 
        ts."ProductId",                               
        ts.total_quantity_sold,                       
        COALESCE(pr.positive_count, 0) AS positive_count, 
            -- Count of positive reviews, 0 if no positive reviews
        CASE 
            WHEN pr.positive_count IS NULL THEN 'No Positive Reviews' 
                -- No positive reviews case
            WHEN ts.total_quantity_sold > (
                SELECT AVG(total_quantity_sold) 
                FROM total_sales
            ) THEN 'Above Average Sales'              
                -- Above average sales case
            ELSE 'Below Average Sales'                
                -- Below average sales case
        END AS sales_performance                      
            -- Categorized sales performance
    FROM 
        total_sales ts                                
    LEFT JOIN 
        positive_reviews pr                           
            -- Left join with positive reviews data
    ON 
        ts."ProductId" = pr."ProductId"               
            -- Join condition: match product IDs
    ORDER BY 
        ts.total_quantity_sold DESC;                  
            -- Order the results by total quantity sold in descending order
""")
check_query_result(query_result)

On execution, we get: Positive Reviews and Sales Performance

Conclusion

This use case has guided you through the process of using ThanoSQL to analyze product reviews and their impact on sales performance. By classifying reviews into emotional categories and linking them with sales data, you can uncover valuable insights that help in understanding customer sentiment and its correlation with product success. This analysis can inform your marketing strategies, product development, and customer service approaches, ultimately contributing to improved customer satisfaction and business growth.

Use Cases

​Introduction

​Dataset

​Goals

​Displaying ThanoSQL Query Results in Jupyter Notebooks

​Procedure

​Conclusion

Introduction

Dataset

Goals

Displaying ThanoSQL Query Results in Jupyter Notebooks

Procedure

Conclusion