Introduction
Understanding customer sentiment through product reviews is crucial for businesses aiming to enhance their product offerings and customer satisfaction. This use case will guide you through the process of analyzing product reviews, classifying them into various emotional categories using natural language processing (NLP), and linking these reviews with sales data. By the end of this use case, you will be able to uncover insights about how different emotional categories of reviews correlate with product sales, helping you make data-driven decisions to improve your products and marketing strategies.
This tutorial can be executed both within ThanoSQL Lab and in a local Python/Jupyter environment. Whether you prefer to work directly within ThanoSQL Lab’s integrated environment or set up a local development environment on your machine, the instructions provided will guide you through the necessary steps.
If you want to try running the code from this use case, you can download the complete Jupyter notebook using this link: Download Jupyter Notebook. Alternatively, you can download it directly to your machine using the wget command below:wget https://raw.githubusercontent.com/smartmind-team/mintlify-docs/main/examples/Review_Analytics.ipynb
To run the models in this tutorial, you will need the following tokens:
- OpenAI Token: Required to access all the OpenAI-related tasks when using OpenAI as an engine. This token enables the use of OpenAI’s language models for various natural language processing tasks.
- Huggingface Token: Required only to access gated models such as Mistral on the Huggingface platform. Gated models are those that have restricted access due to licensing or usage policies, and a token is necessary to authenticate and use these models. For more information, check this Huggingface documentation.
Make sure to have these tokens ready before proceeding with the tutorial to ensure a smooth and uninterrupted workflow.
Dataset
We will be working with the following datasets:
- Review Comments Table (review_comments): Contains textual reviews of products.
ProductId: Unique identifier for each product.
UserId: Unique identifier for each user.
Text: Text of the product review.
- Review Comments Sentiment Table (review_comments_sentiment): Contains product reviews along with their sentiment classification.
ProductId: Unique identifier for each product.
UserId: Unique identifier for each user.
Text: Text of the product review.
Sentiment: Sentiment classification of the review.
- Sales Data Table (review_sales): Contains sales data for each product.
ProductId: Unique identifier for each product.
Score: Sales score for the product.
Goals
- Classify product reviews into emotional categories.
- Link classified reviews with corresponding sales data.
- Generate actionable insights based on the analysis of linked data.
Displaying ThanoSQL Query Results in Jupyter Notebooks
The check_result function is designed to handle and display the results of a database query executed via the ThanoSQL client. It ensures that any errors are reported, and successful query results are displayed in a user-friendly format.
Note: This function is specifically designed to work in Jupyter notebook environments.
from IPython.display import display
def check_query_result(query_result):
if query_result.error_result:
print(query_result.error_result)
else:
if query_result.records is not None and len(query_result.records.data) > 0:
df = query_result.records.to_df()
display(df)
else:
print("Query executed successfully")
Procedure
-
Import ThanoSQL Library:
- Import the ThanoSQL library and create a client instance. This client will be used to interact with the ThanoSQL engine.
from thanosql import ThanoSQL
client = ThanoSQL(api_token="your_api_token", engine_url="engine_url")
-
Upload Data to Tables:
- Upload the
review_sales table which contains sales data for each product.
table = client.table.upload("review_sales", "sales_data.csv", if_exists='replace')
table.get_records(limit=10).to_df()
On execution, we get:
-
This step uploads the
review_sales data to ThanoSQL and retrieves the first 10 records to confirm the upload.
-
Upload the
review_comments table which contains textual reviews of products.
table = client.table.upload("review_comments", "comments_data.csv", if_exists='replace')
table.get_records(limit=10).to_df()
On execution, we get:
-
This step uploads the
review_comments data to ThanoSQL and retrieves the first 10 records to confirm the upload.
-
Upload the
review_comments_sentiment table which contains product reviews along with their sentiment classification.
table = client.table.upload("review_comments_sentiment", "comments_data_sentiment.csv", if_exists='replace')
table.get_records(limit=10).to_df()
On execution, we get:
- This step uploads the
review_comments_sentiment data to ThanoSQL and retrieves the first 10 records to confirm the upload.
-
Classify Reviews and Aggregate Sales Data:
-
Predict sentiment for reviews using a pre-trained model and aggregate the data to link reviews with sales.
-
Predict Sentiment for Reviews:
query_result = client.query.execute("""
-- This query predicts the sentiment of product review comments using a specified model
-- and retrieves the sentiment along with the product ID and text of the review.
SELECT
"ProductId",
"Text",
(
jsonb_array_elements_text(
(
-- Step 1: Predict sentiment for the review comment
thanosql.predict(
input := "Text",
model := 'bhadresh-savani/distilbert-base-uncased-emotion',
model_args := '{"top_k": 1, "device_map": "cpu"}'
)::jsonb
-- Cast the result to JSONB
)
)::jsonb ->> 'label'
-- Extract the 'label' field from the JSONB array
) AS "Sentiment"
-- Alias the result as 'Sentiment'
FROM
review_comments
LIMIT 10;
""")
check_query_result(query_result)
On execution, we get:
- Link Reviews with Sales Data:
- Group reviews by
ProductId and Sentiment to show the count of each sentiment and the corresponding sales data for each product.
query_result = client.query.execute("""
-- This query calculates the total quantity sold for each product and the count of each sentiment from reviews.
-- It uses a model to predict the sentiment of review comments and joins this information with sales data.
SELECT
s."ProductId",
SUM(s."Score") AS total_quantity_sold,
r."Sentiment",
COUNT(r."Sentiment") AS sentiment_count
FROM
review_sales s
JOIN
(
-- Subquery to predict sentiment for review comments
SELECT
"ProductId",
"Text",
(
jsonb_array_elements_text(
(
thanosql.predict(
input := "Text",
model := 'bhadresh-savani/distilbert-base-uncased-emotion',
model_args := '{"top_k": 1, "device_map": "cpu"}'
)::jsonb
-- Cast the result to JSONB
)
)::jsonb ->> 'label'
-- Extract the 'label' field from the JSONB array
) AS "Sentiment"
FROM
review_comments
LIMIT 100
) r
ON
s."ProductId" = r."ProductId"
-- Join condition: match product IDs in sales and review comments
GROUP BY
s."ProductId", r."Sentiment"
-- Group the results by product ID and sentiment
ORDER BY
s."ProductId", sentiment_count DESC;
-- Order the results by product ID and sentiment count in descending order
""")
check_query_result(query_result)
On execution, we get:
- Analyze Positive Reviews and Sales Performance:
- Show the relationship between sales and positive reviews (
joy, love, surprise, neutral) for each product.
query_result = client.query.execute("""
-- This query identifies products with positive reviews and compares their sales performance against the average sales.
WITH positive_reviews AS (
-- Step 1: Calculate the count of positive reviews for each product
SELECT
"ProductId",
COUNT(*) AS positive_count
FROM
(
-- Subquery to predict sentiment for review comments
SELECT
"ProductId",
"Text",
(
jsonb_array_elements_text(
(
thanosql.predict(
input := "Text",
model := 'bhadresh-savani/distilbert-base-uncased-emotion',
model_args := '{"top_k": 1, "device_map": "cpu"}'
)::jsonb
-- Cast the result to JSONB
)
)::jsonb ->> 'label'
-- Extract the 'label' field from the JSONB array
) AS "Sentiment"
FROM
review_comments
LIMIT 100
) AS subquery
WHERE
"Sentiment" IN ('joy', 'love', 'surprise', 'neutral')
-- Filter for positive sentiments
GROUP BY
"ProductId"
-- Group the results by product ID
),
total_sales AS (
-- Step 2: Calculate the total quantity sold for each product
SELECT
"ProductId",
SUM("Score") AS total_quantity_sold
FROM
review_sales
GROUP BY
"ProductId"
)
SELECT
ts."ProductId",
ts.total_quantity_sold,
COALESCE(pr.positive_count, 0) AS positive_count,
-- Count of positive reviews, 0 if no positive reviews
CASE
WHEN pr.positive_count IS NULL THEN 'No Positive Reviews'
-- No positive reviews case
WHEN ts.total_quantity_sold > (
SELECT AVG(total_quantity_sold)
FROM total_sales
) THEN 'Above Average Sales'
-- Above average sales case
ELSE 'Below Average Sales'
-- Below average sales case
END AS sales_performance
-- Categorized sales performance
FROM
total_sales ts
LEFT JOIN
positive_reviews pr
-- Left join with positive reviews data
ON
ts."ProductId" = pr."ProductId"
-- Join condition: match product IDs
ORDER BY
ts.total_quantity_sold DESC;
-- Order the results by total quantity sold in descending order
""")
check_query_result(query_result)
On execution, we get:
Conclusion
This use case has guided you through the process of using ThanoSQL to analyze product reviews and their impact on sales performance. By classifying reviews into emotional categories and linking them with sales data, you can uncover valuable insights that help in understanding customer sentiment and its correlation with product success. This analysis can inform your marketing strategies, product development, and customer service approaches, ultimately contributing to improved customer satisfaction and business growth.