Build an Analytics Web Dashboard with Jupyter Notebook, MongoDB, and HTML/CSS/JS Stack
This blog will explore the process of deploying a simple analytics pipeline: we will present data parsed and analyzed in a Jupyter Notebook on a web front end hosted on AWS Amplify. We are using an ad-impression and cost dataset for our model, but the pipeline architecture should work for any raw data.
The data pipeline is divided into three main components:
- Data Parsing and Analysis: In this phase, we will use Jupyter Notebook and Pandas Library to parse the raw data and create data models relevant to our goal.
- MongoDB: To store the modelling output, we will use MongoDB as the document store. The ease of storing and retrieving collections guides this decision. We will also use MongoDB Charts to visualize the collection data.
- HTML/CSS/JS: For the dashboard, we will use a plain HTML/CSS/JS stack to quickly display the incoming data from our remote MongoDB database.
This section will detail the general data parsing and analysis requirements for a data analytics project.
1. Data Preprocessing:
To begin any data modelling, we need to understand the raw data we are using to build our models. First, we will import the modules necessary for our analysis. For this project, we primarily use the pandas and pymongo libraries.
We have two CSV files that contain the raw data relevant to the final outcome. We will create dataframes to store the information from these CSV files using the read_csv function.
We can print the first rows of these dataframes to explore the data.
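A minimal sketch of this step, assuming hypothetical file names and columns (an impressions file and a CPM cost file) in place of the project's actual CSVs — here we also generate small sample files so the snippet runs on its own:

```python
import pandas as pd

# Sample raw data standing in for the two CSV exports. The real files,
# paths, and column names will differ; these are illustrative assumptions.
with open("impressions.csv", "w") as f:
    f.write("Date,Ad Category,Impressions\n"
            "2021-01-01,Sports,120000\n"
            "2021-01-02,News,95000\n")
with open("cpm_costs.csv", "w") as f:
    f.write("date,category,CPM\n"
            "2021-01-01,Sports,2.5\n"
            "2021-01-02,News,1.8\n")

# Load each CSV into its own dataframe.
impressions_df = pd.read_csv("impressions.csv")
cost_df = pd.read_csv("cpm_costs.csv")

# Peek at the first rows of each dataframe to explore the data.
print(impressions_df.head())
print(cost_df.head())
```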
Using the info function from the pandas library, we can find relevant information about the columns. As we can see, the date columns in both dataframes are of type object, so we need to convert them to a date data type. The column present in both dataframes that will be used to join them has a different name in each, so we also need to rename it before using the merge function.
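The conversion and rename could look like the sketch below. The dataframe contents and column names (Date, Ad Category, CPM) are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

# Illustrative stand-ins for the two raw dataframes.
impressions_df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02"],
    "Ad Category": ["Sports", "News"],
    "Impressions": [120000, 95000],
})
cost_df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02"],
    "category": ["Sports", "News"],
    "CPM": [2.5, 1.8],
})

impressions_df.info()  # the date column shows up as plain "object" dtype

# Convert the object-typed date columns to proper datetimes.
impressions_df["Date"] = pd.to_datetime(impressions_df["Date"])
cost_df["date"] = pd.to_datetime(cost_df["date"])

# The join columns are named differently in the two dataframes, so rename
# them to match before merging.
cost_df = cost_df.rename(columns={"date": "Date", "category": "Ad Category"})
merged_df = impressions_df.merge(cost_df, on=["Date", "Ad Category"])
```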
2. Data Aggregation:
From this dataset, we want to sum the number of ad impressions for each category, multiply the sum by the corresponding CPM value, and divide the result by 1000 to reach our final outcome. To do that, we will use the groupby and agg functions to group and aggregate the data.
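A sketch of the grouping step, again with made-up values, and assuming the CPM is fixed per category (your aggregation may differ if it varies):

```python
import pandas as pd

# Merged dataframe from the previous step (illustrative values).
merged_df = pd.DataFrame({
    "Ad Category": ["Sports", "News", "Sports"],
    "Impressions": [120000, 95000, 80000],
    "CPM": [2.5, 1.8, 2.5],
})

# Sum the impressions per category; with a constant CPM per category,
# "first" keeps one value per group.
grouped_df = merged_df.groupby("Ad Category").agg(
    Impressions=("Impressions", "sum"),
    CPM=("CPM", "first"),
)
print(grouped_df)
```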
At this point, the dataframe will look like this:
We have aggregated all the data and grouped it by category. Now we need to perform the final calculation to reach our outcome.
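Since CPM is the cost per 1000 impressions, the final calculation is a single vectorized expression. The numbers and column names below are the same illustrative assumptions as before:

```python
import pandas as pd

# Aggregated dataframe from the previous step (illustrative values).
grouped_df = pd.DataFrame(
    {"Impressions": [200000, 95000], "CPM": [2.5, 1.8]},
    index=pd.Index(["Sports", "News"], name="Ad Category"),
)

# CPM is the cost per 1000 impressions, so:
# total cost = impressions * CPM / 1000
grouped_df["Total Cost"] = grouped_df["Impressions"] * grouped_df["CPM"] / 1000
print(grouped_df)
```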
The final dataframe with our output will look like this:
3. Data Storage:
We will use the MongoClient function from the pymongo library to connect to the remote MongoDB instance. The connection string URI format is described in the MongoDB documentation. We will create a database called Ad_Revenue and store the final output in the Total_Cost collection.
Reset the index of the dataframe and store the output in a dictionary. Finally, we will store this dictionary in our database’s collection.
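The steps above can be sketched as follows. The dataframe values are illustrative, and the connection string is a placeholder — writing the records requires a running MongoDB instance, so the pymongo calls are shown as comments:

```python
import pandas as pd

# Final dataframe from the aggregation step (illustrative values).
final_df = pd.DataFrame(
    {"Total Cost": [500.0, 171.0]},
    index=pd.Index(["Sports", "News"], name="Ad Category"),
)

# Reset the index so the category becomes an ordinary column, then turn
# each row into a dictionary ready for insertion into MongoDB.
records = final_df.reset_index().to_dict("records")

# To write the records (requires pymongo and a reachable MongoDB instance;
# replace the placeholder URI with your own connection string):
# from pymongo import MongoClient
# client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
# client["Ad_Revenue"]["Total_Cost"].insert_many(records)
```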
With our data now safely stored in MongoDB, we can pipe it to our front-end display. MongoDB offers robust visualization tools that we can use with our collection data to display it in various types of charts.
The MongoDB Charts editing window is simple and easy to understand, making it straightforward to build relevant visualizations.
We can embed these visualizations in our HTML code using the MongoDB Charts Embedding SDK.
The interface was deployed on AWS Amplify and can be accessed via this link: https://main.d19829dc7ci9hs.amplifyapp.com/
The entire code for this project is hosted on GitHub and can be accessed via this link: https://github.com/hershdhillon/Analytics-Web-Frontend-Pipeline
I hope this article helped you understand how easy it is to deploy quick analytics and web-dashboard solutions that can be integrated into an existing architecture.
Have a great time and stay safe,