August 9, 2020 - analytics
Template for visualizing web scraped datasets (800 Venture Capital demo)
Web scraping is great, but it also tends to blow up rather easily. You need to set up proxies, get around captchas, and a lot more. Often you need databases, …
It gets complex rather fast, and by then you don’t even have your results yet. Here is a minimal example of how to analyze and plot a web-scraped dataset.
We scraped a dataset that looks like:

Investmentsize
To load the data, we use a **JSON** endpoint, since the data will be scraped and monitored every day. Let’s use https://api.scraper.ai/api/website/02209169-a5fb-4a90-b540-9b86a751de95?api_key=10c5a982abeefecc50a68f134ff470ec&json as an example; this dataset contains the data shown in the table above.
    # getting the scraped data
    url = 'https://api.scraper.ai/api/website/02209169-a5fb-4a90-b540-9b86a751de95?api_key=10c5a982abeefecc50a68f134ff470ec&json'
    r = requests.get(url).json()
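Since the dataset is re-scraped every day, it can be worth checking the shape of the response before using it. Here is a minimal sketch of such a check. It assumes the endpoint returns a JSON object with the rows in a list under the "data" key (which is what the rest of this post relies on); the helper name `rows_from_payload` and the sample values are made up for illustration.

```python
def rows_from_payload(payload):
    """Return the list of scraped rows, or raise if the shape is unexpected.

    Assumption: the payload is a dict whose "data" key holds a list of
    row dictionaries, as the later steps in this post expect.
    """
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    rows = payload.get("data")
    if not isinstance(rows, list):
        raise ValueError('expected a list under the "data" key')
    return rows


# Made-up payload for illustration only; real values come from the endpoint.
sample = {"data": [{"Investmentsize": "A"}, {"Investmentsize": "B"}]}
rows = rows_from_payload(sample)
print(len(rows), "rows")
```

With the snippet above, this would be called as `rows_from_payload(r)`, so a broken scrape fails early with a readable message instead of a `KeyError` further down.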
Now we have a dictionary with the data. To convert it to a dataset we use **pandas** and take the count of each investment size.
Notice that **map** is used to pull only the Investmentsize field out of each row, which gives a sequence of investment sizes.
    # preparing the data frame
    df = pd.DataFrame({"investment_size": map(lambda val: val["Investmentsize"], r["data"])})
    df = df.value_counts().rename_axis('investment_size').reset_index(name='count')
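To make the counting step concrete, here is the same idea on a handful of made-up rows, using only the standard library. This is a sketch, not the pandas code itself: `collections.Counter` is swapped in for `value_counts`, and the labels "small", "medium", and "large" are placeholders rather than values from the real dataset.

```python
from collections import Counter

# Made-up rows for illustration; the labels are placeholders, not real data.
rows = [
    {"Investmentsize": "small"},
    {"Investmentsize": "large"},
    {"Investmentsize": "small"},
    {"Investmentsize": "medium"},
    {"Investmentsize": "small"},
    {"Investmentsize": "large"},
]

# Same extraction as the map/lambda step: one label per scraped row.
sizes = [row["Investmentsize"] for row in rows]

# Count occurrences of each label, most frequent first.
counts = Counter(sizes).most_common()

for investment_size, count in counts:
    print(investment_size, count)
```

The pandas version does the same thing in one go and returns the result as a two-column frame (`investment_size`, `count`) that seaborn can plot directly.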
The data frame now looks like:

    # plotting
    sns.set(style="ticks", palette="pastel")
    sns_plot = sns.barplot(x="investment_size", y="count", data=df)
    sns.despine(offset=10, trim=True)
    sns_plot.get_figure().savefig("output.png")
This saves the chart as output.png, and you’ll see an image like:

    import requests
    import seaborn as sns
    import pandas as pd

    # getting the scraped data
    url = 'https://api.scraper.ai/api/website/02209169-a5fb-4a90-b540-9b86a751de95?api_key=10c5a982abeefecc50a68f134ff470ec&json'
    r = requests.get(url).json()

    # preparing the data frame
    df = pd.DataFrame({"investment_size": map(lambda val: val["Investmentsize"], r["data"])})
    df = df.value_counts().rename_axis('investment_size').reset_index(name='count')

    # plotting
    sns.set(style="ticks", palette="pastel")
    sns_plot = sns.barplot(x="investment_size", y="count", data=df)
    sns.despine(offset=10, trim=True)
    sns_plot.get_figure().savefig("output.png")
