When looking for a job, or building a job aggregator, it’s sometimes hard to keep track of interesting offers. Job posts get updated daily and new ones flood in every hour or so. Manually revisiting the page becomes cumbersome and tiring. In an ideal world we would just sit back, let the offers flow into our mailbox, and then take action on the ones we’re interested in.
What if we could automate this process? What would we have to do to get job posts sent to our email automatically? Preferably every hour, as that’s how often jobs get posted.
With a new extension called Scraper.AI, it becomes as easy as clicking a few buttons. It does the rest for you.
In this case we want to extract job posts from indeed.com.
The actual scraping
First, head over to https://scraper.ai, create an account, and install their extension. It only takes about a minute.
Next, we head over to our job listing. We want all posts in the city of London that are related to software engineering, so we type “Software Engineer” in the “What” field and “London” in the “Where” field. Notice the URL changing to “https://www.indeed.com/jobs?q=Software%20engineer&l=London&vjk=cc297f2eaab0dec8”.
We don’t know what “vjk” is, but the page keeps working if we remove it. This suggests it might be unique to our query and used to “track” what we’re doing. Since removing it doesn’t break functionality, we drop it, as a cleaner URL is easier to work with.
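If you prefer to clean the URL in code rather than by hand, the step above can be sketched in Python using only the standard library. The helper name `strip_query_params` is made up for this illustration, and treating “vjk” as safe to drop is based purely on the observation above that the page keeps working without it.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def strip_query_params(url, drop=("vjk",)):
    """Return the URL without the query parameters named in `drop`."""
    parts = urlsplit(url)
    # Split the query string into (name, value) pairs and keep the wanted ones.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in drop
    ]
    # quote_via=quote encodes spaces as %20, matching the original URL.
    query = urlencode(kept, quote_via=quote)
    return urlunsplit(parts._replace(query=query))


url = "https://www.indeed.com/jobs?q=Software%20engineer&l=London&vjk=cc297f2eaab0dec8"
clean_url = strip_query_params(url)
print(clean_url)
# https://www.indeed.com/jobs?q=Software%20engineer&l=London
```

The cleaned URL is the one to open before starting the extension.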
Then open your extensions and launch Scraper.AI by clicking its logo.
Next, click “Select Element”. This is where the real magic starts, and it simplifies the process a lot.
We just need to highlight the fields we’re interested in:
- “Job Title”
- “Job Location”
- “Job Description”
- “Job post date”
Give each field a title as you select it.
In the end you’ll have a highlighted page, with your selections listed in the bottom right.
Since this page shows only 10 listings and there are many more pages, we want to scan the other pages as well. Luckily this is as easy as selecting the next page button. Click “Next”, then click the “Select” button next to “Next Page Button” to highlight the next button on the webpage.
Once that’s highlighted, we can complete the process by clicking “Finish”. This takes us back to the overview at https://scraper.ai, where we can see the extracted data.
From there on it’s entirely automated: Scraper.AI will visit the page at the interval we specified and extract and update the requested data accordingly.
Transforming and mailing
The data gets collected in a JSON endpoint that we can feed into, for example, an email processor that sends an email as soon as something new becomes available. The list can also be combined with data from other sites.
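To make the “email as soon as something new comes available” idea concrete, here is a minimal Python sketch. It is not Scraper.AI’s API: the field names (`job_title`, `job_location`, `job_description`, `job_post_date`) are assumed to mirror the titles we gave our selections, the sample payload is made up, and the email addresses are placeholders. Fetching the endpoint and actually sending the mail are left out so the example runs offline.

```python
import json
from email.message import EmailMessage


def post_key(post):
    # Assumption: title + location + description identifies a post well enough.
    return (post.get("job_title"), post.get("job_location"), post.get("job_description"))


def find_new_posts(posts, seen_keys):
    """Return posts not seen before and remember them in `seen_keys`."""
    new_posts = []
    for post in posts:
        key = post_key(post)
        if key not in seen_keys:
            seen_keys.add(key)
            new_posts.append(post)
    return new_posts


def build_email(new_posts, sender, recipient):
    """Build (but do not send) a plain-text summary email."""
    message = EmailMessage()
    message["Subject"] = f"{len(new_posts)} new job post(s)"
    message["From"] = sender
    message["To"] = recipient
    lines = [
        f"{p['job_title']} ({p['job_location']}), posted {p['job_post_date']}"
        for p in new_posts
    ]
    message.set_content("\n".join(lines))
    return message


# Made-up sample standing in for the JSON endpoint's response.
payload = json.loads("""
[
  {"job_title": "Software Engineer", "job_location": "London",
   "job_description": "Build backend services.", "job_post_date": "Today"},
  {"job_title": "Senior Software Engineer", "job_location": "London",
   "job_description": "Lead a small team.", "job_post_date": "1 day ago"}
]
""")

seen = set()
first_run = find_new_posts(payload, seen)   # everything is new on the first run
second_run = find_new_posts(payload, seen)  # nothing changed, so nothing is new
message = build_email(first_run, "me@example.com", "me@example.com")
print(message["Subject"])
```

In a real setup, `seen` would be stored between runs (for example in a file), and the message could be handed to `smtplib.SMTP(...).send_message(message)` with your own mail server details.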
A CSV download will be available soon, so we can even analyze posts using Microsoft Excel.
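Until that download exists, the JSON records could be turned into CSV locally. The sketch below assumes each record is a flat object using the same hypothetical field names as our selections (`job_title`, `job_location`, `job_description`, `job_post_date`); the sample rows are made up.

```python
import csv
import io

# Assumed column names, mirroring the titles given to the selections.
FIELDS = ["job_title", "job_location", "job_description", "job_post_date"]


def posts_to_csv(posts):
    """Serialize a list of post dicts to CSV text that Excel can open."""
    buffer = io.StringIO(newline="")
    # extrasaction="ignore" skips unexpected keys; missing keys become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(posts)
    return buffer.getvalue()


sample_posts = [
    {"job_title": "Software Engineer", "job_location": "London",
     "job_description": "Python, SQL and cloud", "job_post_date": "Today"},
    {"job_title": "Data Engineer", "job_location": "London",
     "job_description": "Pipelines, ETL", "job_post_date": "2 days ago"},
]

csv_text = posts_to_csv(sample_posts)
print(csv_text)
```

Writing `csv_text` to a file ending in `.csv` (opened with `newline=""`) gives something Excel can open directly.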
In the next part we’ll show how you can run different pipelines automatically using their webhooks.
Thanks for reading — let me know what you’d like to see next!