Problem
Many companies want to scrape: to gather data on competitors, improve their products, feed data into their product flows, broaden sales & marketing, etc.
There are many scraping tools out there. A small, incomplete selection:
- For coders: https://www.scrapingbee.com/, https://apify.com/, https://webscraper.io/
- For non-coders: https://oxylabs.io/, https://www.diffbot.com/web-scraping/, https://dataminer.io/
But none of them has the super simple workflow I think you’d want, based on my experience building scrapers at my last startups.
Idea
Super simple scraping workflow - it should be as easy as setting up a Webflow website
- See the website you want to scrape (maybe including an inspector to see HTML components)
- Write scraping code (later: create the scraper logic via no-code)
- Check results
- Schedule & deploy
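To make the "write scraping code" and "check results" steps concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the names `fetch`, `scrape` and `check`, the `price` class, and the sample HTML are hypothetical and not part of any existing tool.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags that never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while we are inside a matching element
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.buffer).strip())
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def fetch(url, timeout=10):
    """Step 1: download the page (hypothetical helper, not called below)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scrape(html, css_class):
    """Step 2: the actual scraping code."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def check(results, min_items=1):
    """Step 3: a cheap sanity check before scheduling the scraper."""
    problems = []
    if len(results) < min_items:
        problems.append(f"expected at least {min_items} items, got {len(results)}")
    if any(not item for item in results):
        problems.append("found empty items")
    return problems


SAMPLE_HTML = (
    '<ul><li class="price">10.00</li>'
    '<li class="price"><b>12</b> EUR</li>'
    '<li class="name">Widget</li></ul>'
)
```

With this, `scrape(SAMPLE_HTML, "price")` gives the two price strings and `check(...)` returns an empty list when nothing looks off. A real page would go through `fetch` first.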
Check out my Figma doodling for more:
Slai does a very good job of giving users a very simple ML-building flow. It partly reminds me of an ML-verticalized Replit.
The vision of such a tool: The fastest way to scrape.
- No-code scraping tool
- CLI + Python library if you want to set up larger volumes of scrapers
- Monitoring of runs, alerts & dashboard features
- We handle IP-block circumvention, reruns on failure, scripting, execution, saving & hosting of data, etc.
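As a rough illustration of the "reruns on failure" part, this is a minimal sketch of a retry wrapper with exponential backoff. The name `run_with_retries` and its parameters are hypothetical. It is one possible shape for such a helper, not the API of any existing library.

```python
import time


def run_with_retries(job, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call job() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, 4x, ... between attempts. The sleep
    function is injectable so the wrapper can be tested without waiting.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:  # assumption: any exception means "rerun"
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

A scraper run would then be wrapped as `run_with_retries(lambda: my_scraper())`, where `my_scraper` is whatever function does the fetching and parsing. In practice you would probably only retry on network or blocking errors rather than on every exception.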
Thoughts
- Main benefit for coders: The scraping code is usually <5% of the actual work. The parsing might take another 10%, but most of the work in getting a scalable scraper to run continuously is in the architecture (Lambda, Postgres, scheduling) plus runner scripts that don’t get blocked (Selenium et al.)
- Main benefit for biz users: Scrape data without the help of a coder. Get results quickly and easily, but still be able to hand scrapers over to coders for more complex scraping workflows.
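To give a feel for the architecture work mentioned in the first bullet, here is a minimal sketch of the bookkeeping around a scraper run: store each run's outcome so failures can be monitored. It uses an in-process SQLite database purely as a stand-in for Postgres. The `runs` table, `record_run` and `failure_counts` are hypothetical names chosen for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed schema: one row per scraper run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY,
    scraper TEXT NOT NULL,
    started_at TEXT NOT NULL,
    status TEXT NOT NULL,
    item_count INTEGER NOT NULL,
    error TEXT
)
"""


def record_run(conn, scraper, job):
    """Run job(), which should return a list of items, and log the outcome."""
    started = datetime.now(timezone.utc).isoformat()
    try:
        items = job()
        row = (scraper, started, "ok", len(items), None)
    except Exception as exc:
        row = (scraper, started, "failed", 0, repr(exc))
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO runs (scraper, started_at, status, item_count, error) "
            "VALUES (?, ?, ?, ?, ?)",
            row,
        )
    return row[2]


def failure_counts(conn):
    """Number of failed runs per scraper, e.g. as input for alerts."""
    rows = conn.execute(
        "SELECT scraper, COUNT(*) FROM runs "
        "WHERE status = 'failed' GROUP BY scraper ORDER BY scraper"
    )
    return dict(rows.fetchall())
```

Usage would look like `conn = sqlite3.connect(":memory:")`, `conn.execute(SCHEMA)`, then `record_run(conn, "competitor-prices", my_scraper)` from whatever scheduler triggers the run. None of this is scraping logic, which is the point: it is the part a hosted tool could take off your hands.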