Back
Post
Posted

Does anyone scrape with AI?

hi all! does anyone have experience with scraping real estate data using ai? i'm currently using scraping bee which is nice but maybe there are better ways in 2025. i need to scrape things like image, description, body, title, link etc.


I do quite a bit of scraping and have had good luck using various AI platforms. I haven't done real estate directly, but I have pulled restaurant details to build a few database and was happy with it.

I use jina.ai a lot for various projects. I have ZERO association with them, but it does a good enough job of cleaning up a URL to pass to OpenAI or Claude with great results. I'm using the Reader API, but they have some others that might be good.

I just pass the results to an LLM and rarely look back.

I use Python 99% of the time and I have had a ton of success with OpenAI + Structured data. I'm a few weeks into using PydanticAI for nice JSON/structured data out and I love using it. (also no association) ai.pydantic.dev/results/

Depends what kind of bot protection and volume of scraping you're looking at.

If there's minimal bot protection and you're able to scrape locally off your PC, you can literally take the page html, put a schema into openAI structured response, and get your data.

If you can't get by from scraping locally or using residential proxies on a server due to high bot protection, then it's probably not worth building yourself.

Check out fire crawl and see how it compares to scraping bee for your needs.

not using it for real estate data, but i've been recently using firecrawl (no associaton) to AI scrape a few different platforms. it's nice because you pass it a schema and it will do it's best to put structured output into that schema. i've only used it for pages that are pretty highly structured tho, not sure how well it performs with more ambigious/less deterministic data or page structures

Home
Search
Messages
Notifications
More