Scraping and posting to WordPress

Vert

I want to create a script that extracts some content from a few websites, uploads it to WordPress as a post, and saves it as a draft.

What language and tools would you recommend?

From what I can tell, I need to extract the data from the websites, convert it to XML, upload it to WordPress via XML-RPC, and save it as a draft.
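For what it's worth, the "convert it to XML" step is usually handled for you: an XML-RPC client library serializes each call into XML itself. Below is a minimal sketch, assuming Python's standard-library `xmlrpc.client`, WordPress's `wp.newPost` XML-RPC method, and a site whose XML-RPC endpoint (typically `/xmlrpc.php`) has not been disabled by the host or a security plugin. The endpoint URL, credentials, and the helper names `build_draft`/`post_draft` are placeholders.

```python
import xmlrpc.client

def build_draft(title, body):
    """Content struct for WordPress's wp.newPost; 'draft' keeps the post unpublished."""
    return {
        "post_type": "post",
        "post_status": "draft",
        "post_title": title,
        "post_content": body,
    }

def post_draft(endpoint, username, password, title, body):
    """Create the draft via XML-RPC and return the new post's ID."""
    server = xmlrpc.client.ServerProxy(endpoint)
    # First argument is blog_id; it is not used on a single-site install, so 0 is fine.
    return server.wp.newPost(0, username, password, build_draft(title, body))

if __name__ == "__main__":
    # No network call here: just show the XML the library generates for the request.
    payload = build_draft("Scraped title", "<p>Scraped body</p>")
    print(xmlrpc.client.dumps((0, "user", "secret", payload), methodname="wp.newPost"))
    # Real use (placeholder URL and credentials):
    # post_draft("https://example.com/xmlrpc.php", "user", "secret", "Scraped title", "<p>...</p>")
```

Setting `post_status` to `draft` is what keeps the post out of public view until you edit it. If XML-RPC is turned off on the site, newer WordPress versions expose the REST API (`/wp-json/wp/v2/posts`), which accepts a `status` of `draft` in the same spirit.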

It seems I could do part of the job with a Python scraper (Scrapy?), but if you know a tool that could do it more easily, I would appreciate it if you could point me to it.
 
Scrapy is pretty decent, and easy to get started with for anyone who has already done a bit of Python. I wrote up a tutorial on getting started with it: Scrapy framework for Python

If you want to avoid coding, an easier and quicker option would be to write the appropriate XPaths (assuming your data sources make that easy), then plug those into the custom extraction fields in Screaming Frog SEO Spider. Last I checked, you can only define 10 fields, though, so I'm not sure that's sufficient for your needs.
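To illustrate what XPath-based extraction looks like, here is a sketch using the standard library's `xml.etree.ElementTree`. Note that it supports only a limited subset of XPath and needs well-formed markup; real-world HTML usually calls for a tolerant parser (for example lxml, Scrapy's selectors, or a tool like Screaming Frog). The page fragment, class names, and field names below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for a scraped page.
PAGE = """
<html><body>
  <h1 class="title">Example headline</h1>
  <div class="entry-content"><p>First paragraph.</p><p>Second paragraph.</p></div>
  <div class="sidebar"><p>Not wanted.</p></div>
</body></html>
"""

# Hypothetical field -> XPath mapping, similar in spirit to custom extraction fields.
FIELDS = {
    "title": ".//h1[@class='title']",
    "paragraphs": ".//div[@class='entry-content']/p",
}

def extract(page, fields):
    """Return, for each field, the text of every element its XPath matches."""
    root = ET.fromstring(page.strip())
    result = {}
    for name, path in fields.items():
        # itertext() gathers text from the element and any nested tags.
        result[name] = ["".join(el.itertext()).strip() for el in root.findall(path)]
    return result

print(extract(PAGE, FIELDS))
```

In a tool like Screaming Frog, the same idea would be one full XPath expression per field, something like `//h1[@class='title']`.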

Also, if you don't mind the expense, WP All Import works well and makes simple work of importing and generating posts, pages, or whatever you want to generate from a dataset.
 
I've used WP All Import with massive amounts of CSV files. It worked great; I made a lot of money and sold the sites. I still have one that's based on the work that plugin allowed me to do. You can get crafty before the upload by putting metadata in custom fields so you can sort on it later in your loops. It all depends on what you're doing with the data, but yeah, it worked well.
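If you go the WP All Import route, the scraper's job is mainly to produce a CSV the plugin can map onto post fields. A minimal sketch with Python's `csv` module; the column names (including `source_url` and `category`, intended as custom fields for sorting) and the sample rows are hypothetical, and the actual mapping of columns to post fields and custom fields is done in the plugin's import screen.

```python
import csv
import io

# Hypothetical scraped records. "source_url" and "category" are meant to be
# mapped to custom fields during import so loops can sort or filter on them.
RECORDS = [
    {"title": "First item", "content": "<p>Body one</p>",
     "source_url": "https://example.com/a", "category": "widgets"},
    {"title": "Second item", "content": "<p>Body, with a comma</p>",
     "source_url": "https://example.com/b", "category": "gadgets"},
]
COLUMNS = ["title", "content", "source_url", "category"]

def records_to_csv(records, columns):
    """Serialize records to CSV text; the csv module quotes commas and quotes as needed."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

if __name__ == "__main__":
    # Save this output as a .csv file to feed into the importer.
    print(records_to_csv(RECORDS, COLUMNS))
```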
 
Are you planning on modifying the content or just scraping and posting? You could try WP Pumper (a plugin I used to own). The guy who bought it from me was a programmer and totally revamped it. I know it's able to connect with WordAI and Spin Rewriter to spin the content before posting, and I think it still has an option to skip spinning and just post. It pulls from RSS feeds, so it may or may not do what you're trying to accomplish.
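Since that plugin works from RSS feeds, here is roughly what pulling items out of an RSS 2.0 feed involves. This is a standard-library sketch with a hypothetical inline feed, not a description of how WP Pumper works internally; a real script would download the feed first (for example with `urllib.request.urlopen`).

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; a real script would fetch this over HTTP.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Post one</title>
      <link>https://example.com/1</link>
      <description>Summary one</description>
    </item>
    <item>
      <title>Post two</title>
      <link>https://example.com/2</link>
      <description>Summary two</description>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Return title, link, and description for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "summary": item.findtext("description", default="").strip(),
        }
        for item in root.iter("item")
    ]

for entry in read_items(FEED):
    print(entry["title"], "->", entry["link"])
```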
 
Are you planning on modifying the content or just scraping and posting?

I want to scrape specific content from some pages (not the entire page). I don't need to edit the content; the idea is to scrape it, post it, and then add some content manually.

Another option is to extract the text I need and publish it manually along with the added text.
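Scraping only part of a page mostly comes down to targeting the one element that holds the content you want. Here is a standard-library sketch using `html.parser` in place of Scrapy's selectors; the `div` tag and `post-body` class are hypothetical and would come from inspecting the source page. It tracks nesting of the target tag name only, so unclosed inner tags such as `<p>` don't throw it off, but it is not a full HTML parser.

```python
from html.parser import HTMLParser

class SectionExtractor(HTMLParser):
    """Collect the text of the first <tag> element whose class list contains css_class."""

    def __init__(self, tag, css_class):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.tag = tag
        self.css_class = css_class
        self.depth = 0       # nesting level of self.tag inside the target; 0 = outside
        self.done = False    # stop after the first matching element closes
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != self.tag:
            return
        if self.depth:
            self.depth += 1  # a nested element with the same tag name
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entered the target element

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_section(html, tag, css_class):
    """Return the target element's text, one non-empty text run per line."""
    parser = SectionExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = ('<div class="nav">Menu</div>'
            '<div class="post-body"><p>Keep this.</p>'
            '<div class="inner"><p>And this &amp; that.</p></div></div>'
            '<div class="footer">Skip</div>')
    print(extract_section(page, "div", "post-body"))
```

The extracted text could then be saved as a WordPress draft, with the extra content added by hand in the editor before publishing.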
 