
Crawlomatic Multisite Scraper Post Generator Plugin for WordPress
What Can You Do With This Plugin?
Crawlomatic Multisite Scraper Post Generator Plugin for WordPress is a cutting-edge post generator and autoblogging plugin that uses website crawling and scraping to turn your website into an autoblogging or even a money-making machine!
Get content from almost any webpage! You no longer need APIs, which require registration and provide limited access, and you can also retrieve data from websites that do not offer an API. Schedule it once and let it autopilot your posts 24/7 for you, like a master!
How does it work?
This plugin will crawl the seed URL you give it (crawling means that it will search all links that the webpage contains) and will visit and extract content from each crawled URL. The crawling process is customizable: you can set the crawling depth, the crawling rate and the maximum crawled article count, crawl only links with a specific class or ID, and apply many more customizations.
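As a rough illustration of the crawling process described above, here is a minimal Python sketch of a breadth-first crawl with a depth limit, a page cap and an optional link class filter. It is not the plugin's own code (the plugin is a WordPress product): the names crawl, LinkCollector, fetch, max_depth, max_pages and required_class are assumptions made for this example, and crawl rate limiting is left out for brevity.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, optionally only those with a given class."""

    def __init__(self, required_class=None):
        super().__init__()
        self.required_class = required_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        classes = (attr_map.get("class") or "").split()
        if href and (self.required_class is None or self.required_class in classes):
            self.links.append(href)


def crawl(seed_url, fetch, max_depth=1, max_pages=10, required_class=None):
    """Breadth-first crawl from seed_url; returns visited URLs in visit order.

    fetch is a hypothetical callable that maps a URL to its HTML text;
    supply your own HTTP client here.
    """
    visited = []
    seen = {seed_url}
    queue = deque([(seed_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow links from this page
        collector = LinkCollector(required_class)
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)  # resolve relative links against the page URL
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return visited
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; in real use it could wrap any HTTP client and add a delay between requests to control the crawl rate.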
Crawlomatic v2.0 update
In the v2.0 update, a new live scraper shortcode was added to the plugin: [crawlomatic-scraper]. This new feature makes the plugin an easy to implement web data extractor for WordPress. As a result, it can be used to display real-time data from any website directly in your posts, pages or sidebar. It also temporarily caches the scraped content, so your website will not overuse resources. You can use this plugin to include real-time stock quotes, cricket or soccer scores or any other generic content from public domains!
New features included in this update:
- Scraped output can be displayed through a custom template tag or a shortcode in a page, post or sidebar (through a text widget).
- Configurable caching of scraped data. The cache timeout can be defined in minutes for each scrape.
- Configurable user agent for your scraper, which can be set for each scrape.
- Configurable default settings like enabling, user agent, timeout, caching and error handling.
- Multiple ways to query content – CSS Selector, XPath or Regex, Auto Detection.
- A wide range of arguments for parsing content.
- Option to pass POST arguments to a URL to be scraped.
- Dynamic conversion of scraped content to a specified character encoding, to scrape data from a site using a different charset.
- Create scraped pages on the fly using dynamic generation of URLs to scrape or POST arguments based on your page’s GET or POST arguments.
- Callback function for advanced parsing of scraped data.
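The caching behaviour listed above (a timeout in minutes per scrape) can be pictured with a small time-based cache. This is only a sketch in Python under stated assumptions: TimedCache, its get method and the fetch callable are hypothetical names, and the plugin's actual caching mechanism is not described here.

```python
import time


class TimedCache:
    """Keeps fetched values for timeout_minutes before fetching them again."""

    def __init__(self, timeout_minutes, clock=time.monotonic):
        self.timeout_seconds = timeout_minutes * 60
        self.clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch(key) if missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.timeout_seconds:
            return entry[1]
        value = fetch(key)
        self._entries[key] = (now, value)
        return value
```

With a 5 minute timeout, repeated page views within that window reuse the stored result, so the source site is requested at most once per window for each scraped URL.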
Check the official documentation of the v2 update, browse through the examples and check the FAQ for crafting a perfectly optimized web scraper.
More about the plugin
You can scrape content from almost every page that you open in your browser. If the content is loaded using JavaScript, the plugin can be combined with PhantomJS to also scrape JavaScript generated content.
Also, you can create an unlimited number of custom website crawling and scraping rules.
Other plugin features:
- v2.5.5 update: Automatically update scraped posts/pages/products if the source website changes + unpublish (set as draft) the post/page/product if the scraped URL is no longer available on the source website (optional features, can be enabled/disabled)
- v2.5.1 update: Scrape WooCommerce product variants from other WooCommerce/Shopify stores
- v2.5.0 update: Scrape search engine results for your custom keyword searches, from Google or from Bing. Check the tutorial video of this new feature.
- v2.4.1 update: Scrape product image galleries for WooCommerce products (for non-product post types, post attachments will be created from the scraped images)
- v2.3.5 update: Execute your own JavaScript code on the scraped HTML and scrape the results – this feature is available only when headless browsers (Puppeteer/Tor/PhantomJS) or HeadlessBrowserAPI are used for scraping
- v2.2.1 update: Crawl RSS feeds for links and scrape articles listed in them
- v2.2.0 update: Use HeadlessBrowserAPI to scrape JavaScript Generated HTML Content from any website on the internet without the need to install anything (besides this plugin) on your server – tutorial video
- v2.1.0 update: Scrape .onion websites from the Dark Web using the Tor Browser and Puppeteer! – tutorial video
- v2.0.0 update: Live Scraper shortcode added for even more crawling control and scraping power: [crawlomatic-scraper]
- v1.7.1 update: Sitemap crawling supported – video tutorial
- v1.6.5 update: Visual content selector support added – video tutorial
- v1.6.0 update: Added the ability to make screenshots of crawled pages and use them in the generated post’s content – video tutorial
- v1.5.2 update: Ability to shorten outgoing (post source) links (and monetize them), using the Shorte.st link shortener service – example of shortened link
- v1.4.8 update: Added JavaScript execution support for crawled pages – requires PhantomJS installed on server – How to install PhantomJs? – video tutorial
- v1.4.4 update: Added the ability to set multiple proxies for crawling pages. The plugin will select one at random at each page access
- v1.4.0 update: Added the ability to paginate crawling (crawling for articles will continue on the next page of the seed page).
- v1.4.0 update: Added the ability to import product prices for crawled products (WooCommerce compatible) + dropshipping price automatic modification – video tutorial
- v1.4.0 update: Added the ability to increase the imported product price by a fixed amount or to multiply it by a predefined amount (great value for dropshipping!)
- v1.2.8 update: Added paginated post importing support (into a single crawled post) Check: VIDEO.
- v1.2.4 update: Added the ability to set proxies for crawling pages
- v1.2.3 update: Added an option to crawl the page from Google cache when direct crawling fails (blocked)
- Google Translate support – select the language in which you want to post your articles
- Text Spinner support – automatically modify generated text, changing words with their synonyms – built-in, The Best Spinner, SpinRewriter, WordAI, TurkceSpin and others – great SEO value!
- customizable generated post status (published, draft, pending, private, trash)
- shortcode to list all posts generated by this plugin: [crawlomatic-list-posts type => ‘any’, order => ‘ASC’, ‘orderby’ => ‘date’, ‘posts’ => 50, ‘category’ => ’’, ‘ruleid’ => ’’]
- crawling and scraping can be set to respect the robots.txt files of websites and the robots HTML headers of scraped pages
- automatically generate post categories or tags from marketplace items
- manually add post categories or tags to items
- choose if you want to update a post if it is already posted
- send custom cookies with the request to the crawled webpage (authentication)
- generate post or page or any custom post type
- embeds videos from YouTube, Vimeo, Flickr, IGN, Ustream.tv and DailyMotion using website crawling and scraping
- define publishing constraints: do not publish posts that do not have images, or posts with a too short/long title/content
- automatically generate a featured image for the post
- enable/disable comments, pingbacks or trackbacks for the generated post
- customize post title and content (with the included wide variety of relevant post shortcodes)
- ‘Keyword Replacer Tool’ – Its purpose is to define keywords that are substituted automatically with your affiliate links, anywhere they appear in the content of your website. For example, you can define a keyword ‘codecanyon’ and have it substituted by a link to http://www.codecanyon.net/?ref=user_name anywhere it appears in your website’s content.
- ‘Random Sentence Generator Tool’ (relevant sentences – as you define them)
- option to automatically delete generated posts after a period of time
- detailed plugin activity logging
- scheduled rule runs
- custom field support for generated posts
- custom taxonomies support for generated posts
- unlimited crawled variable importing (unlimited imported parts of the crawled pages)
- option to copy images locally or not
- ability to parse JSON data using Regex
- option to add a canonical meta tag to generated posts
- Maximum/minimum title length post limitation
- Maximum/minimum content length post limitation
- Add post only if predefined required keywords are found in the title/content
- Add post only if predefined banned keywords are not found in the title/content
- Save and restore the plugin rule list from file
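The title/content length limits and the required/banned keyword rules in the list above amount to a simple publish filter. The Python sketch below shows one way such a check could work; should_publish and all of its parameters are hypothetical names, and treating the required keywords as "at least one must match" is an assumption, since the text does not say whether any or all of them must be present.

```python
def should_publish(title, content, *, min_title=0, max_title=None,
                   min_content=0, max_content=None,
                   required_keywords=(), banned_keywords=()):
    """Return True if a scraped post passes every configured constraint."""
    if len(title) < min_title:
        return False
    if max_title is not None and len(title) > max_title:
        return False
    if len(content) < min_content:
        return False
    if max_content is not None and len(content) > max_content:
        return False

    # Keyword checks are case-insensitive and look at both title and content.
    text = f"{title}\n{content}".lower()
    # Assumption: at least one required keyword must appear (not all of them).
    if required_keywords and not any(k.lower() in text for k in required_keywords):
        return False
    # Any banned keyword rejects the post.
    if any(k.lower() in text for k in banned_keywords):
        return False
    return True
```

For example, should_publish("Short", "some text", min_title=10) rejects the post because the title has fewer than 10 characters.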
Testing this plugin
- You can test the plugin’s functionality using the ‘Test Site Generator’. Here you can try the plugin’s full functionality. Note that the generated testing blog will be deleted automatically after 24 hours.
Plugin Requirements
- PHP DOM -> how to install it (in case you don’t have it, but you probably already have it): http://php.net/manual/en/dom.setup.php
- PHP 5.0 or higher
- dom, mbstring, iconv and json extensions (enabled by default)
For more info on how to configure the plugin, please also check this 1 hour long tutorial video, which covers the full feature set of the plugin.
Need help?
Please check our knowledge base, it may have the answer to your question or a solution for your issue. If not, just email me at support@coderevolution.ro and I’ll respond as soon as I can.
Changelog:
Version 1.0 Release Date 2017-08-15
First version released!
Version 1.1 Release Date 2017-08-16
Fixed some small issues
Version 1.2 Release Date 2017-08-17
Added the ability to crawl pages by div class or id
Version 1.2.1 Release Date 2017-08-18
Fixed incompatibility with some WordPress installs
Version 1.2.2 Release Date 2017-08-22
Added a shortcode to display posts generated by this plugin
Version 1.2.3 Release Date 2017-08-30
Added an option to crawl the page from Google cache when direct crawling fails (blocked)
Version 1.2.4 Release Date 2017-08-31
Added the ability to set proxies for crawling pages
Version 1.2.5 Release Date 2017-09-04
Added canonicalization for generated articles
Version 1.2.6 Release Date 2017-09-13
Made the plugin timezone aware
Version 1.2.7 Release Date 2017-09-14
Fixed post date for non-GMT blogs
Version 1.2.8 Release Date 2017-09-23
Added paginated post importing support
Version 1.2.9 Release Date 2017-09-27
Bugfixes
Version 1.3.0 Release Date 2017-09-28
Fixed rule restore
Version 1.3.1 Release Date 2017-10-20
Fixed featured image generation
Version 1.3.2 Release Date 2017-10-22
Added crawling helper
Version 1.3.3 Release Date 2017-11-06
Fixed a memory issue
Version 1.3.4 Release Date 2017-11-07
Bugfixes
Version 1.3.5 Release Date 2017-12-14
Fixed class selector not working in all cases
Version 1.3.6 Release Date 2017-12-18
Added the ability to specify a custom user agent for each crawled webpage
Version 1.3.7 Release Date 2018-01-20
Added a new text spinner service: SpinRewriter
Version 1.3.8 Release Date 2018-01-22
Plugin can now continuously import content
Version 1.3.9 Release Date 2018-02-02
Fixed issue when multiple crawl classes were specified
Version 1.4.0 Release Date 2018-02-22
Major update: added the ability to crawl imported product prices (WooCommerce compatible). Added the ability to crawl serial content (paged crawling - crawling for articles will continue on the next page).
Version 1.4.1 Release Date 2018-03-07
Bugfixes
Version 1.4.2 Release Date 2018-03-21
Fixed a duplicate posting issue
Version 1.4.3 Release Date 2018-03-22
Fixed a critical issue with multiple rule running
Version 1.4.4 Release Date 2018-04-04
Added the ability to define multiple proxies. The plugin will select one at random at each page access
Version 1.4.5 Release Date 2018-07-13
Updated built-in readability module
Version 1.4.6 Release Date 2018-07-16
Critical bugfixes
Version 1.4.7 Release Date 2018-07-19
Added the ability to not translate links
Version 1.4.8 Release Date 2018-09-05
Added JavaScript execution support for crawled pages - requires PhantomJS installed on server
Version 1.4.9 Release Date 2018-09-18
Bugfixes
Version 1.5.0 Release Date 2018-09-24
Added the ability to add custom post taxonomies from crawled content. Added the ability to add unlimited crawled variables to a post's content/meta/taxonomies.
Version 1.5.1 Release Date 2018-10-16
Fixed issue when importing large pages
Version 1.5.2 Release Date 2018-10-24
Added the ability to shorten links using Shorte.st
Version 1.5.3 Release Date 2018-10-29
Fixed issue when importing paginated posts
Version 1.5.4 Release Date 2018-11-06
Added the ability to strip HTML elements by tag name (div, a, span, etc.)
Version 1.5.5 Release Date 2018-11-07
Added WooCommerce product category creation support
Version 1.5.6 Release Date 2018-12-16
Added nested importing support - import mixed content into a single post, from multiple plugins created by CodeRevolution
Version 1.5.7 Release Date 2018-12-16
Added the ability to define a list of URLs to skip from crawling and importing
Version 1.5.8 Release Date 2019-01-08
Added the ability to import royalty free images for created posts
Version 1.5.9 Release Date 2019-01-12
Added Gutenberg blocks support
Version 1.6.0 Release Date 2019-02-01
Added the ability to make screenshots of scraped pages
Version 1.6.1 Release Date 2019-02-06
Improved compatibility with some crawled pages
Version 1.6.2 Release Date 2019-04-19
Security update
Version 1.6.3 Release Date 2019-05-15
Fixed some recently found bugs with post pagination
Version 1.6.4 Release Date 2019-05-17
Added support for TurkceSpin content spinner
Version 1.6.5 Release Date 2019-05-27
Added a much demanded new feature: Visual Content Selector for assigning scraped page content. Added the ability to scrape pages from bottom to top. Added the ability to replace words in scraped content. Other minor bug fixes and functionality improvements.
Version 1.6.6 Release Date 2019-07-26
Fixed timeout issue with some crawled pages. Many small issues fixed and features improved.
Version 1.6.7 Release Date 2019-08-05
Fixed issue with Google Translate
Version 1.6.8 Release Date 2019-11-15
WordPress 5.3 compatibility update
Version 1.6.9 Release Date 2020-05-11
New features added for content templates. Bugfix update.
Version 1.7.0 Release Date 2020-07-21
Added support for scraping more sites
Version 1.7.1 Release Date 2020-09-28
Added the ability to crawl sitemaps and to scrape posts linked in them. Added the ability to respect the directives set in the robots.txt files.
Version 2.0.0 Release Date 2020-12-08
Added a new shortcode and Gutenberg block alternative that will enable live scraping of any website. Major performance improvement. Fixed reported bugs.
Version 2.1.0 Release Date 2021-01-02
Added support for using the Tor Browser to crawl dark web pages! Scrape .onion websites like you would scrape any other public website!
Version 2.1.1 Release Date 2021-01-04
Added the ability to crawl and scrape pages using POST requests (POST form submission scraping support)
Version 2.2.0 Release Date 2021-01-14
Added support for HeadlessBrowserAPI to scrape JavaScript rendered content with ease
Version 2.2.1 Release Date 2021-01-16
PHP 8 compatibility update. Added support for crawling links from RSS feeds.
Version 2.2.2 Release Date 2021-01-28
Fixed rare issue when saving importing rule settings on some PHP 8 configurations
Version 2.2.3 Release Date 2021-02-01
Improved content extraction algorithm
Version 2.2.4 Release Date 2021-02-17
Added the ability to not spin posts generated by specific rules
Version 2.2.5 Release Date 2021-03-07
Added the ability to enter multiple URLs (one per line) to be crawled and scraped
Version 2.2.6 Release Date 2021-03-07
Visual Selector improvements - it can now use HeadlessBrowserAPI/Puppeteer/PhantomJS/Tor to visualize scraped content
Version 2.2.7 Release Date 2021-04-02
Fixed rare issues when crawling links with URL parameters
Version 2.2.8 Release Date 2021-04-07
Fixed rare issues with relative URL paths in crawled content
Version 2.2.9 Release Date 2021-05-03
Added the ability to skip publishing of new posts if no images are found (individually, for each rule)
Version 2.3.0 Release Date 2021-05-19
Added the ability to make screenshots of websites using the HeadlessBrowserAPI feature
Version 2.3.1 Release Date 2021-06-10
Fixed content extracting/stripping in case of some websites with dynamically generated content
Version 2.3.2 Release Date 2021-07-15
Added multiple Regex expression support (for content stripping and replacement)
Version 2.3.3 Release Date 2021-07-18
Added SpinnerChief to the supported premium text spinners (SpinRewriter, The Best Spinner, WordAI, TurkceSpin)
Version 2.3.4 Release Date 2021-07-19
Added Bing Translator support (next to Google Translator and DeepL Translator)
Version 2.3.5 Release Date 2021-08-06
Added the ability to execute your own custom JavaScript on scraped pages when using headless browsers (PhantomJS/Puppeteer/Tor) or HeadlessBrowserAPI (XSS - cross-site scripting feature) and scrape the resulting HTML content
Version 2.3.6 Release Date 2021-08-30
Added the ability to set featured images of posts from website screenshots. Added the ability to remove HTML content (leave text only) of XPath matched content.
Version 2.3.7 Release Date 2021-09-02
Added the ability to set local storage items when scraping websites (these are similar to cookies; their usage is supported only when using headless browsers or HeadlessBrowserAPI in conjunction with the plugin)
Version 2.3.8 Release Date 2021-09-15
Added the ability to set the WPML language for created posts
Version 2.3.9 Release Date 2021-10-19
WooCommerce product scraping related improvements
Version 2.4.0 Release Date 2022-02-28
Added support for creating WooCommerce product attributes and assigning values to them from scraped data
Version 2.4.1 Release Date 2022-03-05
Added the ability to scrape image galleries for WooCommerce products
Version 2.4.1.1 Release Date 2022-03-21
Bugfix update
Version 2.4.2 Release Date 2022-04-20
Fixed Google Translator problem caused by a recent Google API update
Version 2.5.0 Release Date 2022-05-01
Crawlomatic can now scrape search engine results from Google and Bing - tutorial video: https://www.youtube.com/watch?v=h6fQeH9-X8c
Version 2.5.1 Release Date 2022-05-06
Added the ability to scrape WooCommerce product variations from Shopify and other WooCommerce products. Added the ability to automatically detect product prices. Improved readability module. Fixes and improvements.
Version 2.5.2 Release Date 2022-06-14
Added the ability to translate posts a third time (acting like a Word Spinner, if the content is translated back to the original language)
Version 2.5.3 Release Date 2022-06-23
Fixed WooCommerce price scraping related issue
Version 2.5.4 Release Date 2022-09-12
Added the ability to scrape links from TXT files
Version 2.5.5 Release Date 2022-10-14
Major update: post/page/product automatic updating if the scraped source URL changed
Version 2.5.6 Release Date 2022-11-30
Major update: added support for Google News scraping
Version 2.5.7 Release Date 2023-01-05
Added a new ability to HeadlessBrowserAPI to click on HTML elements by CSS selectors, enabling loading of Ajax content and bypassing Captchas which require a click
Version 2.5.8 Release Date 2023-01-17
Added product regular price scraping feature to WooCommerce products - the regular price is the price displayed before the discount is applied. You can scrape this full price from the websites or add to/multiply the original price to create it automatically
Version 2.5.9 Release Date 2023-02-10
Fixed Google News scraping after recent changes
Version 2.6.0 Release Date 2023-03-13
Added more DeepL languages. Multiline scraping expressions support added. Fixed all reported issues.
Version 2.6.0.1 Release Date 2023-04-13
Fixed reported bugs
Version 2.6.0.2 Release Date 2023-05-10
Improved scraper auto detection
Version 2.6.0.3 Release Date 2023-05-22
Fixed more reported bugs
Version 2.6.0.4 Release Date 2023-06-13
Reworked backend, improved scraping speed
Version 2.6.0.5 Release Date 2023-06-29
Scraped content now better matches source website styling
Version 2.6.0.6 Release Date 2023-07-28
Fixed Google Translate integration, working with latest changes
Version 2.6.0.7 Release Date 2023-10-18
Fixed PHP 8.2 related errors
Version 2.6.1 Release Date 2024-02-15
Fixed an issue with rule saving
Version 2.6.2 Release Date 2024-03-15
Visual selector fix for CSS issue happening in some cases
Version 2.6.3 Release Date 2024-07-12
Bugfix release. Purchase code verification is now required for the plugin to function.
Version 2.6.4 Release Date 2024-10-26
Content filtering improvements
Version 2.6.5 Release Date 2024-10-31
Added support for automatic Magento product variation scraping
Are you already a customer?
If you already bought this and you have tried it out, please contact me in the item’s comment section and give me feedback, so I can make it a better WordPress plugin!
WordPress 6.7 and PHP 8.4 Tested!
Disclaimer
Through this plugin you are able to capture content from various websites that does not necessarily belong to you or which are not under your control. If you capture copyrighted material without the author’s permission, the plugin’s developer does not assume any responsibility for your actions. Also, the plugin’s developer has no control over the nature, content and availability of those sites.
Do you like our work and want more of it?
Check out this MEGA plugin bundle.