WebXtract: The web content extraction and analysis tool


WebXtract is a set of tools for selecting and extracting content from web pages and entire websites.

1. Web page content extractor

This application uses content profiles for different kinds of web page and returns the content in a structured JSON format.

For example, a content profile could be created for a YouTube page, a BBC news page, or a Twitter page. Once the profile is created, WebXtract can return the content from any web page that matches that profile.
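To make the idea of a content profile more concrete, here is a minimal sketch in Python using only the standard library. It is not WebXtract's actual profile format or API, neither of which is described here. The profile layout (output field mapped to a tag, attribute and value), the names NEWS_PROFILE, ProfileExtractor and extract, and the sample class names are all assumptions made for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical profile: output field -> (tag, attribute, required value).
# This layout is an assumption, not WebXtract's real profile format.
NEWS_PROFILE = {
    "headline": ("h1", "class", "title"),
    "author": ("span", "class", "byline"),
}


class ProfileExtractor(HTMLParser):
    """Collects the text of the first element matching each profile rule."""

    def __init__(self, profile):
        super().__init__()
        self.profile = profile
        self.result = {}
        self._field = None  # field currently being captured, if any
        self._tag = None    # tag name of the element being captured
        self._depth = 0     # nesting depth of same-named tags

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            # Already inside a matched element: only track nesting.
            if tag == self._tag:
                self._depth += 1
            return
        attr_map = dict(attrs)
        for field, (want_tag, attr, value) in self.profile.items():
            if field in self.result:
                continue  # keep only the first match per field
            if tag == want_tag and value in (attr_map.get(attr) or "").split():
                self._field, self._tag, self._depth = field, tag, 1
                self.result[field] = ""
                break

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.result[self._field] = self.result[self._field].strip()
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.result[self._field] += data


def extract(html, profile):
    """Apply a profile to an HTML string and return the content as JSON."""
    parser = ProfileExtractor(profile)
    parser.feed(html)
    parser.close()
    return json.dumps(parser.result, indent=2)
```

Calling extract(page_html, NEWS_PROFILE) on any page that follows the same markup conventions would return a JSON object with "headline" and "author" keys. The same profile can be reused across every page of that type, which is the point of separating the profile from the extractor.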

2. Web page analyser

This tool analyses any web page and applies a 'profile' to perform a range of useful tasks.

For example, the web page analyser can extract every link in a page, display every image contained in a page, perform SEO (search engine optimization) analysis, or provide a simple 'overview' of a page and its contents.
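As a rough illustration of the link and image tasks, the following Python sketch collects every link and image URL from a page using the standard library. It shows the general technique only and is not WebXtract's implementation; the names LinkImageCollector and analyse_page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkImageCollector(HTMLParser):
    """Gathers absolute URLs of all <a href> links and <img src> images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attr_map["href"]))
        elif tag == "img" and attr_map.get("src"):
            self.images.append(urljoin(self.base_url, attr_map["src"]))


def analyse_page(html, base_url):
    """Return the links and images found in an HTML string."""
    collector = LinkImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return {"links": collector.links, "images": collector.images}
```

An SEO or 'overview' profile could be built the same way by collecting other elements, such as the title, headings or meta tags, instead of links and images.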

3. Web site analyser

This application can scan through every page in a website and perform web page analysis on each page.

The site scanner uses exactly the same profiles as the Web page analyser.
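One way such a scan could work is a breadth-first crawl that stays on the starting host and applies the same analysis function to each page it reaches. The Python sketch below illustrates this under stated assumptions: scan_site, its fetch and analyse parameters, and the max_pages limit are hypothetical names, not part of WebXtract.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _HrefParser(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def scan_site(start_url, fetch, analyse, max_pages=100):
    """Breadth-first scan of same-host pages, applying `analyse` to each.

    `fetch(url)` must return the page HTML as a string, or None on failure.
    `analyse(html)` is the per-page analysis (the 'profile') to apply.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        results[url] = analyse(html)
        parser = _HrefParser()
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before comparing.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return results
```

Because the analysis function is passed in, the same function can be run on a single page or across a whole site, mirroring the way the site scanner reuses the page analyser's profiles. Passing fetch as a parameter also lets the scanner be tried against an in-memory dictionary of pages before pointing it at a live site.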

This application is useful for getting a complete overview of your entire website.

WebXtract is in its early stages, but aims to provide a set of powerful tools and APIs that can be used for a broad range of applications.


If you are interested in using WebXtract or have a question about how it can be used, please get in touch by email (tom@webxtract.com) or through the contact form on my personal website (TomCarnell.com).