HTML to Text

A common question when working with internet data is “What can I do about all these tags?” Cleaning HTML can be a daunting task. A simple workaround is to let a text-based web browser render the page and hand you the plain text.

Text-based browsing

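Here is an example of how to use the Linux text browser “links”. The -dump option renders the page and prints it to standard output as plain text instead of opening the interactive browser: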

$ links -dump http://www.salsadev.com/
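The same command can be applied to a whole list of pages. The following is a minimal sketch, assuming links is installed and the URLs arrive one per line on standard input; the helper names dump_pages and url_to_filename are hypothetical and not part of links itself.

```shell
#!/bin/sh
# Sketch: convert every URL listed on stdin to a plain-text file.
# dump_pages and url_to_filename are hypothetical helper names.

# Turn a URL into a safe file name: drop the scheme, replace every
# character other than letters, digits, '.' and '-' with '_', add ".txt".
url_to_filename() {
    printf '%s\n' "$1" |
        sed -e 's#^[a-zA-Z]*://##' -e 's#[^A-Za-z0-9.-]#_#g' -e 's#$#.txt#'
}

# Usage: dump_pages OUT_DIR < urls.txt
dump_pages() {
    out_dir=$1
    mkdir -p "$out_dir" || return 1
    while IFS= read -r url; do
        [ -n "$url" ] || continue   # skip blank lines
        # </dev/null keeps links from consuming the URL list on stdin
        links -dump "$url" < /dev/null > "$out_dir/$(url_to_filename "$url")"
    done
}
```

For example, "dump_pages text < urls.txt" would write one .txt file per URL into the directory "text", ready for further processing with tools such as grep or wc.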

This command can be wrapped in a PHP file and used as a service:

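<?php
// Accept only http(s) URLs and shell-escape them before calling links.
$page = $_REQUEST['page'] ?? '';
if (is_string($page) && preg_match('#^https?://#i', $page)) {
    passthru('links -dump ' . escapeshellarg($page));
}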

links.php

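This script is available online for those who do not have access to a Linux shell. Point your browser to http://www.salsadev.com/tools/links.php and start working with text-based data. The script takes one URL parameter called ‘page’, which holds the address of the page to convert, for example http://www.salsadev.com/tools/links.php?page=http://www.salsadev.com/.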
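The service can also be called from a script rather than a browser. The sketch below assumes curl is installed and that the endpoint above is still reachable; fetch_text is a hypothetical helper name. It sends the target address as the ‘page’ query parameter and lets curl handle the URL encoding.

```shell
#!/bin/sh
# Sketch: fetch the text version of a page through the links.php service.
# SERVICE is the endpoint named in the article; fetch_text is a
# hypothetical helper name.
SERVICE="http://www.salsadev.com/tools/links.php"

fetch_text() {
    # --get appends the data to the query string of a GET request;
    # --data-urlencode escapes characters such as '?' and '&' in the
    # target URL so they are not mistaken for part of the service URL.
    curl --silent --show-error --fail \
         --get --data-urlencode "page=$1" "$SERVICE"
}
```

A call such as "fetch_text http://www.salsadev.com/ > page.txt" would then save the text rendering to a local file.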

