
How to scrape specific data from a page with Simple HTML DOM Parser


I am trying to scrape data from a webpage, and I need to get all of the data on the page at this link.




include 'simple_html_dom.php';
$html1 = file_get_html('http://www.aktive-buergerschaft.de/buergerstiftungen/unsere_leistungen/buergerstiftungsfinder');

$info1 = $html1->find('b[class=???]', 0); // what should the selector be here?



I need to get all the data out of this site, for example:




Bürgerstiftung Lebensraum Aachen
rechtsfähige Stiftung des bürgerlichen Rechts
Ansprechpartner: Hubert Schramm
Alexanderstr. 69/ 71
52062 Aachen
Telefon: 0241 - 4500130
Telefax: 0241 - 4500131
Email: info@buergerstiftung-aachen.de
www.buergerstiftung-aachen.de
>> Weitere Details zu dieser Stiftung

Bürgerstiftung Achim
rechtsfähige Stiftung des bürgerlichen Rechts
Ansprechpartner: Helga Kühn
Rotkehlchenstr. 72
28832 Achim
Telefon: 04202-84981
Telefax: 04202-955210
Email: info@buergerstiftung-achim.de
www.buergerstiftung-achim.de
>> Weitere Details zu dieser Stiftung



I also need the data that is "behind" each link. Is there a way to do this with an easy, understandable parser, one that a newbie can understand and write?


Source: Tips4all

Comments

  1. Seems to be written in the documentation:

    $html1->find('b[class=info]',0)->innertext;

  2. The links you provided are down.
    I suggest using PHP's native DOM extension instead of Simple HTML DOM; it will be much faster and easier ;)
    I looked at the page in the Google cache, and you can use something like:

    $doc = new DOMDocument;
    @$doc->loadHTMLFile('...URL....'); // Using the @ operator to hide parse errors
    $contents = $doc->getElementById('content')->nodeValue; // Text contents of #content

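  3. From a quick glance, you need to loop through the <dl> tags in #content, then through the <dt> and <dd> tags inside each one. Using the simple_html_dom library:

    foreach ($html1->find('#content dl') as $item) {
        // Each <dd> inside this <dl> holds one piece of the entry's details
        foreach ($item->find('dd') as $info_item) {
            // process the item here, e.g. read $info_item->plaintext
        }
    }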

  4. XPath makes scraping ridiculously easy, and it lets your scraper survive some changes to the HTML document. For example, to pull out the names, you'd use a query that looks like:

    //div[@id='content']/dl/dt


    A simple Google search will give you plenty of tutorials.

  5. @zero: there is a good site for trying out scraping with both PHP and Python. I found it pretty helpful:
    http://scraperwiki.com/

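
To tie the DOM extension and XPath suggestions above together, here is a minimal sketch of how they might be combined. It is not tested against the real page: the sample markup, the `/details/...` URLs, and the function name `extractFoundations` are all assumptions made for illustration, based only on the comments saying the entries sit in <dl> blocks inside #content.

```php
<?php
// Hypothetical sample markup. The real page's structure is an assumption
// taken from the comments above (<dl> blocks with <dt>/<dd> inside #content).
$sampleHtml = <<<'HTML'
<html><body><div id="content">
<dl>
  <dt>Bürgerstiftung Lebensraum Aachen</dt>
  <dd>52062 Aachen</dd>
  <dd><a href="/details/aachen">Weitere Details zu dieser Stiftung</a></dd>
</dl>
<dl>
  <dt>Bürgerstiftung Achim</dt>
  <dd>28832 Achim</dd>
  <dd><a href="/details/achim">Weitere Details zu dieser Stiftung</a></dd>
</dl>
</div></body></html>
HTML;

// Illustrative helper (not part of any library): returns one array per <dl>,
// holding the entry's name and the URL of its "details" link.
function extractFoundations(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parse warnings quietly
    // instead of printing them (an alternative to the @ operator).
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix hints that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];

    foreach ($xpath->query("//div[@id='content']//dl") as $dl) {
        // Relative queries: the second argument is the context node.
        $nameNode = $xpath->query('dt', $dl)->item(0);
        $linkNode = $xpath->query('.//a/@href', $dl)->item(0);

        $results[] = [
            'name'      => $nameNode ? trim($nameNode->textContent) : '',
            'detailUrl' => $linkNode ? $linkNode->nodeValue : null,
        ];
    }

    return $results;
}

foreach (extractFoundations($sampleHtml) as $foundation) {
    echo $foundation['name'], ' -> ', $foundation['detailUrl'], "\n";
}
```

To get the data "behind" each link, one option would be to load every `detailUrl` with `loadHTMLFile()` and run a second set of XPath queries on that page. Which queries fit depends on the detail page's markup, which is not shown in this thread.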
