Removing some elements with Simple HTML DOM

This is the page I am trying to parse with Simple HTML DOM. I have about 90% of the functionality done, but since I am new to the library, I am not quite sure how to do the last part.

I want to scrape the text of each news item, but since the text sits inside the <p> element, using something like ->innertext returns everything inside it, including the link.

Here is what I have tried:

<h1>Scraper Noticias</h1>

<?php

include('simple_html_dom.php');

class News {
    public $image;
    public $fechanoticia;
    public $title;
    public $description;
    public $sourceurl;

    function get_image( ) {
        return $this->image;
    }

    function set_image ($new_image) {
        $this->image = $new_image;
    }

    function get_fechanoticia( ) {
        return $this->fechanoticia;
    }

    function set_fechanoticia ($new_fechanoticia) {
        $this->fechanoticia = $new_fechanoticia;
    }

    function get_title( ) {
        return $this->title;
    }

    function set_title ($new_title) {
        $this->title = $new_title;
    }

    function get_description( ) {
        return $this->description;
    }

    function set_description ($new_description) {
        $this->description = $new_description;
    }

    function get_sourceurl( ) {
        return $this->sourceurl;
    }

    function set_sourceurl ($new_sourceurl) {
        $this->sourceurl = $new_sourceurl;
    }
}

//Create DOM from URL or file
$html = file_get_html('http://www.uvm.cl/noticias_mas.shtml');

$parsedNews = array();

//Find all news items.
foreach($html->find('#cont2 p') as $element) {

    $newItem = new News;

    //Parse the news item's thumbnail image.
    foreach ($element->find('img') as $image) {
        $newItem->set_image($image->src);
        //echo $newItem->get_image() . "<br />";
    }

    //Parse the news item's post date.
    foreach ($element->find('span.fechanoticia') as $fecha) {
        $newItem->set_fechanoticia($fecha->innertext);
        //echo $newItem->get_fechanoticia() . "<br />";
    }

    //Parse the news item's title.
    foreach ($element->find('a') as $title) {
        $newItem->set_title($title->innertext);
        //echo $newItem->get_title() . "<br />";
    }

    //Parse the news item's source URL link.
    foreach ($element->find('a') as $sourceurl) {
        $newItem->set_sourceurl("http://www.uvm.cl/" . $sourceurl->href);
    }

    //Parse the news item's description text.
    echo $element; //This is the entire <p> tag. How can I get just the text, not the link?

} 

?>

I just tested it and it returns 7 links. Do you want only the text, with the links stripped out?
Exactly. 🙂 That is precisely the problem I have. I want the text, without the link.
See below......

Source/Author: Only Bolivian Here | 2012-07-26

Here is a solution I found. If the code can be improved, though, I would appreciate suggestions.

<h1>Scraper Noticias</h1>

<?php

include('simple_html_dom.php');

class News {
    public $image;
    public $fechanoticia;
    public $title;
    public $description;
    public $sourceurl;

    function get_image( ) {
        return $this->image;
    }

    function set_image ($new_image) {
        $this->image = $new_image;
    }

    function get_fechanoticia( ) {
        return $this->fechanoticia;
    }

    function set_fechanoticia ($new_fechanoticia) {
        $this->fechanoticia = $new_fechanoticia;
    }

    function get_title( ) {
        return $this->title;
    }

    function set_title ($new_title) {
        $this->title = $new_title;
    }

    function get_description( ) {
        return $this->description;
    }

    function set_description ($new_description) {
        $this->description = $new_description;
    }

    function get_sourceurl( ) {
        return $this->sourceurl;
    }

    function set_sourceurl ($new_sourceurl) {
        $this->sourceurl = $new_sourceurl;
    }
}

//Create DOM from URL or file
$html = file_get_html('http://www.uvm.cl/noticias_mas.shtml');

$parsedNews = array();

//Find all news items.
foreach($html->find('#cont2 p') as $element) {

    $newItem = new News;

    //Parse the news item's thumbnail image.
    foreach ($element->find('img') as $image) {
        $newItem->set_image($image->src);
        //echo $newItem->get_image() . "<br />";
    }

    //Parse the news item's post date.
    foreach ($element->find('span.fechanoticia') as $fecha) {
        $newItem->set_fechanoticia($fecha->innertext);
        //echo $newItem->get_fechanoticia() . "<br />";
    }

    //Parse the news item's title.
    foreach ($element->find('a') as $title) {
        $newItem->set_title($title->innertext);
        //echo $newItem->get_title() . "<br />";
    }

    //Parse the news item's source URL link.
    foreach ($element->find('a') as $sourceurl) {
        $newItem->set_sourceurl("http://www.uvm.cl/" . $sourceurl->href);
    }

    //Parse the news item's description text: blank out every child
    //element (link, date span, image) so only the paragraph's own text remains.
    foreach ($element->find('a, span, img') as $child) {
        $child->outertext = '';
    }

    //Store the remaining text on the news item and print it.
    $newItem->set_description(trim($element->innertext));
    echo $newItem->get_description();

} 

?>
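
As a hedged alternative to blanking out child elements, the same goal (only the paragraph's own text, without the link, date, or image) can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. The sample markup, its values, and the helper name directParagraphText are assumptions made up for illustration; the real page's structure may differ.

```php
<?php
// Hypothetical sample markup standing in for one news item on the real page.
$sample = '<div id="cont2"><p><img src="img/a.jpg">'
        . '<span class="fechanoticia">26/07/2012</span>'
        . '<a href="noticia1.shtml">Titulo</a> Texto de la noticia.</p></div>';

/**
 * Returns the text that sits directly inside each matching <p>,
 * skipping child elements such as <a>, <span>, and <img>.
 * directParagraphText is an illustrative helper, not part of any library.
 */
function directParagraphText(string $html): array
{
    // Collect libxml warnings internally; real-world HTML is rarely strict.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('//*[@id="cont2"]//p') as $p) {
        $text = '';
        // Keep only the text nodes that are direct children of the <p>.
        foreach ($p->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $text .= $child->nodeValue;
            }
        }
        $texts[] = trim($text);
    }
    return $texts;
}

print_r(directParagraphText($sample));
```

This approach leaves the parsed tree untouched, so the image, date, and link can still be read from the same paragraph afterwards. Character-encoding handling is omitted to keep the sketch short.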

Source/Author: Only Bolivian Here

Use innertext instead of outertext:

    foreach ($element->find('a') as $sourceurl) {
        echo $sourceurl->innertext . "<br />";
    }
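
To make the inner/outer distinction concrete without depending on the library, here is a small sketch that uses PHP's built-in DOM extension as a stand-in. It is only an analogue of Simple HTML DOM's innertext and outertext properties, and the sample link is made up.

```php
<?php
// Hypothetical markup: a single link like the ones in each news item.
$doc = new DOMDocument();
$doc->loadHTML('<p><a href="noticia1.shtml">Titulo</a></p>');
$link = $doc->getElementsByTagName('a')->item(0);

// "Outer" view: the element serialized together with its own tag.
$outer = $doc->saveHTML($link);

// "Inner" view: only the serialized children of the element.
$inner = '';
foreach ($link->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

echo $outer, "\n";
echo $inner, "\n";
```

The outer form includes the <a> tag and its href, while the inner form is just the link's content.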

Source/Author: Paul Dessert
