I want to parse span innertext using simple_html_dom in PHP

<span class="contact-seller-name">Enda</span>
I want to echo the text 'Enda' from inside this span tag using PHP.
Here's my PHP code:
$url="http://website.example.com";
$html = file_get_html( $url );
$value = $html->find('span.contact-seller-name');
echo $value->innertext;

From their documentation, it looks like find returns an array of the elements that match the selector.
From http://simplehtmldom.sourceforge.net/:
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
They also provide another example for getting a specific element:
$html->find('div[id=hello]', 0)->innertext = 'foo';
So my guess would be that something like this will get you what you want:
$value = $html->find('span.contact-seller-name', 0);
echo $value->innertext;
Passing 0 as the second parameter makes find return the first element that matches the selector instead of an array.
Take a look at their API here:
http://simplehtmldom.sourceforge.net/manual_api.htm
It describes what the find method returns (an array of element objects, or a single element object if the second parameter is defined).
Then using any of the provided methods for the element object you can get the desired text.
Full working example tested on a live site:
$url = "http://fleeceandthankyou.org/";
$html = file_get_html($url);
$value = $html->find('span.givecamp-header-wide', 0);
// find() with an index returns null when nothing matches, and reading a
// property of null does not throw an Exception, so check explicitly
if ($value !== null)
{
    echo $value->innertext;
}
else
{
    echo "Couldn't find the element";
}
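For comparison, here is a minimal sketch of the same "first match or nothing" lookup using PHP's bundled DOM extension instead of simple_html_dom. This is a swapped-in technique, not the library's API: firstTextByClass is a hypothetical helper name, and the inline HTML string stands in for a fetched page.

```php
<?php
// Sketch only: uses DOMDocument/DOMXPath, not simple_html_dom.
// firstTextByClass() is a hypothetical helper, not a library function.
function firstTextByClass(string $html, string $tag, string $class): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token inside the class attribute.
    $query = sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $tag,
        $class
    );
    $nodes = $xpath->query($query);

    // item(0) returns null when nothing matched.
    $first = $nodes->item(0);
    return $first === null ? null : $first->textContent;
}

$html = '<div><span class="contact-seller-name">Enda</span></div>';
$name = firstTextByClass($html, 'span', 'contact-seller-name');
echo $name === null ? 'not found' : $name;
```

The helper returns null rather than an empty array when nothing matches, so the caller has a single value to check before printing.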

Related

Question about using simple html dom parser to store HTML tags as objects

I am building a web scraper using the Simple HTML DOM parser, but I ran into issues figuring out how to store the HTML elements on a web page as objects. I would like to take an input URL and turn all of the page's HTML elements (tags, divs, fields, etc.) into objects that get output onto a page. The code I have written works when I type in a URL, but the output is not what I am trying to achieve. My current code is attached below, and I am looking for a way to achieve this.
I have tried finding all images and links as well as creating a DOM object. I can't figure out how to convert these elements into objects that I can use to learn more about a website, and possibly store in a database.
<?php
require('simple_html_dom.php');
// Create DOM from URL or file
$url = $_POST["url"];
$html = file_get_html($url);
echo $html;
// Find all images
$element = new simple_html_dom();
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
$element = new simple_html_dom();
foreach($html->find('a') as $element)
echo $element->href . '<br>';
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a URL
$html->load_file($url);
echo $html;
?>
I am expecting an output of objects, but I am instead getting an actual output of images and links on a web page.
<?php
require('simple_html_dom.php');
// Create DOM from URL or file
// $url = $_POST["url"];
$url = 'Your-Url'; // Your URL, including the scheme, e.g. 'http://www.example.com'
$html = file_get_html($url);
// Find all images
$images = []; //create empty images array
foreach($html->find('img') as $element){
$images[] = $element->src . '<br>'; //Store the found elements in the images array
}
echo '<pre>Output $images: '; var_dump($images); echo '</pre>'; //An output from the images array
// Find all links
$links = []; //create empty links array
foreach($html->find('a') as $element){
$links[] = $element->href . '<br>'; //Store the found elements in the links array
}
echo '<pre>Output $links: '; var_dump($links); echo '</pre>'; //An output from the links array
The echo statements display the arrays filled with the src and href values of the img and a tags from your page.
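If the goal is one record per element rather than a list of strings, a rough sketch could collect each element's tag name, attributes, and text into an associative array that can later be stored or serialized. This sketch uses PHP's bundled DOM extension rather than simple_html_dom; describeElements is a hypothetical helper name and the inline HTML is made up for illustration.

```php
<?php
// Sketch only: uses DOMDocument, not simple_html_dom.
// describeElements() is a hypothetical helper, not part of any library.
function describeElements(string $html, string $tag): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $records = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        // Copy every attribute into a name => value map.
        $attributes = [];
        foreach ($node->attributes as $attr) {
            $attributes[$attr->name] = $attr->value;
        }
        $records[] = [
            'tag'        => $node->nodeName,
            'attributes' => $attributes,
            'text'       => trim($node->textContent),
        ];
    }
    return $records;
}

$html = '<a href="/a" rel="nofollow">First</a><img src="logo.png" alt="Logo">';
$links = describeElements($html, 'a');
$images = describeElements($html, 'img');

// JSON is one convenient way to inspect or store the records.
echo json_encode(['links' => $links, 'images' => $images], JSON_PRETTY_PRINT);
```

Each record is a plain array, so it can be passed to json_encode or mapped onto database columns without depending on the parser's own node objects.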

array_unique() in php simple html dom

I wrote the code below to get all unique links from a URL:
include_once ('simple_html_dom.php');
$html = file_get_html('http://www.example.com');
foreach($html->find('a') as $element){
$input = array($element->href = $element->href . '<br />');
print_r(array_unique($input));}
But I really can't understand why it shows the duplicated links too.
Is there a problem with using array_unique together with simple html dom?
There's another thing that I guess is related to the problem: when you execute this, all of the extracted links end up under one key, like this:
array(key => all values)
Can anyone solve this?
I believe you want it more like this:
$temp = array();
foreach($html->find('a') as $element) {
$temp[] = $element->href;
}
echo '<pre>' . print_r(array_unique($temp), true) . '</pre>';
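To illustrate why the original loop could not remove duplicates, here is a small sketch that uses a hard-coded list of hrefs (made-up sample data, standing in for the values read from the page). Calling array_unique on a one-element array inside the loop has nothing to compare against, while collecting first and deduplicating once works.

```php
<?php
// Made-up sample data standing in for $element->href values.
$hrefs = ['/home', '/about', '/home', '/contact', '/about'];

// array_unique() on a one-element array inside the loop can never remove
// anything, because each array only ever holds a single link.
$perIteration = [];
foreach ($hrefs as $href) {
    $perIteration[] = array_unique([$href]);
}

// Collect everything first, then deduplicate once.
// array_values() re-indexes the result, since array_unique() preserves keys.
$unique = array_values(array_unique($hrefs));

print_r($unique);
```

The re-indexing step is optional; it only matters if later code expects consecutive numeric keys.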

change variable with GET method

I have a page test.php in which I have a list of names:
name1: 992345
name2: 332345
name3: 558645
name4: 434544
On another page, requested as test1.php?id=name2, the result should be:
332345
I've tried this PHP code:
<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("/test.php");
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*#".$_GET["id"]."");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
?>
I need to be able to change the name with the PHP GET method, as in test1.php?id=name4
The result should be different now:
434544
Is there another way, because mine won't work?
Here is another way to do it.
<?php
libxml_use_internal_errors(true);
/* file function reads your text file into an array. */
$doc = file("test.php");
$id = $_GET["id"];
/* Show your array. You can remove this part after you
* are sure your text file is read correctly.*/
echo "Seeking id: $id<br>";
echo "Elements:<pre>";
print_r($doc);
echo "</pre>";
/* this part is searching for the get variable. */
if ($doc !== false) { // file() returns false on failure, not null
foreach ($doc as $line) {
if(strpos($line,$id) !== false){
$search = $id.": ";
$replace = '';
echo str_replace($search, $replace, $line);
}
}
} else {
echo "No elements.";
}
?>
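One caveat with the strpos() approach above is that it matches substrings, so an id of name1 would also match a line that starts with name10. A minimal sketch of an exact-key lookup follows, assuming each line has the form "name: value". findValueById is a hypothetical helper name, and the hard-coded $lines array stands in for the result of file("test.php").

```php
<?php
// findValueById() is a hypothetical helper for lines shaped like "name: value".
function findValueById(array $lines, string $id): ?string
{
    foreach ($lines as $line) {
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && trim($parts[0]) === $id) {
            return trim($parts[1]);
        }
    }
    return null; // no line had exactly this id
}

// Stand-in for file("test.php"); each entry keeps its trailing newline.
$lines = ["name1: 992345\n", "name2: 332345\n", "name3: 558645\n", "name4: 434544\n"];

$id = 'name2'; // in the real page this would come from $_GET['id']
$value = findValueById($lines, $id);
echo $value ?? 'No match';
```

Returning null for a missing id lets the caller decide what to print instead of silently printing nothing.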
There is a completely different way to do this, using PHP combined with JavaScript (not sure if that's what you're after and if it can work with your app, but I'm going to write it). You can change your test.php to read the GET parameter (it can be POST as well, you'll see), and according to that, output only the desired value, probably from the associative array you have hard-coded in there. The JavaScript approach will be different and it would involve making a single AJAX call instead of DOM traversing using PHP.
So, in short: make an AJAX call to test.php, which then outputs the desired value based on the GET or POST parameter.
jQuery AJAX here; native JS tutorial here.
Just let me know if this won't work for your app, and I'll delete my answer.

when parsing html, check if element is present

I'm parsing HTML from a page to get a list of the outgoing links. I want to split them in two: the ones with a rel="nofollow" / rel="nofollow me" / rel="me nofollow" attribute and the ones without those expressions.
At the moment I'm using the code below, parsed using PHP Simple HTML DOM Parser:
$html = file_get_html("$url");
foreach($html->find('a') as $element) {
echo $element->href; // THE LINK
}
But I'm not quite sure how to implement it. Any ideas?
Try using something like this :
$html = file_get_html("$url");
// Creating array for storing links
$arrayLinks = array(
"nofollow" => array(),
"others" => array()
);
foreach($html->find('a') as $element) {
// Search for the "nofollow" expression case-insensitively (i flag)
if(preg_match('#nofollow#i', $element->rel)) {
$arrayLinks["nofollow"][] = $element->href;
}
else {
$arrayLinks["others"][] = $element->href;
}
}
// Display the array
echo "<pre>";
print_r($arrayLinks);
echo "</pre>";
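The regex above matches "nofollow" anywhere in the attribute, including inside a longer word. A stricter variant could treat rel as a space-separated token list. This is only a sketch: hasRelToken is a hypothetical helper name, and it assumes the rel value may arrive as a non-string (for example when the attribute is missing), which it treats as "no tokens".

```php
<?php
// hasRelToken() is a hypothetical helper, not part of simple_html_dom.
function hasRelToken($rel, string $token): bool
{
    // Assumption: a missing attribute may come back as a non-string value.
    if (!is_string($rel) || trim($rel) === '') {
        return false;
    }
    // rel holds space-separated tokens; compare them case-insensitively.
    $tokens = preg_split('/\s+/', strtolower(trim($rel)));
    return in_array(strtolower($token), $tokens, true);
}

// Made-up sample rel values, including a missing attribute.
$samples = ['nofollow', 'me nofollow', 'NoFollow me', 'me', 'xnofollowx', false];
foreach ($samples as $rel) {
    $label = is_string($rel) ? $rel : '(missing)';
    echo $label . ' => ' . (hasRelToken($rel, 'nofollow') ? 'nofollow' : 'others') . "\n";
}
```

Inside the loop over $html->find('a'), the condition would become hasRelToken($element->rel, 'nofollow') in place of the preg_match call.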
Do a regexp on $element->rel I guess

get specified url from webpage using simplehtmldom

I am trying to build a simple PHP crawler. For this purpose I am getting the contents of a web page using http://simplehtmldom.sourceforge.net/.
After getting the page data, I process the page as below:
include('simplehtmldom/simple_html_dom.php');
$html = file_get_html('http://www.mypage.com');
foreach($html->find('a') as $e)
echo $e->href . '<br>';
This works perfectly and prints all links on that page.
I only want to get URLs like
/view.php?view=open&id=
I have written a function for this purpose:
function starts_text_with($s, $prefix){
return strpos($s, $prefix) === 0;
}
and I use this function like this:
include('simplehtmldom/simple_html_dom.php');
$html = file_get_html('http://www.mypage.com');
foreach($html->find('a') as $e) {
if (starts_text_with($e->href, "/view.php?view=open&id="))
echo $e->href . '<br>';
}
But nothing is returned.
I hope you understand what I need:
I want to print only the URLs that match that criterion.
Thanks
include('simplehtmldom/simple_html_dom.php');
$html = file_get_html('http://www.mypage.com');
foreach($html->find('a') as $e) {
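// preg_match() takes the pattern first, then the subject; preg_quote() escapes the . and ?
if (preg_match('#' . preg_quote('view.php?view=open&id=', '#') . '#', $e->href))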
echo $e->href . '<br>';
}
Try this.
See the preg_match documentation.
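Since the goal is a plain prefix test, a regex is not strictly needed. One possible reason the original prefix check found nothing is that the raw href contains &amp;amp; instead of a bare &amp;. This is an assumption about the page, not something confirmed in the question. The sketch below decodes entities before comparing; hrefStartsWith is a hypothetical helper name and the sample hrefs are made up.

```php
<?php
// hrefStartsWith() is a hypothetical helper, not part of simple_html_dom.
function hrefStartsWith(string $href, string $prefix): bool
{
    // Decode entities such as &amp; so the comparison sees a plain "&".
    $decoded = html_entity_decode($href);
    return strncmp($decoded, $prefix, strlen($prefix)) === 0;
}

$prefix = '/view.php?view=open&id=';

// Made-up sample hrefs: one entity-encoded, one plain, one unrelated.
$hrefs = [
    '/view.php?view=open&amp;id=5',
    '/view.php?view=open&id=7',
    '/other.php?view=open&id=7',
];

foreach ($hrefs as $href) {
    if (hrefStartsWith($href, $prefix)) {
        echo html_entity_decode($href) . "\n";
    }
}
```

If the page's links turn out to be absolute URLs, the prefix would need to include the scheme and host as well.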
