I have used the following little bit of code to find all links on a page (home.php) and echoed them as URLs. It works fine, but how do I pass the results to a new variable? If I create a new variable:
$myvariable ="$element->href";
This only echoes the last of the many results.
// Create DOM from URL or file
$html = file_get_html('http://www.somewebsite.xxx/include/home.php');
foreach($html->find('a') as $element)
echo $element->href . '<br>';
Concatenate with a String Operator:
$myvar = '';
foreach($html->find('a') as $element) {
$myvar .= $element->href . '<br>';
}
Or use an array:
$myvar = array(); // start from an empty array, not the string used above
foreach($html->find('a') as $element) {
$myvar[] = $element->href; // removed <br> for implode, you can add it back
}
// if you want the array as one string
$myvar = implode('<br>', $myvar);
Use an array:
// Create DOM from URL or file
$html = file_get_html('http://www.somewebsite.xxx/include/home.php');
$urls = array();
foreach($html->find('a') as $element) {
$urls[] = $element->href;
}
print_r($urls);
You could use an array to hold the values of all the links from the page in question. In the end, the array is the variable you are looking for. Here's how:
<?php
//USE THE HTML DOM PARSER TO PARSE ALL THE HTML DATA ON THE PAGE: $page
$page = 'http://www.somewebsite.xxx/include/home.php';
$html = file_get_html($page);
// LOOPING THROUGH THE DOM ELEMENTS SELECT ONLY THE <a> TAGS
// AND BUNDLE THEM INTO AN ARRAY...
// THE ARRAY NOW FORMS THE VARIABLE YOU HAD EXPECTED TO CREATE..
$arrAnchors = array(); // INITIALIZE $arrAnchors TO AN EMPTY ARRAY...
foreach($html->find('a') as $element) {
// PUSH ALL THE ANCHOR'S HREF ATTRIBUTES (URLs) INTO THE $arrAnchors ARRAY
$arrAnchors[] = $element->href . '<br>';
}
// NOW TRY TO DUMP THE CONTENT OF YOUR $arrAnchors....
var_dump($arrAnchors); // DISPLAYS A NUMERICALLY INDEXED ARRAY OF LINKS ON THE PAGE: $page
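For anyone who wants to try the array approach without downloading a page, here is a minimal sketch. It swaps in PHP's built-in DOMDocument for Simple HTML DOM so that it runs on its own, and the $markup string is an assumption standing in for home.php.

```php
<?php
// Sample markup standing in for home.php (an assumption for this sketch).
$markup = '<p><a href="/one.php">One</a> <a href="/two.php">Two</a> <a name="top">No href</a></p>';

$doc = new DOMDocument();
// Suppress the warnings loadHTML raises on imperfect real-world markup.
@$doc->loadHTML($markup);

$urls = array();
foreach ($doc->getElementsByTagName('a') as $anchor) {
    // Skip anchors that have no href attribute.
    if ($anchor->hasAttribute('href')) {
        $urls[] = $anchor->getAttribute('href');
    }
}

// $urls is the variable holding every link; join it only when one string is needed.
$asString = implode('<br>', $urls);
echo $asString . PHP_EOL;
```

Keeping the array and joining it only at output time means the same variable can still be looped over, counted, or filtered later.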
I want to get the link and scrape its content, but I can't even reach it. What's wrong with my nested selector?
My PHP:
$dom = file_get_html('http://mojim.com/%E5%BF%83%E8%B7%B3.html?t3');
$tables = $dom->find('.iB');
$firstRow = $tables->find('tr',1)->find('td',4);
foreach ($firstRow as $value) {
echo $value;
}
?>
Here is what the DOM looks like:
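You are just not traversing to the correct element. Calling find('.iB') without an index returns an array of elements, so you cannot chain ->find() on the result; pass an index to get a single element. Indexes are zero-based, so find('td', 3) selects the fourth cell.
Example: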
$dom = file_get_html('http://mojim.com/%E5%BF%83%E8%B7%B3.html?t3');
$firstRow = $dom->find('table.iB', 0)->find('tr', 1)->find('td', 3);
$link = $firstRow->find('a', 0);
echo $link->href . '<br/>' . $link->title;
Should output:
/twy100015x34x8.htm
心跳 歌詞 王力宏
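The same "one element per step" traversal can be sketched with PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. The table below is made-up markup, not the real mojim.com page, so treat the row and cell positions as assumptions.

```php
<?php
// Made-up table standing in for the real page (an assumption).
$markup = '<table class="iB">
  <tr><td>h0</td><td>h1</td><td>h2</td><td>h3</td></tr>
  <tr><td>a</td><td>b</td><td>c</td><td><a href="/song.htm" title="Song title">Song</a></td></tr>
</table>';

$doc = new DOMDocument();
@$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// XPath positions are 1-based: tr[2] is the second row, td[4] the fourth cell.
$links = $xpath->query(
    '//table[contains(concat(" ", normalize-space(@class), " "), " iB ")]//tr[2]/td[4]//a'
);

if ($links->length > 0) {
    $link = $links->item(0);
    echo $link->getAttribute('href') . '<br/>' . $link->getAttribute('title') . PHP_EOL;
}
```

Note the off-by-one difference between the two libraries: Simple HTML DOM indexes from 0, XPath predicates from 1.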
I am using Simple HTML DOM, trying to get strings from a website. When I print out $title[0] within the function it shows just one string, but when I save it in the return array and print out the return value, I receive a never-ending text with RECURSION.
I don't understand why it would work with the second variable $oTitle.
<?php
include 'scripts/simple_html_dom.php';
function getDetails($id) {
$url = "http://www.something.com";
$html = file_get_html ( $url );
$title = $html->find('span[itemprop=name]');
print_r($title[0] . PHP_EOL); //prints out the correct title
$oTitle = "Something"; //there is also code for this variable but it works as it should
$details = array("Title" => $title[0], "Original Title" => $oTitle);
return $details;
flush ();
}
$values = getDetails($number);
print_r($values); //code breaks here
?>
Take a look at this page: http://simplehtmldom.sourceforge.net/
As I can see, you're using this parser.
In order to get HTML content you should use something like this:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
In order to dump content, you should use something like this:
// Dump contents (without tags) from HTML
echo file_get_html('http://www.google.com/')->plaintext;
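Try this code. It fetches the title as a plain string with a regular expression instead of storing the Simple HTML DOM node object in the array; print_r on a node object walks its references back into the whole document, which is where the RECURSION output comes from: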
<?php
include 'simple_html_dom.php';
function getDetails() {
$url = "http://www.godaddy.com";
$html = file_get_html ( $url );
$title = getTitle($url);
echo $title; //prints out the correct title
$oTitle = "Something"; //there is also code for this variable but it works as it should
$details = array("Title" => $title, "Original Title" => $oTitle);
return $details;
}
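function getTitle($Url){
$str = file_get_contents($Url);
// file_get_contents returns false on failure; also guard against a missing <title>
if($str !== false && preg_match("/<title[^>]*>(.*?)<\/title>/is",$str,$title)){
return trim($title[1]);
}
return null;
}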
$values = getDetails();
print_r($values); //code breaks here
?>
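If you would rather keep using a DOM parser than a regular expression, the idea is the same: put the element's text into the array, not the element itself. With Simple HTML DOM that would be $title[0]->plaintext. Below is a rough standalone sketch of the pattern using the built-in DOMDocument; the getDetails($markup) signature and the sample span are assumptions made up for the example.

```php
<?php
// Hypothetical helper: $markup stands in for the fetched page.
function getDetails($markup) {
    $doc = new DOMDocument();
    @$doc->loadHTML($markup);
    $xpath = new DOMXPath($doc);

    $nodes = $xpath->query('//span[@itemprop="name"]');
    // Keep the text, not the node object, so the array holds only strings.
    $title = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

    return array('Title' => $title, 'Original Title' => 'Something');
}

$values = getDetails('<div><span itemprop="name"> Some Movie </span></div>');
print_r($values);
```

Because the returned array contains only strings, print_r shows two short entries and nothing else.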
Hello,
I'm using the following code to retrieve the DOM from a URL,
find all "A" tags and print their HREFs.
My output contains links I don't want; you can see it here:
http://trend.remal.com/parsing.php
Some elements are duplicated.
I need to reduce the output to only the "A" tags that link to https://twitter.com/$namehere.
As you can see, there are two kinds of URLs; I need only the Twitter URLs, without duplicates.
Any tips on how to adjust the code?
<?php
include('simple_html_dom.php');
$html = file_get_html('http://tweepar.com/sa/1/');
foreach($html->find('a') as $e)
echo $e->href . '<br>';
?>
$urls = array();
foreach ( $html->find('a') as $e )
{
// If it's a twitter link
if ( strpos($e->href, '://twitter.com/') !== false )
{
// and we don't have it in the array yet
if ( ! in_array($e->href, $urls) )
{
// add it to our array
$urls[] = $e->href;
}
}
}
echo implode('<br>', $urls);
Here are some references from the PHP docs:
strpos
in_array
implode
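A variation on the loop above, as a sketch only: it checks the parsed host with parse_url instead of searching the whole URL, and uses the URL as an array key to drop duplicates. It runs on the built-in DOMDocument with made-up sample links, so the markup and the accepted host names are assumptions.

```php
<?php
// Made-up links standing in for the tweepar.com page (an assumption).
$markup = '<a href="https://twitter.com/alice">a</a>
<a href="http://tweepar.com/user/alice">b</a>
<a href="https://twitter.com/bob">c</a>
<a href="https://twitter.com/alice">d</a>';

$doc = new DOMDocument();
@$doc->loadHTML($markup);

$seen = array();
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');
    // Match on the parsed host rather than a substring of the whole URL.
    $host = parse_url($href, PHP_URL_HOST);
    if ($host === 'twitter.com' || $host === 'www.twitter.com') {
        // Using the URL as the array key drops duplicates.
        $seen[$href] = true;
    }
}

$urls = array_keys($seen);
echo implode('<br>', $urls) . PHP_EOL;
```

The host check avoids false positives such as a link to another site that merely carries a twitter.com address in its query string.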
I'd like to get to all the a href links within the html string and convert all of the links as follows:
<a href='www.google.com'>Google</a>
Would change to look like this...
<a href='www.mysite.com/link.php?URL=www.google.com'>Google</a>
Can anyone suggest how I do this?
<?php
require_once('simple_html_dom.php');
// load the class
$html = new simple_html_dom();
// load the entire string containing everything user entered here
$string = "<html><body><base href='http://www.site.biz/clients/g/'><a href='www.google.co.uk'>Google</a><a href='http://www.yahoo.co.uk'>Yahoo</a></body></html>";
$return = $html->load($string);
$links = $html->find('a');
foreach ($links as $link)
{
var_dump($link);
}
?>
Have you tried something like this?
$links = $html->find('a');
foreach ($links as $link)
{
if(isset($link->href))
{
$link->href = 'www.mysite.com/link.php?URL=' . $link->href;
}
}
$newHTML = $html->save();
// $newHTML now contains the modified HTML
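One thing worth considering: if the original URL contains characters like : / ? or &, it should probably be passed through urlencode() before it goes into the query string, and link.php can read it back from $_GET['URL']. Here is a rough standalone sketch of that idea using the built-in DOMDocument rather than Simple HTML DOM; the www.mysite.com/link.php address is the one from the question.

```php
<?php
$markup = '<html><body><a href="www.google.co.uk">Google</a>'
        . '<a href="http://www.yahoo.co.uk">Yahoo</a></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($markup);

foreach ($doc->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $target = $anchor->getAttribute('href');
        // urlencode keeps the original URL intact as a single query value.
        $anchor->setAttribute('href', 'www.mysite.com/link.php?URL=' . urlencode($target));
    }
}

// Serialize the modified document back to a string.
$newHTML = $doc->saveHTML();
echo $newHTML;
```

Without the encoding step, a target URL that has its own query string would have its parameters mixed in with those of link.php.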
Is it possible to use a foreach loop to scrape multiple URLs from an array? I've been trying, but for some reason it only pulls from the first URL in the array and then shows the results.
include_once('../../simple_html_dom.php');
$link = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($link as $links) {
function scraping_IMDB($links) {
// create HTML DOM
$html = file_get_html($links);
$values = array();
foreach($html->find('input') as $element) {
$values[$element->id=='ASIN'] = $element->value; }
// get title
$ret['ASIN'] = end($values);
// get rating
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] =$html->find('b[class="priceLarge"]', 0)->innertext;
// clean up memory
//$html->clear();
// unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
$ret = scraping_IMDB($links);
foreach($ret as $k=>$v)
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
Here is the code since the comment part didn't work. :) It's very dirty because I just edited one of the examples to play with it to see if I could get it to do what I wanted.
include_once('../../simple_html_dom.php');
function scraping_IMDB($links) {
// create HTML DOM
$html = file_get_html($links);
// What is this spaghetti code good for?
/*
$values = array();
foreach($html->find('input') as $element) {
$values[$element->id=='ASIN'] = $element->value;
}
// get title
$ret['ASIN'] = end($values);
*/
foreach($html->find('input') as $element) {
if($element->id == 'ASIN') {
$ret['ASIN'] = $element->value;
}
}
// Or you could use the following instead of the whole foreach loop above:
//
// $ret['ASIN'] = $html->find('input[id="ASIN"]', 0)->value;
//
// The 0 returns the first match. I had a look at Amazon's source code,
// and it contains 2 HTML tags with id='ASIN'. If they were following
// the HTML rules, there would be only ONE element with a specific id.
// get rating
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;
// clean up memory
//$html->clear();
// unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
$links = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
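This should do the trick.
The main fix is that the function is now declared once, outside the loop; declaring it inside the foreach makes PHP try to redeclare it on the second iteration, which is a fatal error. I have also renamed the array to 'links' instead of 'link'. It's an array of links, containing link(s), so foreach($link as $links) seemed wrong, and I changed it to foreach($links as $link).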
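To see the "declare once, call per item" pattern in isolation, here is a small sketch that runs without touching Amazon. It uses the built-in DOMDocument instead of Simple HTML DOM, and the scrapeProduct() name and the inline sample pages are made up for the example.

```php
<?php
// Hypothetical helper, declared once, outside any loop.
function scrapeProduct($markup) {
    $doc = new DOMDocument();
    @$doc->loadHTML($markup);
    $xpath = new DOMXPath($doc);

    $ret = array('ASIN' => null, 'Name' => null);

    $asin = $xpath->query('//input[@id="ASIN"]');
    if ($asin->length > 0) {
        $ret['ASIN'] = $asin->item(0)->getAttribute('value');
    }

    $name = $xpath->query('//h1');
    if ($name->length > 0) {
        $ret['Name'] = trim($name->item(0)->textContent);
    }

    return $ret;
}

// Inline pages stand in for the downloaded product pages (an assumption).
$pages = array(
    '<h1>First item</h1><input id="ASIN" value="AAA111">',
    '<h1>Second item</h1><input id="ASIN" value="BBB222">',
);

// Collect one result per page instead of echoing inside the function.
$results = array();
foreach ($pages as $page) {
    $results[] = scrapeProduct($page);
}
print_r($results);
```

Returning an array from the function and collecting the results in the loop keeps fetching, parsing and output separate, which makes it easier to see which URL failed.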
I really need to ask this question, as the answer will settle many more questions once people read this thread. What if you used $articles, as in the examples on the Simple HTML DOM site?
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;
return $ret;
}
$links = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
What if it's $articles?
$articles[] = $item;
}
//print_r($articles);
$links = array (
'http://link1.com',
'http://link2.com',
'http://link3.com'
);
What would this area look like?
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
I've seen this multiple-links question all over Stack Overflow for the past 2 years, and I still cannot figure it out. It would be great to get a basic handle on it, in the style of the Simple HTML DOM examples.
Thanks.
First time posting; I'm sure I broke a bunch of rules and didn't format the code section right. I just badly had to ask this question.