Below is the code I am using.
It reads links from a textarea, fetches the source of each page, and then filters the meta tags. However, it only displays the data for the last element in the array.
For example, if I put 3 websites into the textarea, it only reads the last one; the others are shown as blank.
I have spent hours on this, please help.
function file_get_contents_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
if(isset($_POST['url'])){
$url = $_POST['url'];
$url = explode("\n",$url);
print_r($url);
for($counter = 0; $counter < count($url); $counter++){
$html = file_get_contents_curl($url[$counter]); // PASSING LAST VALUE OF ARRAY
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++){
$meta = $metas->item($i);
if($meta->getAttribute('name') == 'description')
$description = $meta->getAttribute('content');
if($meta->getAttribute('name') == 'keywords')
$keywords = $meta->getAttribute('content');
}
print
('
<fieldset>
<table>
<legend><b>URL: </b>'.$url[$counter].'</legend>
<tr>
<td><b>Title:</b></td><td>'.$title.'</td>
</tr>
<tr>
<td><b>Description:</b></td><td>'.$description.'</td>
</tr>
<tr>
<td><b>Keywords:</b></td><td>'.$keywords.'</td>
</tr>
</table>
</fieldset><br />
');
}
}
This was an annoying little bug to find, but here is the (ridiculously simple) solution:
Every URL except the last one has trailing white space. Browsers usually submit textarea line breaks as \r\n, so explode("\n", ...) leaves a \r at the end of every line but the last. You therefore need to trim each URL before passing it to cURL:
curl_setopt($ch, CURLOPT_URL, trim($url));
If allow_url_fopen is enabled, you could probably use file_get_contents() instead of cURL (you would still need to trim the URL).
The second problem is that when a page has no meta data, the variables still hold the values from the previous iteration. To avoid that, add the following just before the end of your main loop, after your print():
unset($title,$description,$keywords);
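Both fixes can also be folded into two small helpers. This is only a minimal sketch: the names split_url_lines() and extract_page_meta() are made up for illustration and are not part of the original code. The first splits on any line-break style and trims each line; the second starts from fresh defaults on every call, so nothing can leak from the previous page.

```php
<?php
// Hypothetical helper: split textarea input into clean URLs.
function split_url_lines(string $text): array
{
    // \R matches \r\n, \n and \r, so no stray carriage return is left behind.
    $lines = preg_split('/\R/', $text);
    // Trim each line and drop the empty ones.
    $lines = array_map('trim', $lines);
    return array_values(array_filter($lines, 'strlen'));
}

// Hypothetical helper: pull title, description and keywords out of HTML.
function extract_page_meta(string $html): array
{
    // Fresh defaults on every call, so old values are never reused.
    $meta = ['title' => '', 'description' => '', 'keywords' => ''];
    if ($html === '') {
        return $meta; // nothing was downloaded
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings about malformed markup
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $meta['title'] = trim($titles->item(0)->nodeValue);
    }
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $meta[$name] = $tag->getAttribute('content');
        }
    }
    return $meta;
}
```

In the main loop you would then iterate over split_url_lines($_POST['url']) and call extract_page_meta((string) file_get_contents_curl($url)) for each entry.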
I am reading the HTML source of an Instagram post using cURL. This works on localhost, but when I run the code on a live domain, the meta tags with an og property, such as og:type, are missing; they only show up on localhost.
This is the complete code.
<?php
function get_domain($url)
{
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : $pieces['path'];
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i',
$domain, $regs)) {
return $regs['domain'];
}
return false;
}
//run curl here and get html code of instagram post page
function file_get_contents_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
//check instagram url
function checkinstaurl($urlhere) {
//remove white space
$urlhere = trim($urlhere);
$urlhere = htmlspecialchars($urlhere);
///remove white space
if (get_domain($urlhere) == "instagram.com") {
//getting the meta tag data
$html = file_get_contents_curl($urlhere);
//parsing begins here:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
//get and display what you need:
$title = $nodes->item(0)->nodeValue;
$metas = $doc->getElementsByTagName('meta');
$mediatype = null;
$description = null;
for ($i = 0; $i < $metas->length; $i++)
{
$meta = $metas->item($i);
if($meta->getAttribute('property') == 'og:type')
$mediatype = $meta -> getAttribute('content');
if($mediatype == 'video') {
if($meta->getAttribute('property') == 'og:video')
$description = $meta -> getAttribute('content');
} else {
if($meta->getAttribute('property') == 'og:image')
$description = $meta -> getAttribute('content');
$mediatype = 'photo';
}
} // for loop statement
$out['mediatype'] = $mediatype;
$out['descriptionc'] = $description;
return $out;
//getting the meta tag data
}
}
/*output*/
$igurl = 'https://www.instagram.com/p/COf0dN0M8pU/';
$output = checkinstaurl($igurl);
echo "<pre>";
print_r($output);
?>
On localhost, the code above returns the complete HTML with the meta tags, but on the live domain the meta tags with an og property are missing.
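One way to narrow this down is to dump every og: tag that each environment actually receives and compare the two outputs. The sketch below assumes (this is a guess, not something the post confirms) that the remote site serves different HTML to the live server than to localhost, for example a login or consent page. The helper name extract_og_tags() is hypothetical.

```php
<?php
// Hypothetical helper: collect every <meta property="og:..."> tag from a page.
function extract_og_tags(string $html): array
{
    $tags = [];
    if ($html === '') {
        return $tags; // the download failed or returned nothing
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings about malformed markup
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        // Keep only properties that start with "og:".
        if (strpos($property, 'og:') === 0) {
            $tags[$property] = $meta->getAttribute('content');
        }
    }
    return $tags;
}
```

Running print_r(extract_og_tags((string) file_get_contents_curl($igurl))) on both machines, and also saving the raw HTML to a file, should show whether the live server gets a different page at all.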
I get this error: Object of class DOMDocument could not be converted to string
I'm trying to parse a web page to get the text inside a div:
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
$dom = new DOMDocument();
$dom->loadHTML($html);
$table = $dom->getElementById('mostra')> textContent; //DOMElement
echo $table;
This is the HTML element:
<div id="mostra">Hello<img src="file.png"></div>
I want to print Hello.
How can I solve it?
Thanks a lot, and sorry for my English.
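function string_between_two_string($str, $starting_word, $ending_word) {
    // Locate the start marker; return an empty string if it is missing.
    $substring_start = strpos($str, $starting_word);
    if ($substring_start === false) return '';
    $substring_start += strlen($starting_word);
    // Locate the end marker, searching only after the start marker.
    $substring_end = strpos($str, $ending_word, $substring_start);
    if ($substring_end === false) return '';
    return substr($str, $substring_start, $substring_end - $substring_start);
}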
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
$table = string_between_two_string($html, '<div id="mostra">', '<img src="file.png"></div>');
echo $table;
Try the function above; it returns the text between two marker strings.
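If you would rather stay with DOMDocument, here is a minimal sketch that does not depend on the exact markup around the text. The helper name direct_text_by_id() is made up for illustration. It reads the element with getElementById() and keeps only the text nodes that sit directly inside it; note that properties are read with ->, as in $element->textContent.

```php
<?php
// Hypothetical helper: return only the text that sits directly inside the element.
function direct_text_by_id(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings about malformed markup
    $element = $dom->getElementById($id);
    if ($element === null) {
        return null; // no element with that id
    }
    $text = '';
    foreach ($element->childNodes as $child) {
        // Skip nested tags such as <img>; keep plain text nodes only.
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}
```

With the HTML from the question, echo direct_text_by_id($html, 'mostra'); prints Hello.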
I'm stuck trying to make PHP output the content of a tag with a given id.
Example: <div id="myID" class="section">...Content to output...</div>
But I can only make PHP output content selected by tag name.
Example: <div>...Content...</div>
Here is the code I've used:
$theTag = $_REQUEST['tag'];
$url = $_REQUEST['url'];
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$output = curl_exec($curl);
curl_close($curl);
$DOM = new DOMDocument;
$DOM->loadHTML( $output);
//get all H1
$items = $DOM->getElementsByTagName("$theTag");
//display all H1 text
for ($i = 0; $i < $items->length; $i++)
echo $items->item($i)->nodeValue . "<br/>";
I would use a query similar to http://example.com/get-content?url=http://example.com/page.file&tag=myID
How can I make this work?
Thanks!
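getElementsByTagName() only matches tag names, so an id such as myID never matches anything. A minimal sketch of selecting by id with DOMXPath instead is shown here; the helper name inner_html_by_id() is hypothetical, and the id is validated first because it comes from the request.

```php
<?php
// Hypothetical helper: return the inner HTML of the element with the given id.
function inner_html_by_id(string $html, string $id): ?string
{
    // Accept only simple id values so user input cannot alter the XPath expression.
    if ($html === '' || !preg_match('/^[A-Za-z0-9_\-:.]+$/', $id)) {
        return null;
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings about malformed markup
    $xpath = new DOMXPath($dom);
    $matches = $xpath->query('//*[@id="' . $id . '"]');
    if ($matches === false || $matches->length === 0) {
        return null; // no element with that id
    }
    $inner = '';
    foreach ($matches->item(0)->childNodes as $child) {
        // Serialize each child node, tags included.
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}
```

In the script above you would call echo inner_html_by_id($output, $theTag); in place of the getElementsByTagName() loop.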
Can anyone help me? I am trying to scrape hotel reviews from LateRooms.com. Please don't tell me it's a bad idea; I already have permission as an affiliate.
My code:
<?php
header('content-type: text/plain');
$contents = file_get_contents('http://www.laterooms.com/en/hotel-reviews/238902_the-westfield-bb-sandown.aspx');
$contents = preg_replace('/\s(1,)/', ' ', $contents);
print $contents . "\n";
$records = preg_split('/<div id="review/', $contents);
for ($ix = 1; $ix < count($records); $ix++) {
$tmp = $records[$ix];
preg_match('/id="review"/', $tmp, $match_reviews);
print_r($match_reviews);
exit();
}
?>
This works well; the only problem is that it pulls in the whole page of code and doesn't match the div id 'review'.
Thanks in advance.
function file_get_contents_curl($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
function DOMinnerHTML($element){
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
$url = 'http://www.laterooms.com/en/hotel-reviews/238902_the-westfield-bb-sandown.aspx';
$html = file_get_contents_curl($url);
//parsing begins here:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$reviews = array(); // start empty so print_r() still works when nothing matches
$div_elements = $doc->getElementsByTagName('div');
if ($div_elements->length != 0){
foreach ($div_elements as $div_element) {
if ($div_element->getAttribute('class') == 'review newReview'){
$reviews[] = DOMinnerHTML($div_element);
}
}
}
print_r($reviews);
Try the code above; it returns all reviews. You can refine the content to suit your requirements.
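The comparison with 'review newReview' only matches when the class attribute is exactly that string. If the site reorders or adds class names, a token-based match is more forgiving. Here is a minimal sketch using DOMXPath; the helper name texts_by_class() is made up, and it returns plain text rather than inner HTML.

```php
<?php
// Hypothetical helper: text of every <div> whose class list contains $class.
function texts_by_class(string $html, string $class): array
{
    // Accept only simple class names so the XPath expression stays well-formed.
    if ($html === '' || !preg_match('/^[A-Za-z0-9_\-]+$/', $class)) {
        return [];
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings about malformed markup
    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so "review" does not match "reviewer".
    $expr = '//div[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';
    $texts = [];
    foreach ($xpath->query($expr) as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}
```

For example, print_r(texts_by_class($html, 'newReview')); would list the text of each matching div regardless of the order of its class names.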
I am trying to count the number of tweets with a given hashtag using PHP.
This piece of source code was given to me by someone on Stack Overflow.
Do I need to add any libraries or change any settings for the function to work?
When I run this code, it just gives me a blank page.
<?php
global $total, $hashtag;
//$hashtag = '#supportvisitbogor2011';
$hashtag = '#australialovesjustin';
$total = 0;
function getTweets($hash_tag, $page) {
global $total, $hashtag;
$url = 'http://search.twitter.com/search.json?q='.urlencode($hash_tag).'&';
$url .= 'page='.$page;
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
$json = curl_exec ($ch);
curl_close ($ch);
//echo "<pre>";
//$json_decode = json_decode($json);
//print_r($json_decode->results);
$json_decode = json_decode($json);
$total += count($json_decode->results);
if($json_decode->next_page){
$temp = explode("&",$json_decode->next_page);
$p = explode("=",$temp[0]);
getTweets($hashtag,$p[1]);
}
}
getTweets($hashtag,1);
echo $total;
?>
Read these articles:
https://dev.twitter.com/docs/using-search
https://dev.twitter.com/docs/api/1/get/search
Then try your requests in the developer console.
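A blank page often means PHP hit an error while error display is switched off, or the request returned something that is not the JSON the code expects. Here is a minimal sketch of a defensive counter; the helper name count_results() is hypothetical, and the "results" key is taken from the code in the question, not from current API documentation.

```php
<?php
// Hypothetical helper: count the items in a search response without failing
// on a bad download or on a body that is not the expected JSON.
function count_results($json): int
{
    // curl_exec() returns false on failure, so reject anything but a non-empty string.
    if (!is_string($json) || $json === '') {
        return 0;
    }
    $decoded = json_decode($json);
    // json_decode() returns null for invalid JSON; also check the expected shape.
    if (!is_object($decoded) || !isset($decoded->results) || !is_array($decoded->results)) {
        return 0;
    }
    return count($decoded->results);
}
```

While debugging, adding ini_set('display_errors', '1'); and error_reporting(E_ALL); at the top of the script, and echoing curl_error($ch) before curl_close($ch), should turn the blank page into a readable message.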