I need to grab data from html of a remote url

I need a script that extracts data from this ul:
<ul id="activitylist">
<li class="activitybit forum_thread">
<div class="avatar"> <img alt="secret team's Avatar" src="images/misc/unknown.gif" title="secret team's Avatar"> </div>
<div class="content hasavatar">
<div class="datetime"> <span class="date">Today, <span class="time">11:25pm</span></span> </div>
<div class="title"> <a class="username" href="member.php/436070-secret-team">secret team</a> started a thread 'Allow [VIDEO] Code' missing in settings </div>
<div class="views">0 replies | 0 view(s)</div>
</li>
</ul>
There are 10 to 15 child li elements in the ul. I need the thread name of every li whose thread has 0 replies. I posted one example li above; for that example, I need this text:
'Allow [VIDEO] Code' missing in settings
because this div contains the text "0 replies":
<div class="views">0 replies | 0 view(s)</div>
I have this sample code, but it does not work correctly.
<?php
$request_url = 'https://www.vbulletin.com/forum/activity.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $request_url); // The URL to get links from
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // We want to get the response
$result = curl_exec($ch);
$sPattern = "/<li class=\"activitybit forum_thread\">(.*?)<\/li>/s";
preg_match_all($sPattern, $result, $parts);
$links = $parts[1];
foreach ($links as $link) {
    if (stripos($link, "0 replies") !== false) {
        echo $link . "<br>";
    }
}
curl_close($ch);
?>

Here is a regex that will parse any kind of HTML:
$regex = new DOMDocument;
$regex->loadHTML($html);
Now, seriously. DOMDocument has parsed all of your HTML. You can now use the DOMDocument and DOMElement methods (such as getElementsByTagName() and getAttribute()) to walk over tags and extract their attributes and contents. But it is much easier to use a companion class called DOMXPath:
$xpath = new DOMXPath($regex);
foreach ($xpath->query("//ul[@id='activitylist']/li") as $li) {
    $view = $xpath->query(".//div[@class='views']", $li)->item(0);
    // item(1): the second <a> in the title div is the thread link (item(0) is the username)
    $link = $xpath->query(".//div[@class='title']/a", $li)->item(1);
    // \b keeps "10 replies", "20 replies", ... from matching
    if ($view && $link && preg_match('/\b0 replies\b/', $view->nodeValue)) {
        echo $link->nodeValue . " (" . $link->getAttribute("href") . ")\n";
    }
}
This will output a few warnings about your HTML not being perfect, plus this:
'Allow [VIDEO] Code' missing in settings (showthread.php/415403-Allow-VIDEO-Code-missing-in-settings)
You can read more about using regex in PHP to parse HTML here. A comprehensive list of XPath examples is available here.
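A self-contained sketch of the same DOMXPath approach, run against an inline copy of the markup instead of a live cURL response so it can be tried offline. It assumes, as the output above suggests, that the title div on the real page holds two links (the username first, then the thread link); the second sample item and the function name zeroReplyThreads are made up for illustration. Parser warnings are collected with libxml_use_internal_errors() instead of being printed.

```php
<?php
// Returns the thread titles of all activity items whose "views" text says "0 replies".
function zeroReplyThreads(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings about imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];
    foreach ($xpath->query("//ul[@id='activitylist']/li") as $li) {
        $views = $xpath->query(".//div[@class='views']", $li)->item(0);
        // Assumed markup: username link first, thread link last.
        $link = $xpath->query(".//div[@class='title']/a[last()]", $li)->item(0);
        // \b keeps "10 replies" from matching.
        if ($views !== null && $link !== null && preg_match('/\b0 replies\b/', $views->nodeValue)) {
            $titles[] = trim($link->nodeValue);
        }
    }
    return $titles;
}

$html = <<<'HTML'
<ul id="activitylist">
<li class="activitybit forum_thread">
  <div class="title"><a class="username" href="member.php/436070-secret-team">secret team</a>
    started a thread <a href="showthread.php/415403-Allow-VIDEO-Code-missing-in-settings">'Allow [VIDEO] Code' missing in settings</a></div>
  <div class="views">0 replies | 0 view(s)</div>
</li>
<li class="activitybit forum_thread">
  <div class="title"><a class="username" href="member.php/1-someone">someone</a>
    started a thread <a href="showthread.php/2-Busy-thread">Busy thread</a></div>
  <div class="views">10 replies | 250 view(s)</div>
</li>
</ul>
HTML;

print_r(zeroReplyThreads($html));
```

With real data, pass the string returned by curl_exec() instead of the sample $html.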

Related

Getting an image from a facebook post on a public page

I'm trying to show a Facebook feed on my website, and that part works. But I have only managed to show the text of each post, not the image attached to it (or, if there are multiple images, ideally only the first or the biggest one).
I tried looking here for the correct field name to retrieve it through the API.
This is my code now (it shows the Facebook posts in an Owl Carousel):
function fetchUrl($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    // You may need to add the line below
    // curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $feedData = curl_exec($ch);
    curl_close($ch);
    return $feedData;
}
//App Info, needed for Auth
$app_id = "1230330267012270";
$app_secret = "secret";
//Retrieve auth token
$authToken = fetchUrl("https://graph.facebook.com/oauth/access_token?grant_type=client_credentials&client_id={$app_id}&client_secret={$app_secret}");
$json_object = fetchUrl("https://graph.facebook.com/267007566742236/feed?{$authToken}");
$feedarray = json_decode($json_object);
$facebookfeed = ''; // initialise the string before appending to it with .=
foreach ($feedarray->data as $feed_data)
{
    if (!empty($feed_data->message)) {
        $facebookfeed .= '
        <div class="item">
            <div class="product-item">
                <img class="img-responsive" src="images/siteimages/imgfacebook.jpg" alt="">
                <h4 class="product-title">'.$feed_data->name.'</h4>
                <p class="product-desc">'.$feed_data->message.'</p>
                <p>'.$feed_data->story.'</p>
                <img src="'.$feed_data->picture.'">
                <p>Lees meer</p>
            </div><!-- Product item end -->
        </div><!-- Item 1 end -->';
    }
}
echo $facebookfeed;
Looking at the Facebook documentation I thought $feed_data->picture would work, but it returns nothing.

To try to improve performance on mobile networks, Nodes and Edges in v2.4 requires that you explicitly request the field(s) you need for your GET requests. For example, GET /v2.4/me/feed no longer includes likes and comments by default, but GET /v2.4/me/feed?fields=comments,likes will return the data.
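Source: https://developers.facebook.com/docs/apps/changelog#v2_4
So the picture is only returned if you request it explicitly, for example by adding &fields=message,story,name,picture to the /feed URL.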
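A minimal sketch of requesting the fields explicitly and reading the image URL from each post. The field list, the placeholder token APP_TOKEN, and the helper names buildFeedUrl and postsWithImages are illustrative assumptions; which image field is available (picture, full_picture) depends on the Graph API version you call, so check the Post reference for yours. The token is passed here as a bare value, whereas the question's code appends the raw token response to the URL, so adapt accordingly. The sample JSON stands in for a real /feed response.

```php
<?php
// Builds a /feed URL that explicitly asks for the fields you want (Graph API v2.4+).
function buildFeedUrl(string $pageId, string $accessToken, array $fields): string
{
    return 'https://graph.facebook.com/' . rawurlencode($pageId) . '/feed?'
        . http_build_query([
            'fields' => implode(',', $fields),
            'access_token' => $accessToken,
        ]);
}

// Returns [message, image URL or null] for each post in a decoded /feed response.
function postsWithImages(string $json): array
{
    $feed = json_decode($json);
    $posts = [];
    foreach ($feed->data ?? [] as $post) {
        $posts[] = [
            $post->message ?? '',
            $post->full_picture ?? null, // absent when the post has no image
        ];
    }
    return $posts;
}

$url = buildFeedUrl('267007566742236', 'APP_TOKEN', ['message', 'story', 'full_picture']);
echo $url, "\n";

// Sample response standing in for the real API output.
$sample = '{"data":[{"message":"Hello","full_picture":"https://example.com/a.jpg"},{"message":"Text only"}]}';
print_r(postsWithImages($sample));
```

http_build_query() takes care of URL-encoding, so the commas in the field list arrive as %2C, which the server decodes back.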

Error Using $this when not in object context

First post here, so apologies in advance if the format is wrong. I am working with the Instagram API to pull images. The API only returns one page of images at a time, but it offers pagination and a next_url to grab the next page. When I use the fetchInstagramAPI function below to grab only the first page, the PHP code works fine.
When I use the loopPages function together with fetchInstagramAPI to grab all pages at once, I get the error "Using $this when not in object context". Any ideas? Thanks in advance for the help.
Function fetchInstagramAPI gets our data
<?php
function fetchInstagramAPI($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $contents = curl_exec($ch);
    curl_close($ch);
    return json_decode($contents);
}
Function loopPages uses pagination and next_url to grab all pages of images
function loopPages($url){
    $gotAllResults = false;
    $results = array();
    while(!$gotAllResults) {
        $result = $this->fetchInstagramAPI($url);
        $results[] = $result;
        if (!property_exists($result->pagination, 'next_url')) {
            $gotAllResults = true;
        } else {
            $url = $result->pagination->next_url;
        }
    }
    return $results;
}
This pulls, parses, then displays the images in a browser
$all_url = "https://api.instagram.com/v1/users/{$userid}/media/recent/?client_id={$clientid}"; // double quotes so the variables are interpolated
$media = loopPages($all_url);
foreach ($media->data as $post): ?>
    <!-- Renders images. #Options (thumbnail, low_resolution, standard_resolution) -->
    <a class="group" rel="group1" href="<?= $post->images->standard_resolution->url ?>"><img src="<?= $post->images->thumbnail->url ?>"></a>
<?php endforeach ?>

In PHP and many other object-oriented languages, $this is a reference to the current object (the object on which the method was called). Because your code doesn't seem to be inside any class, $this doesn't exist. See the PHP manual on classes and objects for more.
Since you have just defined plain functions in the file, you can call the function directly: $result = fetchInstagramAPI($url); (without $this).
edit:
For the foreach, check whether $media->data is in fact an array, and try another syntax which I think is easier to read.
edit2:
Since you now know what your $media looks like (an array of pages), you can wrap another foreach loop around it to iterate through the pages:
foreach ($media as $page){
    foreach ($page->data as $post) {
        echo '<!-- Renders images. #Options (thumbnail, low_resolution, standard_resolution) -->';
        echo '<a class="group" rel="group1" href="' . $post->images->standard_resolution->url . '"><img src="' . $post->images->thumbnail->url . '"></a>';
    }
}
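A sketch of the corrected pagination loop, assuming the response shape used in the question (a pagination object with an optional next_url, and a data array of posts). The fetcher is passed in as a callable so the loop can be tried without network access; in your code you would pass 'fetchInstagramAPI', since PHP accepts a function name as a callable. The second parameter, the fake responses, and the helper name thumbnailUrls are made up for illustration.

```php
<?php
// Plain functions, no class: other functions are called directly, not via $this->.
function loopPages(string $url, callable $fetch): array
{
    $pages = [];
    while ($url !== null) {
        $page = $fetch($url);
        if ($page === null) {
            break; // request failed or the body was not valid JSON
        }
        $pages[] = $page;
        // Stop when the response no longer contains pagination->next_url.
        $url = $page->pagination->next_url ?? null;
    }
    return $pages;
}

// Mirrors the nested foreach above: pages first, then the posts in each page.
function thumbnailUrls(array $pages): array
{
    $urls = [];
    foreach ($pages as $page) {
        foreach ($page->data as $post) {
            $urls[] = $post->images->thumbnail->url;
        }
    }
    return $urls;
}

// Fake responses keyed by URL, standing in for the real API.
$fakeResponses = [
    'page1' => '{"pagination":{"next_url":"page2"},"data":[{"images":{"thumbnail":{"url":"a.jpg"}}}]}',
    'page2' => '{"pagination":{},"data":[{"images":{"thumbnail":{"url":"b.jpg"}}}]}',
];
$fakeFetch = function (string $url) use ($fakeResponses) {
    return json_decode($fakeResponses[$url]);
};

$pages = loopPages('page1', $fakeFetch);
print_r(thumbnailUrls($pages));
```

With the real API this would be called as loopPages($all_url, 'fetchInstagramAPI').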

PHP Simple HTML DOM counts wrong the number of elements

Using this code, I want to count the number of dt elements with class "level3" inside a certain node:
include_once('simple_html_dom.php');
ini_set("memory_limit", "-1");
ini_set('max_execution_time', 1200);
function parseInit($url) {
    $ch = curl_init();
    $timeout = 0;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$data = parseInit("https://www.smile-dental.de/index.php");
$html = new simple_html_dom();
$html = $html->load($data);
$struct = $html->find("dt.level1", 0)->next_sibling()->find("dt.level2", 0)->next_sibling()->find("dt.level3");
echo count($struct);
$html->clear();
unset($html);
But I get the wrong result: it should be 2, but I get 53 (the total count of dt elements with class "level3" inside the first dt node with class "level1"). Could you help me and explain what the problem is?
Thanks in advance!
---EDIT---
More generally, I want to build a hierarchical structure of the links in the left navigation bar. I wrote the function below, but it doesn't work correctly, maybe because of the problem I described above. There may also be other problems in the code besides this one.
function get_links($struct) {
    static $iter = 1;
    $nav_left_links = $struct->find("dt.level".$iter);
    echo "<ul>";
    foreach ($nav_left_links as $key => $links) {
        echo "<li>".$links->find("a", 0)->href;
        echo str_pad('',4096)."\n";
        ob_flush();
        flush();
        usleep(500000);
        $iter++;
        if ($links->next_sibling() && count($links->next_sibling()->find("dt")) > 0) {
            get_links($links->next_sibling());
        } else {
            $iter--;
            if ($key == count($nav_left_links) - 1) { // $key is zero-based
                break;
            } else {
                continue;
            }
        }
        echo "</li>";
    }
    echo "</ul>";
    $iter--;
}
$data = parseInit("https://www.smile-dental.de/index.php");
$html = new simple_html_dom();
$html = $html->load($data);
$struct = $html->find(".mod_vertical_dropmenu_142_inner", 0);
get_links($struct);
$html->clear();
unset($html);
Alternatively, if somebody knows how to rewrite this code without PHP Simple HTML DOM, using standard parsing methods, I would be very grateful.

Unfortunately, it looks like you have uncovered a bug. I did some experiments, and even after correcting the validation errors, simple-html-dom wasn't able to traverse the dl, dt, and dd elements properly. I did get it to work when I used a regex to convert all the dl elements to ul, and the dd and dt elements to li, though:
Result of $html->find("li.level1", 1)->find("li.level2", 1)->find("li.level3"):
<li class="level3 off-nav-321-8120 notparent first"><span class="outer"> <span class="inner"> <span>Pro-Seal Versiegeler</span> </span> </span></li>
<li class="level3 off-nav-321-8120 notparent first"></li>
<li class="level3 off-nav-321-8122 notparent last"><span class="outer"> <span class="inner"> <span>Pro-Seal L.E.D. Versiegeler</span> </span> </span></li>
<li class="level3 off-nav-321-8122 notparent last"></li>
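For the "without Simple HTML DOM" part of the question, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath instead. It assumes the menu uses nested dl/dt/dd markup in which each dt.levelN is followed by a dd holding the next level's dl, as the find()/next_sibling() chain in the question implies; the sample HTML and the function names hasClass and countLevel3 are made up for illustration. Because the real class attributes hold several classes (e.g. "level3 off-nav-321-8120 notparent first"), the query matches whole class tokens rather than comparing @class exactly.

```php
<?php
// XPath test for "has this class token", since @class often holds several classes.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')";
}

// Counts dt.level3 items under the first dt.level2 of the first dt.level1.
function countLevel3(string $html): int
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // ignore validation warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // Each dt is assumed to be followed by a dd that holds the next level's <dl>.
    $query = '((//dt[' . hasClass('level1') . '])[1]/following-sibling::dd[1]'
        . '//dt[' . hasClass('level2') . '])[1]'
        . '/following-sibling::dd[1]//dt[' . hasClass('level3') . ']';
    return $xpath->query($query)->length;
}

$html = <<<'HTML'
<dl>
  <dt class="level1 first"><a href="#a">Level 1 A</a></dt>
  <dd class="level1">
    <dl>
      <dt class="level2 first"><a href="#a1">Level 2 A1</a></dt>
      <dd class="level2">
        <dl>
          <dt class="level3 notparent first"><a href="#x">Pro-Seal Versiegeler</a></dt>
          <dt class="level3 notparent last"><a href="#y">Pro-Seal L.E.D. Versiegeler</a></dt>
        </dl>
      </dd>
      <dt class="level2 last"><a href="#a2">Level 2 A2</a></dt>
      <dd class="level2">
        <dl>
          <dt class="level3 notparent"><a href="#z">Other item</a></dt>
        </dl>
      </dd>
    </dl>
  </dd>
</dl>
HTML;

echo countLevel3($html), "\n";
```

The parentheses matter: (…)[1] picks the first match in document order, whereas //dt[…][1] would pick the first matching dt under every parent. The same dt → following dd pattern could be applied recursively to build the full link hierarchy.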

Trying to create Lightbox gallery for only one foreach element

Okay, so here's what I'm trying to achieve!
I have a PHP script that pulls information from an API (JSON) into a foreach() loop.
This is some of my code:
$ch = curl_init();
$category=$_GET['category'];
$url="http://www.someurl.com/api/listings?token=f716909f5d644fe3702be5c7895aa34e&group_id=10046&species=".$category;
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: application/json',
'X-some-API-Key: f716909f5d644fe3702be5c7895aa34e',
));
$json = json_decode(curl_exec($ch), true);
// Functions relating to the Echo Code
foreach($json['listings'] as $listing)
{
    $short_personality=substr($listing['personality'],0,500);
    $peturl="http://www.someurl.com/?variable=".$listing['id'];
    $medium_photo=$listing['photos'][0]['large_340'];
    $gender_class=strtolower($listing['gender']);
    $breed_class=strtolower($listing['species']);
    $name=($listing['name']);
Then, I'm creating the image to display with the following code:
    // Echo the output
    echo '<div class="animal">
        <div class="animal-image">';
    // count() the photos: comparing the array itself with 1 is always true
    if (count($listing['photos']) > 1) {
        echo '<a class="size-thumbnail thickbox" rel="gallery" href="'.$medium_photo.'">
            <img src="'.$medium_photo.'" class="image-with-border" alt="">
            <div class="border" style="width: 340px; height: 340px;">
                <div class="open"></div>
            </div>
        </a>';
    }
    else {
        echo '<a class="size-thumbnail thickbox" href="'.$medium_photo.'">
            <img src="'.$medium_photo.'" class="image-with-border" alt="">
            <div class="border" style="width: 340px; height: 340px;">
                <div class="open"></div>
            </div>
        </a>';
    }
So, for example, the API pulls in 25 items that are each formatted into their own section, and each section has its own initial photo. When I click on that photo, the lightbox loads.
BUT, it loads the lightbox for ALL images in the API call, so I'm getting 144 images available in the lightbox.
Is there a way to limit the photos in the lightbox for each item that's being retrieved?
Any help would be appreciated.

This answer assumes you are using the ThickBox plugin (if that's the case, please update your question details :D ).
If you want every image to be "alone" and not in a gallery, then every link must have a different rel="name-of-individual-image-gallery-whatever".
So you can do something like this:
// Functions relating to the Echo Code
// add a counter
$inc=0;
foreach($json['listings'] as $listing)
{
    $short_personality=substr($listing['personality'],0,500);
    $peturl="http://www.someurl.com/?variable=".$listing['id'];
    $medium_photo=$listing['photos'][0]['large_340'];
    $gender_class=strtolower($listing['gender']);
    $breed_class=strtolower($listing['species']);
    $name=($listing['name']);
    // add this
    $unique_gallery_name="individual-gallery-".$inc;
    $inc++;
    // the rest of your code...
    // .
    // .
    // .
Then add that rel attribute to your output code:
<a class="size-thumbnail thickbox" rel="'.$unique_gallery_name.'"
So each link gets its own rel: the first link rel="individual-gallery-0", the next rel="individual-gallery-1", then rel="individual-gallery-2", and so on. As a result, every image is in its own "unique" gallery, alone.
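If the goal is for each listing's lightbox to page through only that listing's photos (rather than each image being completely alone), the same rel mechanism can be applied per listing: every photo of one listing shares a rel value, and every listing gets a different one. A sketch, assuming ThickBox builds a gallery from all links with identical rel values (hidden links included) and that each photo has a 'large_340' URL as in the question's API data; renderListingPhotos and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: renders every photo of ONE listing with a shared rel value.
function renderListingPhotos(array $listing, int $index): string
{
    $rel = 'individual-gallery-' . $index; // same value for all photos of this listing
    $out = '';
    foreach (array_values($listing['photos']) as $i => $photo) {
        $src = htmlspecialchars($photo['large_340'], ENT_QUOTES);
        if ($i === 0) {
            // The first photo is the visible thumbnail, as in the question.
            $out .= '<a class="size-thumbnail thickbox" rel="' . $rel . '" href="' . $src . '">'
                . '<img src="' . $src . '" class="image-with-border" alt=""></a>';
        } else {
            // Further photos are hidden links; assumed to still be part of the gallery.
            $out .= '<a class="thickbox" rel="' . $rel . '" href="' . $src . '" style="display:none"></a>';
        }
    }
    return $out;
}

$listings = [
    ['photos' => [['large_340' => 'a1.jpg'], ['large_340' => 'a2.jpg']]],
    ['photos' => [['large_340' => 'b1.jpg']]],
];
foreach ($listings as $inc => $listing) {
    echo renderListingPhotos($listing, $inc), "\n";
}
```

Using the foreach key as the counter avoids maintaining a separate $inc variable by hand.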

php proDOM parsing error

I am using the following code to parse a DOM document, but at the end I get this error:
"google.ac" is null or not an object
line 402
char 1
My guess is that line 402 contains a tag and a lot of ";" characters.
How can I fix this?
<?php
//$ch = curl_init("http://images.google.com/images?q=books&tbm=isch/");
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://images.google.com/images?q=books&tbm=isch/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
$data = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
$dom->loadHTML($data);
//@$dom->saveHTMLFile('newfolder/abc.html')
$dom->loadHTML('$data');
// find all ul
$list = $dom->getElementsByTagName('ul');
// get few list items
$rows = $list->item(30)->getElementsByTagName('li');
// get anchors from the table
$links = $list->item(30)->getElementsByTagName('a');
foreach ($links as $link) {
    echo "<fieldset>";
    $links = $link->getElementsByAttribute('imgurl');
    $dom->saveXML($links);
}
?>

There are a few issues with the code:
You should add the cURL option CURLOPT_RETURNTRANSFER to capture the output; by default, the output is sent straight to the browser. Like this: curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);. Without it, $data in the code above will only ever be TRUE or FALSE (http://www.php.net/manual/en/function.curl-exec.php)
$dom->loadHTML('$data'); is incorrect (the single quotes load the literal text $data, not the variable) and not needed, since the HTML has already been loaded
The way the 'li' and 'a' tags are read might not be correct, because $list->item(30) always points to the same fixed <ul> (the 31st one, since item() is zero-based), whatever the page structure is
Anyway, on to the fixes. I'm not sure whether you checked the HTML returned by the cURL request, but it seems different from what we discussed in the original post. In other words, the HTML returned by cURL does not contain the required <ul> and <li> elements; it contains <td> and <a> elements instead.
Add-on: I'm not entirely sure why the HTML for the same page differs between the browser and PHP, but here is an explanation that I think fits. The page uses JavaScript that renders some HTML dynamically on page load. This dynamic HTML is visible in the browser but not to PHP, which only receives the raw server response. Hence, I assume the <ul> and <li> tags are generated dynamically. Anyway, that isn't our concern for now.
Therefore, you should modify your code to parse the <a> elements and then read the image URLs. This code snippet might help:
<?php
$ch = curl_init(); // create a new cURL resource
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://images.google.com/images?q=books&tbm=isch/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($ch); // fetch the URL and return the HTML as a string
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($data); // @ suppresses warnings about malformed HTML
$listA = $dom->getElementsByTagName('a'); // read all <a> elements
foreach ($listA as $itemA) { // loop through each <a> element
    if ($itemA->hasAttribute('href')) { // check if it has an 'href' attribute
        $href = $itemA->getAttribute('href'); // read the value of 'href'
        if (preg_match('/^\/imgres\?/', $href)) { // check that 'href' begins with "/imgres?"
            $qryString = substr($href, strpos($href, '?') + 1);
            parse_str($qryString, $arrHref); // read the query parameters from the 'href' URI
            if (isset($arrHref['imgurl'])) {
                echo '<br>' . $arrHref['imgurl'] . '<br>';
            }
        }
    }
}
I hope the above makes sense. But please note that this parsing might fail if Google modifies its HTML.
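A self-contained sketch of the same /imgres? parsing, run on an inline HTML string so it works offline. The sample links are made up and only mimic the format described above; Google's real markup may differ or change. The function name imageUrlsFromHtml is hypothetical, and it uses libxml_use_internal_errors() instead of @ to keep parser warnings quiet.

```php
<?php
// Extracts the imgurl query parameter from every <a href="/imgres?..."> link.
function imageUrlsFromHtml(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $prefix = '/imgres?';
    $urls = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // '' if the attribute is missing
        if (strpos($href, $prefix) !== 0) {
            continue; // skip links that do not start with /imgres?
        }
        parse_str(substr($href, strlen($prefix)), $params); // decode the query string
        if (isset($params['imgurl'])) {
            $urls[] = $params['imgurl'];
        }
    }
    return $urls;
}

$html = '<table><tr><td>'
    . '<a href="/imgres?imgurl=http://example.com/book1.jpg&amp;imgrefurl=http://example.com/">1</a>'
    . '<a href="/search?q=books">not an image</a>'
    . '<a href="/imgres?imgurl=http://example.com/book2.jpg">2</a>'
    . '</td></tr></table>';
print_r(imageUrlsFromHtml($html));
```

The parser decodes &amp; in the attribute to a plain &, so parse_str() sees the separate imgurl and imgrefurl parameters.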

We Keep Coding
