DOM to parse Facebook wall - php

I am trying to parse messages from a public Facebook fan page wall, but it returns a blank page.
$source = "http://www.facebook.com/?sk=wall&filter=2";
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(FALSE);
$message = $xml->xpath("//span[@class='messageBody']");
return (string)$message[0] . PHP_EOL;

The DOMDocument::loadHTML() method you are using expects the HTML content itself as a parameter, not a URL.
Here, you are trying to interpret your URL as HTML content, and not the page it links to.
Instead of this method, you might want to use one that works on a file or on remote content, such as DOMDocument::loadHTMLFile().
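For example, a minimal sketch of the difference (assuming allow_url_fopen is enabled so that loadHTMLFile() can fetch a remote URL):
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();

// loadHTML() expects markup itself...
$dom->loadHTML('<p>Hello</p>');

// ...while loadHTMLFile() accepts a path or URL and fetches it for you.
$dom->loadHTMLFile("http://www.facebook.com/?sk=wall&filter=2");

libxml_use_internal_errors(FALSE);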

This is not the right way to fetch data from Facebook, and it's clear that you want to avoid creating a Facebook Application.
The good news is that you can still use FQL. Try the query below in the Graph API Explorer; it queries the stream table for the Facebook Developers page's public feed:
SELECT message
FROM stream
WHERE source_id=19292868552
AND is_hidden = 0
AND filter_key='owner'
It'll return all the "public" feeds of the page. You may well need to retrieve more fields to build a meaningful result.
Note that you need to provide a valid access_token even to access public posts.
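If you prefer to run that query from PHP rather than the Explorer, a rough sketch looks like this (it assumes a valid token in $accessToken; the fql endpoint of the Graph API was available at the time but has since been retired):
$fql = "SELECT message FROM stream WHERE source_id=19292868552 "
     . "AND is_hidden = 0 AND filter_key='owner'";
$url = 'https://graph.facebook.com/fql?q=' . urlencode($fql)
     . '&access_token=' . urlencode($accessToken);

$result = json_decode(file_get_contents($url));
foreach ($result->data as $row) {
    echo $row->message, PHP_EOL;
}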

Yet another approach would be to use the JSON from the Graph API
$posts = json_decode(
    file_get_contents('https://graph.facebook.com/swagbucks/posts')
);

foreach ($posts->data as $post) {
    echo $post->message, PHP_EOL;
}


Sending url parameters through file_get_contents returns nothing

I am trying to scrape a website in order to get the latitude and longitude for counties in the US (there are 3,306 of them, which is why I am trying to do it through code rather than manually).
I am using the code below:
function GetLatitude($countyName, $stateShortName) {
    // Create DOM from URL
    $page = file_get_contents("https://www.mapdevelopers.com/geocode_tool.php?$countyName,$stateShortName");

    $doc = new DOMDocument();
    $doc->loadHTML($page);

    $node = $doc->getElementById("display_lat");
    var_dump($doc);
}

GetLatitude("Guilford County", "NC");
This returns nothing, but if I change the URL to one without the parameters, like "https://www.mapdevelopers.com/geocode_tool.php", then I can see that $doc now has some information in it. That is not useful, though, because the value I need (the latitude) depends on the parameters passed in the URL.
How do I solve this issue?
EDIT:
Based on the suggestion to encode the parameters, I changed my code to the following. Now the document contains information, but it appears as though the parameters are being ignored:
<?php
function GetLatitude($countyName, $stateShortName) {
    $countyName = urlencode($countyName);
    $stateShortName = urlencode($stateShortName);

    // Create DOM from URL
    $page = file_get_contents("https://www.mapdevelopers.com/geocode_tool.php?address=$countyName,$stateShortName");

    $doc = new DOMDocument();
    $doc->loadHTML($page);

    $node = $doc->getElementById("display_lat");
    var_dump($doc);
}

GetLatitude("Clarke County", "AL");
?>
Your issue is that the latitude information isn't present on page load; JavaScript puts it there afterwards.
You're going to have a hard time running a JavaScript-driven page and scraping it from PHP without something in the middle. Consider retrying this project with something like Puppeteer or PhantomJS so you can run your script against a real browser.
That said, searching the page shows an AJAX request to https://www.mapdevelopers.com/data.php.
Sending a POST or GET request to that endpoint will give you the response you are looking for.
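A hedged sketch of calling that AJAX endpoint directly with file_get_contents() and a POST context (the parameter names passed to data.php are assumptions here; inspect the request in your browser's network tab and adjust them accordingly):
// Hypothetical parameters; copy the real ones from the network tab.
$postData = http_build_query([
    'operation' => 'geocode',
    'address'   => 'Clarke County, AL',
]);

$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => $postData,
    ],
]);

$response = file_get_contents('https://www.mapdevelopers.com/data.php', false, $context);
var_dump(json_decode($response, true)); // the endpoint most likely returns JSON with the coordinates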

Get attributes from post data in PHP

I'm using Laravel to create a simple social network. Users can type # in the post area to get a list of their friends so they can mention them. Every mention is a link like this (using zurb/tribute from GitHub):
<a type="mention" href="/user/Jordan" title="Jordan">Jordan</a>
Normal links other than mentions won't have type='mention'.
Now, when I get the post and insert it into the database, I need to get a list of the users mentioned in it. I'm looking for links which have type='mention', and if there are any, I want to get the title of each one to insert into the notification system. What PHP code do I need to add in this if statement?
if(stristr(request('post'),' type="mention" ')){
}
Aside from using an AST (abstract syntax tree), your best bet is to use DOM on the PHP side, e.g.:
$string = '<a type="mention" href="/user/Jordan" title="Jordan">Jordan</a>';

$doc = new \DOMDocument();
$doc->loadHTML($string);

$xpath = new \DOMXPath($doc);
foreach ($xpath->query("//a[@type='mention']") as $a) {
    $href = $a->getAttribute('href');
    $title = $a->getAttribute('title');
    echo sprintf("Found mention of %s with href of %s\n", $title, $href);
}
However, I probably wouldn't be sending the <a> node back to the server at all. Consider making the mention markup a display-only feature on the browser side, and simply send the "#jordan" string back to the server.
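If you do keep the markup in the post body, a minimal sketch of filling your if statement and collecting the mentioned usernames for the notification system could look like this (it reuses Laravel's request('post') helper from your question):
$mentions = [];

if (stristr(request('post'), ' type="mention" ')) {
    $doc = new \DOMDocument();
    libxml_use_internal_errors(true);        // tolerate fragment-style HTML
    $doc->loadHTML(request('post'));
    libxml_use_internal_errors(false);

    $xpath = new \DOMXPath($doc);
    foreach ($xpath->query("//a[@type='mention']") as $a) {
        $mentions[] = $a->getAttribute('title');
    }
}

// $mentions now holds the usernames to notify, e.g. ['Jordan']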

Return external wordpress page as XML Object

I have a CakePHP site for a client, but the site's blog runs on WordPress (I just redirect to the WP site for the blog). The client now wants a section of the homepage to pull in a snippet from the blog, and I am wondering what the best way to do this is. I am currently trying this...
function getPosts($feed_url) {
    $content = file_get_contents($feed_url); // get XML string
    $feed_object = new xml($content);        // load XML string into object
    $x = new SimpleXmlElement($content);     // load XML string into object
}

getPosts("example.com");
The file_get_contents() call is working great and actually pulling in the HTML, but I cannot get that HTML into XML. My error message is 'String could not be parsed as XML'. Anyone know the best way to go about this?
You may want to use simplexml_load_string directly.
function getPosts($feed_url) {
    $content = file_get_contents($feed_url); // get XML string
    $xml = simplexml_load_string($content);
    return $xml;
}
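Note that $feed_url has to point at the blog's actual RSS/Atom feed rather than a regular HTML page, otherwise SimpleXML will fail with exactly the 'String could not be parsed as XML' error you are seeing. A usage sketch, assuming a standard WordPress install that exposes its RSS 2.0 feed at /feed/ (the URL below is a placeholder):
$xml = getPosts('https://blog.example.com/feed/'); // the feed, not the HTML page

// WordPress RSS 2.0 feeds keep posts under channel->item
foreach ($xml->channel->item as $item) {
    echo $item->title, PHP_EOL;
    echo $item->link, PHP_EOL;
}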

how to get page contents

I'm trying to build recent-news functionality for my site. For this I've made a web crawler, and so far I have been able to collect links from a page by doing the following:
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;           // must be set before loading
@$dom->loadHTML(file_get_contents($url));   // @ silences warnings from malformed HTML

$linksToStore = $dom->getElementsByTagName('a');
foreach ($linksToStore as $tag) {
    $links[$tag->getAttribute('href')] = $tag->childNodes->item(0)->nodeValue;
}
How can I get the contents of the pages those links point to, restricted to a particular domain, which in my case is 'Medical'?
Use the http://simplehtmldom.sourceforge.net/ library to extract contents from the pages. Its selectors work much like jQuery's, which makes them familiar and efficient for extracting content.
Also, check http://davidwalsh.name/php-notifications to learn more.
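If you would rather stay with the built-in DOM extension you already use, a rough sketch of visiting each collected link and keeping only pages that mention your topic might look like this (the $links array is the one built in your question, and the keyword check is a deliberately crude filter):
foreach ($links as $href => $text) {
    $html = @file_get_contents($href);       // skip pages that fail to load
    if ($html === false) {
        continue;
    }

    $pageDom = new DOMDocument();
    @$pageDom->loadHTML($html);              // suppress warnings from malformed HTML
    $body = $pageDom->getElementsByTagName('body')->item(0);

    // crude topic filter: keep only pages whose text mentions the keyword
    if ($body !== null && stripos($body->textContent, 'medical') !== false) {
        echo $text . ': ' . substr(trim($body->textContent), 0, 200) . PHP_EOL;
    }
}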

How to get post URL out of the Blogger API in PHP

In short, I am pulling the feed from my Blogger blog using the Zend API in PHP. I need to get the URL that links to that post on Blogger. What is the order of functions I need to call to get that URL?
Right now I am pulling the data using:
$query = new Zend_Gdata_Query('http://www.blogger.com/feeds/MYID/posts/default');
$query->setParam('max-results', "1");
$feed = $gdClient->getFeed($query);
$newestPost = $feed->entry[0];
I cannot for the life of me figure out where I have to go from here to get the URL. I can successfully get the post title using $newestPost->getTitle() and the body using $newestPost->getContent()->getText(). I have tried a lot of function calls, even ones in the documentation, and most of them error out. I have printed out the entire object to look through it, and I can find the data I want (so I know it is there), but the object is too complex to just look at and see how to reach that data.
If anyone can help me, or at least point me to a good explanation of how that object is organized and how to get to each sub-object within it, that would be greatly appreciated.
EDIT: Never mind, I figured it out.
You are almost there; really, all you need to do once you have your feed entry is access the link element inside it. I like pretty URLs, so I went with the alternate link rather than the self link in the Atom feed.
$link = $entry->link[4]->href;
where $entry is the entry that you are setting from the feed.
The solution is:
$query = new Zend_Gdata_Query('http://www.blogger.com/feeds/MyID/posts/default');
$query->setParam('max-results', "1");
$feed = $gdClient->getFeed($query);
$newestPost = $feed->entry[0];
$body = $newestPost->getContent()->getText();
$body now contains the post contents of the latest post (or entry[0]) from the feed. This is just the contents of the body of the post, not the title or any other data or formatting.
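To also get the URL the question originally asked for, you can combine this with the link element mentioned in the answer above. A hedged sketch (the position of the 'alternate' link varies between feeds, so checking the rel attribute is safer than hard-coding link[4]):
$url = null;
foreach ($newestPost->link as $link) {
    if ($link->rel === 'alternate') { // the public, human-readable post URL
        $url = $link->href;
        break;
    }
}
echo $url, PHP_EOL;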
