For one project, I need to get the HTML source of a Facebook page via a PHP application.
I have tried lots of methods (cURL, file_get_contents, changing ini_set values, etc.), but Facebook never lets me get the HTML result.
Can anyone help?
For example, for this page:
ini_set('user_agent', $_SERVER['HTTP_USER_AGENT']);
$data = file_get_contents("http://apps.facebook.com/is_cool/?cafe_action=album&view=scroll", false);
print strip_tags($data);
Thanks a lot.
Damien
Comment 1:
- I need to create two applications. I want to parse the HTML code to get some information from one into the other. I don't want to duplicate or take Facebook's code. I just want to do a "view source" (like IE or Firefox) and put it in a file, without asking my users. When my user is logged in to my first application, I just want to use his credentials to get the other content.
The reason you're having problems is that the majority of the Facebook homepage content is loaded via AJAX. The data is not hardcoded into what your browser renders.
You should think of a different way to accomplish your goals. If you tell us a little more about what you're trying to do, we can probably help you find an alternate method.
Related
I'm trying to pull a piece of data from the website www.coinmarketcap.com,
specifically the market cap number at the top.
I've been trying to figure this out for the past hour or so and have read MANY different ways people use these web scrapers, but I have not been successful at all. Could someone shed some light?
There are multiple ways, but the easiest is to just take their URL:
https://files.coinmarketcap.com/generated/stats/global.json
Please note: they might not like this. Maybe they don't want external parties to use their scripts. So also build a check whether the file still exists and doesn't give a 403 back.
How did I find this:
When the page loads, the header with the information loads after document ready, so it cannot have been rendered by the server and has to come from AJAX.
Now that we know it is AJAX, we want to know which file. You do this by opening your browser's developer tools. All browsers have a Network tab showing all resources being loaded. When you filter by XHR you see all AJAX requests. Then you try to find the right one.
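For illustration, here is a minimal PHP sketch of that check, assuming the global.json URL above still exists and still returns the same structure (the field name at the end is a guess; inspect the actual JSON yourself):

<?php
// Fetch the stats file and bail out on a 403 or any other non-200 response.
$url = 'https://files.coinmarketcap.com/generated/stats/global.json';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$body = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($body === false || $status !== 200) {
    die("Stats file not available (HTTP $status) - they may have moved or blocked it.");
}

$stats = json_decode($body, true);
// 'total_market_cap_by_available_supply_usd' is an assumed key; check the real JSON for the actual name.
echo isset($stats['total_market_cap_by_available_supply_usd'])
    ? $stats['total_market_cap_by_available_supply_usd']
    : 'Market cap field not found';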
So what I am trying to do is this:
On my server, users can enter their YouTube channel name. My PHP file will then parse the channel and output HTML code with the results. What I am looking to do is let users put a snippet on their website that will call my website, let's say youtubevideos.com/videos.php?channel=channelname; my code will take that name and output the videos back to their site, much like Google ads, I guess.
Any idea how that is done, other than an iframe? I figured that would be my last resort.
I think what I'm looking for is for them to put a JavaScript snippet on their site that will render as the HTML code I'm pushing from my PHP file.
Thank you!
The receiving code on the server you target needs to set a header like this:
"Access-Control-Allow-Origin: *"
So, if you provide a service which needs to exchange data with your server and your code, it is possible. If you can't edit the targeted code and the header is not set, it will be impossible.
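For illustration, a minimal sketch of what sending that header looks like on the serving side, assuming you want any site to be able to request your content from JavaScript:

<?php
// At the very top of the PHP file that serves the cross-site content,
// before any other output, send the CORS header:
header('Access-Control-Allow-Origin: *');

// ...then build and echo the HTML (or JSON) you want the other site to receive.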
There would be two parts of this solution.
In the videos.php file on your server, you would implement the logic to scrape the data from the original site and format it in the way you want to show on the final website.
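A bare-bones sketch of what that videos.php might look like; the actual scraping is left as a placeholder, since how you fetch the channel data (cURL plus DOM parsing, an RSS feed, etc.) is up to you, and the names here are made up for illustration:

<?php
// videos.php (hypothetical skeleton): takes ?channel=channelname and returns an HTML fragment.
$channel = isset($_GET['channel']) ? trim($_GET['channel']) : '';

if ($channel === '') {
    header('HTTP/1.1 400 Bad Request');
    exit('Missing channel parameter');
}

// TODO: fetch and parse the channel's videos here.
// $videos is a placeholder standing in for whatever your scraper returns.
$videos = array(
    array('title' => 'Example video', 'url' => 'http://www.youtube.com/watch?v=XXXXXXXXXXX'),
);

// Format the result as the HTML the other site will display.
foreach ($videos as $video) {
    printf(
        '<div class="yt-video"><a href="%s">%s</a></div>',
        htmlspecialchars($video['url']),
        htmlspecialchars($video['title'])
    );
}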
For the end user, you would give a code snippet similar to this that they would have to paste into their PHP pages to display the content from your site.
$your_website_url = "http://youtubevideos.com/videos.php?channel=channelname";
// Don't forget the http:// at the start.
echo file_get_contents($your_website_url);
If file_get_contents() gives a security error, you can use curl.
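A rough cURL equivalent of the snippet above, for hosts where allow_url_fopen is disabled:

<?php
// Same idea as the file_get_contents() snippet, but via cURL.
$your_website_url = "http://youtubevideos.com/videos.php?channel=channelname";

$ch = curl_init($your_website_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects, just in case
$html = curl_exec($ch);
curl_close($ch);

echo ($html !== false) ? $html : 'Could not load the videos right now.';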
I hope that helps.
I'm building a website and am looking for a way to implement a certain feature that Facebook has. The feature I am looking for is the link inspector; I am not sure that is what it is actually called. It's best I give you an example so you know exactly what I am looking for.
When you post a link on Facebook, for example a link to a YouTube video (or any other website for that matter), Facebook automatically inspects the page it leads to and imports information like the page title, favicon, and some other images, and then adds them to your post as a way of giving (what I think is) a brief preview of the page to anyone reading that post.
I already have a feature that allows users to share a link (or URLs). What I want is to do something useful with the URL, to display something other than just a plain link to a webpage, to give someone viewing a shared link (in the form of a post) some useful insight into the page that the URL leads to.
What I'm looking for is a script, or a tutorial, or at the very least someone to point me in the right direction, to help me accomplish this (using PHP preferably).
I've tried googling it, but I don't know exactly what such a feature would be called, and Google isn't helpful when you don't know exactly what you're looking for.
I figure someone out there, in this vast knowledge basket called stackoverflow, can help me with this. Can anyone help me?
You would first scan the page for URLs using a regex, then you would parse the pages those links reference with a PHP DOMDocument. You could use the parsed document to obtain any information you need from the webpage.
DOMDocument:
http://php.net/manual/en/class.domdocument.php
DOMDocument->load (loads a file, aka a webpage):
http://php.net/manual/en/domdocument.load.php
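As a rough sketch of that second step, assuming you already have the URL you want to preview: load the fetched HTML into a DOMDocument and pull out things like the title and meta description (libxml error suppression is needed because real-world HTML is rarely well formed):

<?php
// Hypothetical preview builder: fetch a page and extract its title and meta description.
$url = 'http://www.example.com/';  // placeholder URL taken from the user's post
$html = file_get_contents($url);

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real pages are rarely valid XML, so silence the warnings
$doc->loadHTML($html);
libxml_clear_errors();

$titleNodes = $doc->getElementsByTagName('title');
$title = $titleNodes->length ? trim($titleNodes->item(0)->textContent) : $url;

$description = '';
foreach ($doc->getElementsByTagName('meta') as $meta) {
    if (strtolower($meta->getAttribute('name')) === 'description') {
        $description = $meta->getAttribute('content');
        break;
    }
}

echo htmlspecialchars($title) . ' - ' . htmlspecialchars($description);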
The link goes through http://www.facebook.com/l.php
You pass a URL to this and Facebook filters it.
I've tried a bunch of techniques to crawl this URL (see below), and for some reason the title comes back incorrect. If I look at the source of the page with Firebug I can see the correct title tag; however, if I view the page source, it's different.
Using several PHP techniques I get the same result. Digg is able to crawl the page and parse the correct title.
Here's the link: http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
The correct title is "How to Make Your iPhone (or Other iOS Device) More Like Android"
The parsed title is "Lifehacker, tips and downloads for getting things done"
Is this normal? How are they doing this? Is there a way to get the correct title?
That's because when you request it using PHP (without any JS support) you're getting the main page of Lifehacker, which is lifehacker.com.
Lifehacker switched their CMS recently so that all requests go to an initial page, and everything after the hashbang is then read by a JS script in that page to figure out which page needs to be served. You need to modify your program to take this into account.
EDIT
Have a gander at these links
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch
Found the answer:
http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
becomes:
http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android
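A small PHP sketch of that rewrite, following the pattern above (this simple version does not percent-escape the fragment, which matches the URL shown but may not cover every corner case of the AJAX-crawling spec):

<?php
// Turn a hashbang (#!) URL into its crawlable _escaped_fragment_ equivalent.
function hashbangToCrawlable($url) {
    $parts = explode('#!', $url, 2);
    if (count($parts) < 2) {
        return $url;  // no hashbang, nothing to rewrite
    }
    $separator = (strpos($parts[0], '?') === false) ? '?' : '&';
    return $parts[0] . $separator . '_escaped_fragment_=' . $parts[1];
}

echo hashbangToCrawlable('http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android');
// http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android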
Hi, I'm using AJAX to pull all the pages into the main page, but I am not able to control the refresh: if somebody refreshes, the page returns back to the main page. Can anybody give me any solutions? I would really appreciate the help.
You could add an anchor (#something) to your URL and change it to something you can decode to a particular page state on every AJAX event.
Then, in body.onload, check the anchor and decode it back to that state.
The back button (at least in Firefox) will work all right too. If you want the back button to work in IE6, you have to add some iframe magic.
Check the various JavaScript libraries designed to support the back button or history in an AJAX environment; this is probably what you really need. For example, the jQuery history plugin.
You can rewrite the current URL so it gives pointers to where the user was; see Facebook for examples of this.
I always store the 'current' state in the PHP session.
So the user can refresh at any time and the page will still be the same.
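A minimal sketch of that idea, assuming each AJAX call tells the server which state it just rendered (the 'state' parameter name is made up for illustration):

<?php
session_start();

// Each AJAX request reports the state it just rendered; remember it server-side.
if (isset($_GET['state'])) {
    $_SESSION['current_state'] = $_GET['state'];
}

// On a full refresh, fall back to the last remembered state instead of the main page.
$state = isset($_SESSION['current_state']) ? $_SESSION['current_state'] : 'main';

// ...render $state here instead of always showing the main page...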
if somebody refreshes, the page returns back to the main page. Can anybody give me any solutions?
This is a feature, not a bug, in the browser. You need to change the URL for different pages. Nothing is worse than websites that use some kind of magic, either on the client side or the server side, which causes a bunch of completely different pages to use the same URL. Why? How the heck am I going to link to a specific page? What if I like something and want to copy & paste the URL into an IM window?
In other words, consider the use cases. What constitutes a "page"? For example, if you have a website for stock quotes, should each stock have a unique URL? Yes. Should you have a unique URL for every variation you can make to the graph (i.e. logarithmic vs. linear, etc.)? It depends; if you don't, at least provide a "share this" link like Google Maps does, so there is some kind of URL that you can share.
That all said, I agree with the suggestion to mess with the #anchor and parse it out. Probably the most elegant solution.