Load entire HTML file to parse with PHP? - php

I'm trying to parse a website's HTML with PHP's fopen(). That works well so far, but the problem is that several posts on the site aren't present in the HTML, because you have to scroll far down before those posts load.
As an example, I'm trying to count the total number of comments on my own Facebook page. (Just an example; if that count is shown somewhere on Facebook, that doesn't help me.)
How can I make the HTML file load completely?
Thank you

You cannot, directly. What you are doing is called scraping. You have to inspect the requests made by the browser in your developer tools when viewing that page yourself, and reproduce those requests in PHP through fopen() or any other means (cURL, etc.).
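For example, if the page fetches more posts from an AJAX endpoint as you scroll, you can call that endpoint directly. A minimal sketch, assuming a hypothetical JSON endpoint; the URL and parameters below are placeholders, and the real ones have to be found in your browser's Network tab:

<?php
// Hypothetical endpoint; replace with the request you see in the Network tab.
$endpoint = 'http://example.com/ajax/posts?offset=20&limit=20';

$ch = curl_init($endpoint);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the body instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0'); // some sites reject requests with no user agent
$response = curl_exec($ch);
curl_close($ch);

// Infinite-scroll endpoints usually return JSON rather than HTML.
$posts = json_decode($response, true);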

Related

How to send large strings to another page, while also maintaining fast refreshing?

I am trying to set up a website with a live preview of what you are typing. I need two pages: one holds an HTML editor, and the second sits in an iframe on the same page to display the output.
I need instant updates, so that whatever you type into the editor is reflected on the other page. I am using an iframe because I can't have the output on the editor page itself: that page already has CSS which would override whatever CSS you put into the editor.
I have tried using cookies, but they have a character limit, which rules them out for what I am doing. I am looking for a way to send large strings from one page to another, with no character limit and with speedy responses.
Any help would be amazing. Thank you.

Load external site with PHP without breaking relative links in that site

I have a PHP page where I'm trying to load and then echo an external page (which sits on the same server but under a completely different path/domain, if that matters).
I've tried using both file_get_contents() and curl. They both correctly load the HTML of the target page; the problem is that it doesn't display correctly, because the target page has relative links to several files (images, CSS, JavaScript).
Is there any way I can accomplish this with PHP? If not, what would be the next best way? The target site must look like it's being loaded from the initial page (URL-wise); I don't want to do a redirect.
So the browser would show http://example.com/initial-page.php even though its contents come from http://example2.com/target-page.php.
EDIT:
This is something that could easily be done with an iframe, but I want to avoid that too, for several reasons; one of them is that an iframe breaks the responsiveness of the target site. I can't change the code of the target site to fix that, either.
In the end, the solution was a combination of what I was trying to do (using curl) and what WebRookie suggested: using the base HTML tag in the page being loaded via curl.
In my particular case, I pass the base URL as a parameter in curl and echo it in the loaded page, which lets me load that same page from different websites (another reason why I wanted to do this).
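A minimal sketch of that combination, with placeholder URLs (the exact way the base URL is passed and echoed depends on your setup):

<?php
// Placeholder URLs; the target page lives on the same server under another domain.
$targetUrl = 'http://example2.com/target-page.php';
$baseUrl   = 'http://example2.com/';

$ch = curl_init($targetUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

// Inject a <base> tag right after <head> so the page's relative links
// (images, CSS, JavaScript) resolve against the target site.
echo str_replace('<head>', '<head><base href="' . $baseUrl . '">', $html);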

Pass PHP results to another website

So what I am trying to do is this:
On my server, users can enter their YouTube channel name. My PHP file will then parse the channel and output HTML code with the results. What I want is for users to be able to put a snippet on their website that calls my website, say youtubevideos.com/videos.php?channel=channelname; my code will take that name and output the videos back to their site, much like Google ads, I guess.
Any idea how that is done, other than with an iframe? I figure that will be my last resort.
I think what I'm looking for is for them to put a JavaScript snippet on their site that renders as the HTML code I'm pushing from my PHP file.
Thank you!
The receiving code on the server you target needs to set a header like this:
"Access-Control-Allow-Origin: *"
So if you provide a service which needs to exchange data with your server and your code, it is possible. If you can't edit the targeted code and the header is not set, it'll be impossible.
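In PHP, setting that header is a one-liner at the top of the script that serves the content, before any output is sent:

<?php
// Allow any origin to fetch this script's output cross-domain.
header('Access-Control-Allow-Origin: *');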
There would be two parts of this solution.
In the videos.php file on your server, you would implement the logic to scrape the data from the original site and format it the way you want it to appear on the final website.
For the end user, you would provide code similar to this, which they would have to paste into their PHP pages to display the content from your site.
$your_website_url = "http://youtubevideos.com/videos.php?channel=channelname";
// Don't forget the http:// at the start.
echo file_get_contents($your_website_url);
If file_get_contents() gives a security error, you can use curl.
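A cURL equivalent of the snippet above, sketched for hosts where allow_url_fopen is disabled:

<?php
$your_website_url = "http://youtubevideos.com/videos.php?channel=channelname";

$ch = curl_init($your_website_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // capture the output instead of printing it directly
echo curl_exec($ch);
curl_close($ch);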
I hope that helps.

Crawl Website using PHP

I've tried a bunch of techniques to crawl this URL (see below), and for some reason the title comes back incorrect. If I look at the source of the page with Firebug, I can see the correct title tag; however, if I view the page source, it's different.
Using several PHP techniques I get the same result. Digg is able to crawl the page and parse the correct title.
Here's the link: http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
The correct title is "How to Make Your iPhone (or Other iOS Device) More Like Android"
The parsed title is "Lifehacker, tips and downloads for getting things done"
Is this normal? How are they doing this? Is there a way to get the correct title?
That's because when you request it using PHP (without any JS support), you're getting the main page of Lifehacker, which is lifehacker.com.
Lifehacker switched their CMS recently so that all requests go to an initial page, and everything after the hashbang is read by a JS script in that page to figure out which article needs to be served. You need to modify your program to take this into account.
EDIT
Have a gander at these links
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch
Found the answer:
http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
becomes:
http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android
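A small helper (my own sketch, not part of the answer) that performs that rewrite; note that Google's AJAX-crawling scheme also expects special characters in the fragment to be URL-encoded, which this naive version skips:

<?php
// Rewrite a hashbang URL into its crawlable _escaped_fragment_ form.
function escapedFragmentUrl($url) {
    return str_replace('#!', '?_escaped_fragment_=', $url);
}

echo escapedFragmentUrl('http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android');
// http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android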

Get a Facebook source page

For one project, I need to get a Facebook source page (the HTML one) via a PHP application.
I've tried lots of methods like curl, file_get_contents(), changing my ini_set(), etc., but Facebook never lets me get the HTML result.
Can anyone help?
For example, this page:
ini_set('user_agent', $_SERVER['HTTP_USER_AGENT']);
$data = file_get_contents("http://apps.facebook.com/is_cool/?cafe_action=album&view=scroll", false);
print strip_tags($data);
Thanks a lot.
Damien
Comment 1 :
- I need to create two applications. I want to parse the HTML code to get some information from one into the other. I don't want to duplicate or take the Facebook code; I just want to do a "view source" (like IE or Firefox) and put it in a file, without asking my users. When a user is logged in to my first application, I just want to use his credentials to get the other content.
The reason you're having problems is that the majority of the Facebook homepage content is loaded via AJAX. The data is not hardcoded into what your browser renders.
You should think of a different way to accomplish your goals. If you tell us a little more about what you're trying to do, we can probably help you find an alternate method.
