PHP mirror a webpage

I am trying to create a mirror of a weather widget which I use for a website. Presently, it is used on an HTTPS page, but the widget server does not support HTTPS (and IE throws a tantrum with dialogs because the widget is not served over HTTPS).
To solve this, what I would like to do is mirror the page over HTTPS to silence the security warnings. I would normally use file_get_contents() for this; however, the page contains images, which makes it a little more complicated.
Also, as a side note, there aren't any ads on my website or theirs, so there is no revenue stealing.

Use cURL to grab the page's content (images and all). You can put this in a file, then use that file's URL in place of the widget's URL:
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
See the docs: http://www.php.net/manual/en/function.curl-exec.php
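As a rough sketch of that idea, assuming a hypothetical widget URL (http://widgets.example.com/weather is a placeholder, not the real service), you could serve the widget's HTML from your own HTTPS host and rewrite its root-relative image paths to absolute ones:
<?php
// mirror.php - serve the widget page from this (HTTPS) host.
// NOTE: $widgetBase/$widgetPath are placeholders for the real widget URL.
$widgetBase = 'http://widgets.example.com';
$widgetPath = '/weather?location=12345';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $widgetBase . $widgetPath);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // capture instead of printing
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

if ($html === false) {
    http_response_code(502);
    exit('Widget unavailable');
}

// Naively turn root-relative src/href attributes into absolute ones that
// point back at the widget server, so its images and stylesheets still load.
$html = preg_replace('#(src|href)=(["\'])/#i', '$1=$2' . $widgetBase . '/', $html);

header('Content-Type: text/html; charset=utf-8');
echo $html;
Note that images rewritten this way are still fetched from the original server over HTTP, so a strict browser may still complain about mixed content; to silence that completely you would have to proxy (or inline) the images through your HTTPS host as well.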

Related

php cURL request to Instagram API creates page lag

I am using cURL to access Instagram's API on a webpage I am building. The functionality works great; however, page load is sacrificed. For instance, consider this DOM structure:
Header
Article
Instagram Photos (retrieved via cURL)
Footer
When loading the page, the footer will not load until the Instagram photos have been fully retrieved with cURL. Below is the cURL function that is being called:
function fetchData($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
$result = fetchData("https://api.instagram.com/v1/media/search?lat={$lat}&lng={$lng}&distance={$distance}&access_token={$accessToken}");
$result = json_decode($result);
Only after this function has run is the rest of the DOM displayed. If I move the function call below the footer, it does not work.
Is there anything I can do to load the entire webpage and have the cURL request sent on top of the loading site (so it does not cause a lag or holdup)?
UPDATE: Is the best solution to load it after the footer, and then append it to another area with js?
You can cache the resulting JSON in a file that is saved locally. You can make a cron job that runs every minute and updates the local cache file. This makes your page load much faster. The downside is that your cache is updated even when you don't have visitors, and the data from Instagram is delayed by up to a minute.
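A rough sketch of that caching idea, reusing the fetchData() function from the question and assuming a writable cache/ directory; refreshing on demand when the file is older than 60 seconds avoids the cron job (and the "updates with no visitors" downside), at the cost of one slow request per minute:
<?php
// Cache wrapper around the fetchData() helper from the question.
function fetchDataCached($url, $cacheFile, $maxAgeSeconds = 60) {
    // Serve the cached copy while it is still fresh.
    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    $result = fetchData($url); // the slow cURL call happens at most once per minute
    if ($result !== false) {
        file_put_contents($cacheFile, $result, LOCK_EX);
    } elseif (file_exists($cacheFile)) {
        $result = file_get_contents($cacheFile); // fall back to the stale copy on failure
    }
    return $result;
}

// Usage, with the same URL parameters as in the question:
$json = fetchDataCached(
    "https://api.instagram.com/v1/media/search?lat={$lat}&lng={$lng}&distance={$distance}&access_token={$accessToken}",
    __DIR__ . '/cache/instagram.json',
    60
);
$result = json_decode($json);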

Need to scrape contents of website that requires an "i agree" cookie to be set

From everything I've read, it seems that this is impossible. But here is my scenario:
I need to scrape the content of a table containing for-sale housing information. The page is not password protected or anything, but you first have to click an "I Agree" link on the previous page so that a cookie gets set saying you agree that the content may not be 100% accurate. Only then are you shown the data. Is there any way at all to accomplish this using PHP/jQuery/JavaScript? I know you cannot create an iframe because it is cross-domain. I also do not have access to this other website.
Thanks for any answers, as I'm not really expecting anything positive. :) And many thanks if you can tell me how to do this. :D
Use a server-side script (PHP using cURL) to crawl the website and return the information you need. Make sure you send the appropriate HTTP header with your request to represent the "I agree" cookie.
Sample:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/');
curl_setopt($ch, CURLOPT_COOKIE, 'I_Agree=1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$responseBody = curl_exec($ch);
curl_close($ch);
// Read the information you need from $responseBody and return it as response body
?>
Now you can access the information from your website by calling the server-side script above. For details about how to use cURL, take a look at the documentation.
cURL can store or recall cookies from a file depending on the options you set. Here is the "cookiejar" example:
http://curl.haxx.se/libcurl/php/examples/cookiejar.html
Check out the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options
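A sketch of that cookie-jar approach, assuming hypothetical /agree and /listings URLs on the target site: the first request lets the server set its agreement cookie, which cURL stores in the jar file and replays on the second request.
<?php
$cookieJar = tempnam(sys_get_temp_dir(), 'cookies');

// Step 1: hit the "I Agree" page so the server sets its agreement cookie.
$ch = curl_init('http://www.example.com/agree'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar); // write received cookies here
curl_exec($ch);
curl_close($ch);

// Step 2: request the listings page, sending the stored cookies back.
$ch = curl_init('http://www.example.com/listings'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar); // read cookies from the jar
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);
$html = curl_exec($ch);
curl_close($ch);

unlink($cookieJar);
// $html now contains the table, which you can parse e.g. with DOMDocument.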

live change any site's visualization properties

Is there a way to load a site (or point to a web page) through a custom page that can grab the original site's elements and change their visual properties?
For example: if there is an online site with red-bordered paragraphs, is it possible to create a page that points at this site and turns its paragraph borders blue?
UPDATE:
Very, very important: I absolutely need to be able to navigate the target site. I'm looking for a way to create a "visualization layer" which has to be transparent to the end user.
UPDATE 2:
I have tried creating a PHP page and using cURL with this code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'www.google.com');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
$output = curl_exec($ch);
curl_close($ch);
print( $output );
The page loads but is missing some elements and the images. It is also not possible to navigate or search through it.
UPDATE 3:
I have found an extension for both Firefox and Chrome which is really close to my aim: "Stylish".
This add-on lets you live-change any site's CSS properties and save them so they are reapplied every time you visit the page.
Now my question is: how can I do the same by creating a dedicated page that loads a specific site and changes its visualization?
FINAL EDIT:
In order to continue this question with more relevant arguments, I decided to ask a new one: create a php proxy page
Use PHP's cURL functions to grab the website and then replace the page's CSS with your own, or just embed CSS rules into the file with the help of the phpQuery library.
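A minimal sketch of the override idea with plain string replacement (no phpQuery), assuming the target page has a <head> section; the <base> tag keeps the site's relative assets resolving against the original host, and the injected stylesheet turns paragraph borders blue. Keeping navigation working would additionally require rewriting every link to go back through this script, which is what the follow-up proxy-page question is about.
<?php
$target = 'http://www.example.com/'; // placeholder for the site to restyle

$ch = curl_init($target);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

// Make relative asset URLs resolve against the original site.
$html = preg_replace('#<head([^>]*)>#i', '<head$1><base href="' . $target . '">', $html, 1);

// Inject an override stylesheet just before </head>.
$override = '<style>p { border: 1px solid blue !important; }</style>';
$html = str_ireplace('</head>', $override . '</head>', $html);

echo $html;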

PHP GET Request not working

When I type http://rest.example.com/account/get-balance/27e3xxx/7vvU4c95trfxxxx into a browser and hit enter, I get the following XML response:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<accountBalance>
<creditLimit>0.0</creditLimit>
<quotaEnabled>true</quotaEnabled>
<value>2.0</value>
</accountBalance>
But when I try the same URL inside PHP, it sends the response "Page not found (Oops! That page doesn't exist.)". Here are the few ways I have tried...
Using SimpleXML
$content = simplexml_load_file($this->request_uri);
Using File methods
$content = file_get_contents($this->request_uri);
Using CURL
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $this->request_uri);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
$content = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
$this->request_uri is the same URL I paste into the browser. Where am I wrong? Please help me with this. Thank you.
There is more information being transferred from the browser to the server than just the URI:
http://rest.example.com/account/get-balance/27e3xxx/7vvU4c95trfxxxx
If you pass only the URI and not the other information the browser sends, you can normally get different results, and in your case it is clear that you do.
Find out what other information gets passed to the server when you request that URI with your browser. This other information is called the request-line, the request-headers and the body. In a GET request the body is normally empty, so you only need to concentrate on the request-line and headers; see:
Section 5 "Request" of Hypertext Transfer Protocol -- HTTP/1.1 (RFC 2616, Fielding et al.)
Consult your browser's documentation to see whether it has so-called network tools that can display the whole request for debugging purposes (e.g. Chromium has this; for Firefox there is the Firebug extension).
You can then easily mimic the request with PHP's HTTP wrapper context options or the cURL extension and its endless array of options (see the respective docs) to achieve what you want.
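For instance, a REST endpoint may route on the Host header or refuse requests that lack a browser-like Accept or User-Agent header. Here is a sketch of replaying such headers with cURL; the header values below are examples only, so copy the real ones your browser sends (and $url stands for the question's $this->request_uri):
<?php
// $url is the same value as $this->request_uri in the question.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // the "not found" page may hide a redirect
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    // Example values only - copy the actual headers from your browser's network tool.
    'Accept: application/xml,text/xml;q=0.9,*/*;q=0.8',
    'User-Agent: Mozilla/5.0 (compatible; MyApp/1.0)',
));
$content  = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE); // compare with what the browser gets
curl_close($ch);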

Equivalent is_file() function for URLs?

What is the best way to check whether a given URL points to a valid file (i.e. does not return a 404/301/etc.)? I've got a script that will load certain .js files on a page, but I need a way to verify that each URL it receives points to a valid file.
I'm still poking around the PHP manual to see which file functions (if any) will actually work with remote URLs. I'll edit my post as I find more details, but if anyone has already been down this path feel free to chime in.
file_get_contents() overshoots the purpose here, since the HTTP headers alone are enough to make the decision, so you'll want to use cURL for this:
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options: request the headers only, no body
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // keep the headers out of the output
// perform the HEAD request and read the status code it came back with
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 200 means the file is there
// close cURL resource, and free up system resources
curl_close($ch);
?>
One such way would be to request the URL and check for a response with a status code of 200. Aside from that, there's really no good way, because the server can handle the request however it likes (including giving you other status codes for files that exist but that you don't have access to for a number of reasons).
If your server doesn't have fopen wrappers enabled (any server with decent security won't), then you'll have to use the cURL functions.
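A sketch that combines the answers above into one helper, treating only a 200 response to a HEAD request as "the file exists"; it uses get_headers() when URL fopen wrappers are available and falls back to cURL otherwise. (The function name and the 200-only policy are choices made here for illustration, not anything mandated by PHP.)
<?php
// Returns true when the URL answers a HEAD request with HTTP 200.
function url_is_file($url) {
    if (ini_get('allow_url_fopen')) {
        // Ask for headers only via the HTTP stream wrapper.
        stream_context_set_default(array('http' => array('method' => 'HEAD')));
        $headers = @get_headers($url);
        // $headers[0] is the status line of the first response, e.g. "HTTP/1.1 200 OK".
        return $headers !== false && strpos($headers[0], ' 200') !== false;
    }

    // cURL fallback: HEAD request, then inspect the status code.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code === 200;
}

// Usage:
// if (url_is_file('http://www.example.com/script.js')) { /* load it */ }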
