file_get_contents() GET request not showing up on my webserver log - php

I've got a simple php script to ping some of my domains using file_get_contents(), however I have checked my logs and they are not recording any get requests.
I have
$result = file_get_contents($url);
echo $url. ' pinged ok\n';
where $url for each of the domains is just a simple string of the form http://mydomain.com/, echo verifies this. Manual requests made by myself are showing.
Why would the get requests not be showing in my logs?
Actually I've got it to register the hit when I send $result to the browser. I guess this means the webserver only records browser requests? Is there any way to mimic such in php?
ok tried curl php:
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "getcorporate.co.nr");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
same effect though - no hit registered in logs. So far it only registers when I feed the http response back from my script to the browser. Obviously this will only work for a single request and not a bunch as is the purpose of my script.
If something else is going wrong, what debugging output can I look at?
Edit: D'oh! See comments below accepted answer for explanation of my erroneous thinking.

If the request is actually being made, it would be in the logs.
Your example code could be failing silently.
What happens if you do:
<?PHP
if ($result = file_get_contents($url)){
echo "Success";
}else{
echo "Epic Fail!";
}
If that's failing, you'll want to turn on some error reporting or logging and try to figure out why.
Note: if you're in safe mode, or otherwise have fopen url wrappers disabled, file_get_contents() will not grab a remote page. This is the most likely reason things would be failing (assuming there's not a typo in the contents of $url).

Use curl instead?

That's odd. Maybe there is some caching afoot? Have you tried changing the URL dynamically ($url = $url."?timestamp=".time() for example)?

Related

Gateway Timeout 504 on multiple requests. Apache

I have an XML file localy. It contains data from marketplace.
It roughly looks like this:
<offer id="2113">
<picture>https://anotherserver.com/image1.jpg</picture>
<picture>https://anotherserver.com/image2.jpg</picture>
</offer>
<offer id="2117">
<picture>https://anotherserver.com/image3.jpg</picture>
<picture>https://anotherserver.com/image4.jpg</picture>
</offer>
...
What I want is to save those images in <picture> node localy.
There are about 9,000 offers and about 14,000 images.
When I iterate through them I see that images are being copied from that another server but at some point it gives 504 Gateway Timeout.
Thing is that sometimes error is given after 2,000 images sometimes way more or less.
I tried getting only one image 12,000 times from that server (i.e. only https://anotherserver.com/image3.jpg) but it still gave the same error.
As I've read, than another server is blocking my requests after some quantity.
I tried using PHP sleep(20) after every 100th image but it still gave me the same error (sleep(180) - same). When I tried local image but with full path it didn't gave any errors. Tried second server (non local) the same thing occured.
I use PHP copy() function to move image from that server.
I've just used file_get_contents() for testing purposes but got the same error.
I have
set_time_limit(300000);
ini_set('default_socket_timeout', 300000);
as well but no luck.
Is there any way to do this without chunking requests?
Does this error occur on some one image? Would be great to catch this error or just keep track of the response delay to send another request after some time if this can be done?
Is there any constant time in seconds that I have to wait in order to get those requests rollin'?
And pls give me non-curl answers if possible.
UPDATE
Curl and exec(wget) didn't work as well. They both gone to same error.
Can remote server be tweaked so it doesn't block me? (If it does).
p.s. if I do: echo "<img src = 'https://anotherserver.com/image1.jpg'" /> in loop for all 12,000 images, they show up just fine.
Since you're accessing content on a server you have no control over, only the server administrators know the blocking rules in place.
But you have a few options, as follows:
Run batches of 1000 or so, then sleep for a few hours.
Split the request up between computers that are requesting the information.
Maybe even something as simple as changing the requesting user agent info every 1000 or so images would be good enough to bypass the blocking mechanism.
Or some combination of all of the above.
I would suggest you to try following
1. reuse previously opened connection using CURL
$imageURLs = array('https://anotherserver.com/image1.jpg', 'https://anotherserver.com/image2.jpg', ...);
$notDownloaded = array();
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
foreach ($imageURLs as $URL) {
$filepath = parse_url($URL, PHP_URL_PATH);
$fp = fopen(basename($filepath), "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_URL, $URL);
curl_exec($ch);
fclose($fp);
if (curl_getinfo($ch, CURLINFO_RESPONSE_CODE) == 504) {
$notDownloaded[] = $URL;
}
}
curl_close($ch);
// check to see if $notDownloaded is empty
If images are accessible via both https and http try to use http instead. (this will at least speed up the downloading)
Check response headers when 504 is returned as well as when you load url your browser. Make sure there are no X-RateLimit-* headers. BTW what is the response headers actually?

Calling file() on pastebin URL fails, but on local file or google.com it works

I'm working on a bit of PHP code that depends on a remote file which happens to be hosted on pastebin. The server I am working on has all the necessary functions enabled, as running it with FILE_URL set to http://google.com returns the expected results. I've also verified through php.ini for extra measure.
Everything should work, but it doesn't. Calling file() on a URL formed as such, http://pastebin.com/raw.php?i=<paste id here>, returns a 500 server error. Doing the same on the exact same file hosted locally or on google.com returns a reasonable result.
I have verified that the URL is set to the correct value and verified that the remote page is where I think that it is. I'm at a loss.
ini_set("allow_url_fopen", true);
// Prefer remote (up-to-date) file, fallback to local file
if( ini_get("allow_url_fopen") ){
$file = file( FILE_URL );
}
if(!isset( $file ) || !$file ) {
$file = file( LOCAL_FILE_PATH );
}
I wasn't able to test this, but you should use curl, try something like this:
<?php
$url = "http://pastebin.com/2ZdFcEKh";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
Pastebin appear to use a protection system that will automatically block IP addresses that issue requests that are "bot-like".
In the case of your example, you will get a 500 server error since the file() command never completes (since their protection system never closes the connection) and there is no timeout facility in your call. The script is probably considered "bot-like" since file() does not pass through all the standard HTTP headers a typical browser would.
To solve this problem, I would recommend investigating cURL and perhaps look at setting a browser user agent as a starting point to grant access to your script. I should also mention that it would be in your interests to investigate whether or not this is considered a breach of the Pastebin user agreement. While I cannot see any reference to using scripts in their FAQ (as of 2012/12/29), they have installed protection against scripts for a reason.

Using file_get_contents or curl to embed html from a source outside my domain

We are developing a web app that enables the end user to place a bit of code on their webpage, so that anyone who visits their page will see a little pop up button on the edge of the browser window. When clicked, it opens a small panel where they can enter a telephone number for the website owner to call them back.
I am running up against a security issue. I am attempting to use server side include to place the button on the client's website. However, because this included site is on a different domain than the client's website, it is not allowed.
I have tried these two methods that I got from online forums, neither of which worked for me.
Use the file_get_contents handler, like this;
$includeFile =
file_get_contents("http://bizzocall.com/subdomains/dev/httpdocs/slideouttesttopDATA.php");
echo $includeFile;
ERROR failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found.
Use curl, like this:
function curl($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$feed = 'http://bizzocall.com/subdomains/dev/httpdocs/slideouttesttopDATA.php';
$bizzobtn = curl($feed);
echo $bizzobtn;
This didn't work either.
ERROR The requested URL /subdomains/dev/httpdocs/slideouttesttopDATA.php was not found on this server.
This looks like its truncating the bizzocall.com/ off of the url. Perhaps If I knew how to write this chunk of code correctly,this would work.
Any help here would be welcome!
The 404 error is coming from the remote server, not your own. Try to reach that URL - You'll get the same 404.
When I visit http://bizzocall.com/subdomains/dev/httpdocs/slideouttesttopDATA.php, I get the following:
Therefore, the issue is that you're using the wrong URL. Your code is sound, however.
Your curl code is working fine, the problem is the URL isn't found.
Put it in your browser to see.
If you own bizzocall.com, you'll want to put slideouttesttopDATA.php in a web accessible area, then you'll be able to access it with curl.
Visit your link a browser and you'll see that the resource you're trying to request clearly doesn't exist. More info on the HTTP 404 status code found here: http://en.wikipedia.org/wiki/HTTP_404.
Perhaps it should be http://dev.bizzocall.com/slideouttesttopDATA.php as your URL does not work when pasted into a browser...

How to collect HTML source response from a remote server?

From within the HTML code in one of my server pages I need to address a search of a specific item on a database placed in another remote server that I don’t own myself.
Example of the search type that performs my request: http://www.remoteserver.com/items/search.php?search_size=XXL
The remote server provides to me - as client - the response displaying a page with several items that match my search criteria.
I don’t want to have this page displayed. What I want is to collect into a string (or local file) the full contents of the remote server HTML response (the code we have access when we click on ‘View Source’ in my IE browser client).
If I collect that data (it could easily reach reach 50000 bytes) I can then filter the one in which I am interested (substrings) and assemble a new request to the remote server for only one of the specific items in the response provided.
Is there any way through which I can get HTML from the response provided by the remote server with Javascript or PHP, and also avoid the display of the response in the browser itself?
I hope I have not confused your minds …
Thanks for any help you may provide.
As #mario mentioned, there are several different ways to do it.
Using file_get_contents():
$txt = file_get_contents('http://www.example.com/');
echo $txt;
Using php's curl functions:
$url = 'http://www.mysite.com';
$ch = curl_init($url);
// Tell curl_exec to return the text instead of sending it to STDOUT
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// Don't include return header in output
curl_setopt($ch, CURLOPT_HEADER, 0);
$txt = curl_exec($ch);
curl_close($ch);
echo $txt;
curl is probably the most robust option because you have options for more control over the exact request parameters and possibilities for error handling when things don't go as planned

curl sending GET instead of POST

Actually, it's gotten so messy that I'm not even sure curl is the culprit. So, here's the php:
$creds = array(
'pw' => "xxxx",
'login' => "user"
);
$login_url = "https://www.example.net/login-form"; //action value in real form.
$loginpage = curl_init();
curl_setopt($loginpage, CURLOPT_HEADER, 1);
curl_setopt($loginpage, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($loginpage, CURLOPT_URL, $login_url);
curl_setopt($loginpage, CURLOPT_POST, 1);
curl_setopt($loginpage, CURLOPT_POSTFIELDS, $creds);
$response = curl_exec($loginpage);
echo $response;
I get the headers (which match the headers of a normal, successful request), followed by the login page (I'm guessing curl captured this due to a redirect) which has an error to the effect of "Bad contact type".
I thought the problem was that the request had the host set to the requesting server, not the remote server, but then I noticed (in Firebug), that the request is sent as GET, not POST.
If I copy the login site's form, strip it down to just the form elements with values, and put the full URL for the action, it works just great. So I would think this isn't a security issue where the login request has to originate on the same server, etc. (I even get rid of the empty hidden values and all of the JS which set some of the other cookies).
Then again, I get confused pretty quickly.
Any ideas why it's showing up as GET, or why it's not working, for that matter?
When troubleshooting the entire class of PHP-cURL-related problems, you simply have to turn on CURLOPT_VERBOSE and give CURLOPT_STDERR a file handle.
tail -f your file, compare the headers and response to the ones you see in Firebug, and the problem should become clear.
The request is made from the server, and will not show up in Firebug. (You probably confused it with another request by your browser). Use wireshark to find out what really happens. You are not setting CURLOPT_FOLLOWLOCATION; redirects should not be followed.
Summarizing: Guess less, post more. Link to a pcap dump, and we will be able to tell exactly what you're doing wrong; or post the exact output of the php script, and we might.
The shown code does a multipart formpost (since you pass a hash array to the POSTFIELDS option), which probably is not what the target server expects.
try throwing in a print_r(curl_getinfo($loginpage)) at the end, see what the header data it sent back as.
also, if your trying to fake that your logging in from their site, your going to want to make sure your sending the correct referrer with your post, so that they "think" you were on the website when you sent it.

Categories