I'm trying to get the contents from another file with file_get_contents (don't ask why).
I have two files: test1.php and test2.php. test1.php returns a string based on the user that is logged in.
test2.php tries to get the contents of test1.php and is being executed by the browser, thus getting the cookies.
To send the cookies with file_get_contents, I create a streaming context:
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
I'm retrieving the contents with:
$contents = file_get_contents("http://www.example.com/test1.php", false, $opts);
But now I get the error:
Warning: file_get_contents(http://www.example.com/test1.php) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found
Does somebody know what I'm doing wrong here?
edit:
Forgot to mention: without the stream context, the page just loads, but without the cookies I don't get the info I need.
First, this is probably just a typo in your question, but the third argument to file_get_contents() needs to be your stream context, NOT the array of options. I ran a quick test with something like this, and everything worked as expected:
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
$context = stream_context_create($opts);
$contents = file_get_contents('http://example.com/test1.txt', false, $context);
echo $contents;
The error indicates the server is returning a 404. Try fetching the URL from the machine PHP is running on and not from your workstation/desktop/laptop. It may be that your web server is having trouble reaching the site, your local machine has a cached copy, or some other network screwiness.
Be sure you repeat your exact request when running this test, including the cookie you're sending (command line curl is good for this). It's entirely possible that the page in question may load fine in a browser without the cookie, but when you send the cookie the site actually is returning a 404.
Make sure that $_SERVER['HTTP_COOKIE'] has the raw cookie you think it does.
If you're screen scraping, download Firefox and a copy of the LiveHTTPHeaders extension. Perform all the necessary steps to reach whatever page it is you want in Firefox. Then, using the output from LiveHTTPHeaders, recreate the exact same request sequence. Include every header, not just the cookies.
Finally, PHP cURL exists for a reason. If at all possible (I'm not asking!), use it instead. :)
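For what it's worth, a minimal cURL sketch of the same cookie-forwarding request might look like this (using the example URL from the question; adjust as needed):
$ch = curl_init('http://www.example.com/test1.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Cookie: ' . $_SERVER['HTTP_COOKIE'])); // forward the caller's cookies
$contents = curl_exec($ch);
if ($contents === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);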
Just to share this information.
When using session_start(), the session file is locked by PHP, so the current script is the only one that can access it. If you try to access the same session via fsockopen() or file_get_contents(), you can wait a long time, since you are requesting a script whose session file is still locked.
One way to solve this problem is to use the session_write_close() to unlock the file and relock it after with session_start().
Example:
<?php
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
$context = stream_context_create($opts);
session_write_close(); // unlock the file
$contents = file_get_contents('http://127.0.0.1/controler.php?c=test_session', false, $context);
session_start(); // Lock the file
echo $contents;
?>
Since file_get_contents() is a blocking function, the two scripts won't try to modify the session file concurrently.
But I'm sure this is not the best way to manipulate the session over an external connection.
Btw: it's faster than cURL and fsockopen()
Let me know if you find something better.
Just out of curiosity, are you attempting file_get_contents on a page that has a space in it? I remember trying to use fgc on a URL that had a space in the name and while my web browser parsed it just fine, fgc didn't. I ended up having to use a str_replace to replace ' ' with '%20'.
I would think this should be relatively easy to spot, though, as the error would report only half of the filename. Also, I noticed in one of these posts someone used \r\n while defining the headers. Keep in mind that PHP doesn't interpret escape sequences like these inside single quotes, but they work fine in double quotes.
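A quick illustration of that quoting point, purely as a demonstration:
// Single quotes keep \r\n as four literal characters; double quotes turn it into a real CR/LF pair.
var_dump('Cookie: abc\r\n'); // string(15) "Cookie: abc\r\n"
var_dump("Cookie: abc\r\n"); // string(13), ending in an actual carriage return and line feed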
Make sure that test1.php exists on the server. Try opening it in your own browser to make sure!
Related
I'm trying to POST some data (a JSON string) from a PHP script to a Java server (all written by myself) and get the response back.
I tried the following code:
$url="http://localhost:8000/hashmap";
$opts = array('http' => array('method' => 'POST', 'content' => $JSONDATA,'header'=>"Content-Type: application/x-www-form-urlencoded"));
$st = stream_context_create($opts);
echo file_get_contents($url, false,$st);
Now, this code actually works (I get back the right answer as the result), but file_get_contents hangs for 20 seconds every time it executes (I printed the time before and after the call). The operations performed by the server complete in a small amount of time, and I'm sure it's not normal to wait this long for the response.
Am I missing something?
Maybe a badly misconfigured server that doesn't send the right Content-Length and is using HTTP/1.1.
Either fix the server or request the data as HTTP/1.0
Try adding Connection: close and Content-Length: strlen($JSONDATA) headers to the $opts.
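Putting those suggestions together, a sketch of the adjusted options (reusing $url and $JSONDATA from the question) could look like this:
// Sketch only: force HTTP/1.0, close the connection, and declare the body length
// so the client isn't left waiting for a keep-alive connection to time out.
$opts = array('http' => array(
    'method'           => 'POST',
    'protocol_version' => 1.0, // request the data as HTTP/1.0
    'content'          => $JSONDATA,
    'header'           => "Content-Type: application/x-www-form-urlencoded\r\n" .
                          "Connection: close\r\n" .
                          "Content-Length: " . strlen($JSONDATA) . "\r\n"
));
$st = stream_context_create($opts);
echo file_get_contents($url, false, $st);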
Also, if you want to avoid using extensions, have a look at this class I wrote some time ago to perform HTTP requests using PHP core only. It works on PHP4 (which is why I wrote it) and PHP5, and the only extension it ever requires is OpenSSL, and you only need that if you want to do an HTTPS request. Documented(ish) in comments at the top.
Supports all sorts of stuff - GET, POST, PUT and more, including file uploads, cookies, automatic redirect handling. I have used it quite a lot on a platform I work with regularly that is stuck with PHP/4.3.10 and it works beautifully... Even if I do say so myself...
I have been made aware of the Accept-Range header.
I have a URL that I am calling that always returns a 2mb file. I don't need this much and only need the last section 20-50k.
I am not sure how to go about using it? Would I need to use cURL? I am currently using file_get_contents().
Would someone be able to provide me with an example / tutorial?
Thanks.
EDIT: If this isn't possible, then what is this post on about? Here ...
EDIT: Ulrika! I'm not insane.
This is possible using the Range header, provided the server supports it. See the HTTP 1.1 spec. You would want to send a header in the following format in your request:
Range: bytes=-50000
This would give you the last 50,000 bytes. Adjust to whatever you need.
You can specify this header in file_get_contents using a context. For example:
// Create a stream
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Range: bytes=-50000\r\n"
    )
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
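After the call you can also check whether the server actually honoured the range: PHP populates $http_response_header after an http:// wrapper request, and a co-operating server answers 206 Partial Content rather than 200. A small sketch:
// A 206 status line means the server returned only the requested byte range;
// a 200 means it ignored the Range header and sent the whole resource.
if (isset($http_response_header[0]) && strpos($http_response_header[0], '206') !== false) {
    echo "Partial content returned: " . strlen($file) . " bytes\n";
} else {
    echo "Range not supported; full body returned (" . strlen($file) . " bytes)\n";
}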
If you were to file_get_contents() the URL and dump that to a pass-through 'cache' file on disk, you could then use the unix/linux tail -c to grab back only the last 20 KB or so. This doesn't avoid the actual transfer, but it does get just that 20 KB into the application.
This is indeed possible - see this question for an example of the HTTP headers sent and received
You can't do that. You're going to have to load the entire file (which is sent in its entirety, sequentially, by the source server) and just discard most of it.
What you're asking is like "I'm tuning to this radio station on my car stereo and I only want to hear the last 5 minutes of the show, without having to wait for the rest to complete or change channels".
I have a page called send.email.php which sends an email - pretty simple stuff - I pass an order id, it creates a job request and sends it out. This works fine in the context I developed it for (using JavaScript to make an AJAX call to the URL and passing the order_id as a query parameter).
I am now trying to reuse the exact same page in another application; however, I am calling it using PHP's file_get_contents($base_url.'admin/send.email.php?order_id='.$order_id). When I call the page this way, the $_SESSION array is empty (empty() returns 1).
Is this because I am initiating a new session using file_get_contents and the values I stored in the $_SESSION on login are not available to me within there?
--> Thanks for the feedback. It makes sense that the new call doesn't have access to the existing session...
New problem though:
I now get "failed to open stream: HTTP request failed!" when trying to execute:
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
$context = stream_context_create($opts);
$contents = file_get_contents($base_url.'admin/send.sms.php?order_id='.$order_id, false, $context);
YET, the URL works fine if I call it as follows (it just doesn't let me access the session):
$result = file_get_contents($base_url.'admin/send.sms.php?order_id='.$order_id);
file_get_contents() shouldn't be used anywhere you need authentication/session information transmitted. It's not going to send any cookies, so the user's authentication information will not be included by default.
You can kind of hack around it by including the session identifier (e.g. 'PHPSESSID' by default) as a query parameter in the URL, and have the other script check for that. But transmitting session identifiers in the URL is horribly bad practice, even if it's just to the same server.
$contents = file_get_contents("http://.... /send_sms.php?order_id=$order_id&" . session_name() . '=' . session_id());
To do this properly, use cURL and build a full HTTP request, including the cookie information of the parent page.
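A rough sketch of that cURL approach, assuming you simply replay the parent request's Cookie header (and, if the target script lives on the same server, release the session lock first, as described in the session-locking answer above):
session_write_close(); // avoid deadlocking on our own session file
$ch = curl_init($base_url . 'admin/send.sms.php?order_id=' . $order_id);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIE, $_SERVER['HTTP_COOKIE']); // forward the caller's cookies, session ID included
$contents = curl_exec($ch);
curl_close($ch);
session_start(); // re-acquire the session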
You'd have to include the file or call the respective functions from your send.sms.php script. Instead, you call it like a web service (which it isn't).
I've got a simple php script to ping some of my domains using file_get_contents(), however I have checked my logs and they are not recording any get requests.
I have
$result = file_get_contents($url);
echo $url. ' pinged ok\n';
where $url for each of the domains is just a simple string of the form http://mydomain.com/, echo verifies this. Manual requests made by myself are showing.
Why would the get requests not be showing in my logs?
Actually, I've got it to register the hit when I send $result to the browser. I guess this means the web server only records browser requests? Is there any way to mimic that in PHP?
OK, tried PHP cURL:
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "getcorporate.co.nr");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
Same effect though - no hit registered in the logs. So far it only registers when I feed the HTTP response from my script back to the browser. Obviously this will only work for a single request and not a bunch, which is the purpose of my script.
If something else is going wrong, what debugging output can I look at?
Edit: D'oh! See comments below accepted answer for explanation of my erroneous thinking.
If the request is actually being made, it would be in the logs.
Your example code could be failing silently.
What happens if you do:
<?php
if ($result = file_get_contents($url)){
echo "Success";
}else{
echo "Epic Fail!";
}
If that's failing, you'll want to turn on some error reporting or logging and try to figure out why.
Note: if you're in safe mode, or otherwise have the fopen URL wrappers disabled (allow_url_fopen), file_get_contents() will not grab a remote page. This is the most likely reason things would be failing (assuming there's not a typo in the contents of $url).
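As a rough debugging sketch, you can crank up error reporting and inspect error_get_last() after the call:
error_reporting(E_ALL);
ini_set('display_errors', '1');

$result = file_get_contents($url);
if ($result === false) {
    $err = error_get_last(); // details of the warning file_get_contents() just raised, if any
    echo 'Request failed: ' . ($err ? $err['message'] : 'unknown error') . "\n";
} else {
    echo 'Success, got ' . strlen($result) . " bytes\n";
}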
Use curl instead?
That's odd. Maybe there is some caching afoot? Have you tried changing the URL dynamically ($url = $url."?timestamp=".time() for example)?
Given a list of urls, I would like to check that each url:
Returns a 200 OK status code
Returns a response within X amount of time
The end goal is a system that is capable of flagging urls as potentially broken so that an administrator can review them.
The script will be written in PHP and will most likely run on a daily basis via cron.
The script will be processing approximately 1000 urls at a go.
Question has two parts:
Are there any bigtime gotchas with an operation like this, what issues have you run into?
What is the best method for checking the status of a url in PHP considering both accuracy and performance?
Use the PHP cURL extension. Unlike fopen(), it can also make HTTP HEAD requests, which are sufficient to check the availability of a URL and save you a ton of bandwidth, since you don't have to download the entire body of the page to check.
As a starting point you could use some function like this:
function is_available($url, $timeout = 30) {
    $ch = curl_init(); // get cURL handle

    // set cURL options
    $opts = array(
        CURLOPT_RETURNTRANSFER => true,     // do not output to browser
        CURLOPT_URL            => $url,     // set URL
        CURLOPT_NOBODY         => true,     // do a HEAD request only
        CURLOPT_TIMEOUT        => $timeout  // set timeout
    );
    curl_setopt_array($ch, $opts);

    curl_exec($ch); // do it!

    $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK

    curl_close($ch); // close handle

    return $retval;
}
However, there's a ton of possible optimizations: You might want to re-use the cURL instance and, if checking more than one URL per host, even re-use the connection.
Oh, and this code checks strictly for HTTP response code 200. It does not follow redirects (302) -- but there is also a cURL option for that (CURLOPT_FOLLOWLOCATION).
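As one possible shape for that, here is a sketch that re-uses a single handle across many URLs and counts redirects as reachable (the function name is just an example):
// Sketch: check many URLs with one re-used cURL handle.
function check_urls(array $urls, $timeout = 30) {
    $results = array();
    $ch = curl_init(); // one handle for all requests; connections to repeated hosts get re-used
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_NOBODY         => true,  // HEAD requests only
        CURLOPT_FOLLOWLOCATION => true,  // follow 301/302 and judge the final response
        CURLOPT_TIMEOUT        => $timeout
    ));
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url); // swap the URL, keep the handle
        curl_exec($ch);
        $results[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
    }
    curl_close($ch);
    return $results;
}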
Look into cURL. There's a library for PHP.
There's also an executable version of cURL so you could even write the script in bash.
I actually wrote something in PHP that does this over a database of 5k+ URLs. I used the PEAR class HTTP_Request, which has a method called getResponseCode(). I just iterate over the URLs, passing them to getResponseCode and evaluate the response.
However, it doesn't work for FTP addresses, URLs that don't begin with http or https (unconfirmed, but I believe it's the case), or sites with invalid security certificates (it returns 0 instead of a status code). A 0 is also returned for server-not-found (there's no HTTP status code for that).
And it's probably easier than cURL, as you just include a few files and use a single function to get an integer code back.
fopen() supports HTTP URIs.
If you need more flexibility (such as timeout), look into the cURL extension.
Seems like it might be a job for curl.
If you're not stuck on PHP, Perl's LWP might be an answer too.
You should also be aware of URLs returning 301 or 302 HTTP responses which redirect to another page. Generally this doesn't mean the link is invalid. For example, http://amazon.com returns 301 and redirects to http://www.amazon.com/.
Just returning a 200 response is not enough; many valid links will continue to return "200" after they change into porn / gambling portals when the former owner fails to renew.
Domain squatters typically ensure that every URL in their domains returns 200.
One potential problem you will undoubtedly run into is when the box this script is running on loses access to the Internet... you'll get 1000 false positives.
It would probably be better for your script to keep some type of history and only report a failure after 5 days of failure.
Also, the script should be self-checking in some way (like checking a known good web site [google?]) before continuing with the standard checks.
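A loose sketch of that self-check, with a placeholder known-good URL and using the is_available() helper from the earlier answer:
// If a known-good site is unreachable, our own connectivity is suspect,
// so skip the run rather than recording ~1000 false positives.
$known_good = 'http://www.google.com/'; // placeholder
if (!is_available($known_good, 10)) {
    exit("Self-check failed, skipping this run\n");
}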
You only need a bash script to do this. Please check my answer on a similar post here. It is a one-liner that reuses HTTP connections to dramatically improve speed, retries n times for temporary errors and follows redirects.