Is it possible to use file_get_contents() to download only a portion of a file? For example, if I'm downloading a text file that is 2MB and I only want the first 5 bytes, is this possible?
Sure. The additional arguments allow you to specify a portion of the file. See example #3 on the manual page:
<?php
// Read 14 characters starting from the 21st character
$section = file_get_contents('./people.txt', NULL, NULL, 20, 14);
var_dump($section);
?>
Here, the last two arguments limit the amount of data returned to just the portion of interest.
Note: the offset argument is a little unpredictable with remote files, as also stated on the manual page:
Seeking (offset) is not supported with remote files. Attempting to seek on non-local files may work with small offsets, but this is unpredictable because it works on the buffered stream.
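For remote files, a more dependable route is to request only a byte range over HTTP, for example with cURL: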
function ranger($url, $bytes){
    // HTTP ranges are inclusive, so the first $bytes bytes are 0 through $bytes - 1
    $headers = array(
        "Range: bytes=0-" . ($bytes - 1)
    );
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($curl);
    curl_close($curl);
    return $result;
}
$url = "http://example.com/textfile.txt";
$raw = ranger($url, 5);
echo $raw;
Keep in mind that the Range header must be supported by the server. With file_get_contents() I think it's impossible, and even if it were, you should use cURL.
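If you are unsure whether a given server honours ranges, you can probe for it first. A minimal sketch, assuming cURL is available (the helper name supports_range is mine, not from the original answer): a server that honours Range answers with HTTP 206 Partial Content.

function supports_range($url) {
    $curl = curl_init($url);
    // Ask for a single byte; a range-aware server replies 206, others reply 200
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("Range: bytes=0-0"));
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $status === 206;
}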
I've been trying to pull survey data for a client from their Survey Monkey account. It seems that the more data there is, the more likely illegal characters are introduced into the resulting JSON string.
Below is a sample of what is returned on a bad response; every response is different, and even shorter requests sometimes fail, leaving me at a loss.
{
"survey_id": "REDACTED",
"title": "REDACTED",
"date_modified": "2014-XX-18 17:59:00",
"num_responses": 0,
"date_created": "�2014-01-21 10:29:00",
"question_count": 102
}
I can't fathom why this is happening: the more parameters there are in the fields option, the more illegal characters are introduced. It isn't just invalid characters; sometimes random letters are thrown in as well, which prevents me from handling the data correctly.
I am using Laravel 4 with the third-party Survey Monkey library by oori:
https://github.com/oori/php-surveymonkey
Any help in tracking down the issue would be appreciated. The deadline is pretty tight, and if this can't be resolved I'll have to resort to asking the client to manually import CSV files, which isn't ideal and introduces possible user error.
On a side note, I don't see this issue cropping up when using the same parameters on the Survey Monkey console.
O/S: Windows 8.1 with WAMP Server
Code used to execute the request
$Surveys = SurveyMonkey::getSurveyList(array(
    'page_size' => 1000,
    'fields' => array(
        'title', 'question_count', 'num_responses', 'date_created', 'date_modified'
    )
));
The SurveyMonkey facade is a custom package used to integrate the original Survey Monkey library located here:
https://github.com/oori/php-surveymonkey/blob/master/SurveyMonkey.class.php
Raw PHP cURL request
$header = array('Content-Type: application/json','Authorization: Bearer REDACTED');
$post = json_encode(array(
'fields' => array(
'title', 'question_count', 'num_responses', 'date_created', 'date_modified'
)
));
$post = json_encode($post);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://api.surveymonkey.net/v2/surveys/get_survey_list?api_key=REDACTED");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_ENCODING, 'UTF-8');
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
$result = curl_exec($ch);
The above request returns the same troublesome characters, nothing else was used to get the response.
Using the following code
echo "\n".mb_detect_encoding($result, 'UTF-8', true);
This shows the charset of the response. When the call is successful and no illegal characters are present (there are still random characters in the wrong places), it confirms the string is in fact UTF-8; when illegal characters are present, false is returned, so nothing is output. More often than not, false is returned.
Maybe I'm grossly oversimplifying the whole thing (apologies if so), but I have had these funny little chars pop into results, too.
They were leading and trailing whitespace.
Can you trim the data on retrieval and see if it still happens?
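A quick diagnostic sketch along those lines (assuming $result holds the raw API response; the variable name is taken from the question's code):

// Trim leading/trailing whitespace and compare lengths
$trimmed = trim($result);
if (strlen($trimmed) !== strlen($result)) {
    echo "Stripped " . (strlen($result) - strlen($trimmed)) . " whitespace bytes\n";
}
// Dump the first bytes in hex so invisible characters (e.g. a UTF-8 BOM, EF BB BF) become visible
echo bin2hex(substr($result, 0, 16)) . "\n";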
This question has two parts:
Part I - restriction?
I'm able to store data to my DB with this:
www.mysite.com/myscript.php?testdata=abc123
This works for a short string (e.g. 'abc123'), and the page echoes what was written to the DB; however, if the [testdata=] string is longer than 512 chars and I check the database, it shows a row has been added, but it's blank, and my echo statement in the script doesn't display the input string.
N.B. I'm on a shared server and have emailed my host to see if it's a restriction.
Part II - best practice?
If I can get past the above hurdle, I want to use a string that's ~15k chars long, created in a desktop app that concatenates the [testdata=] string from various parameters. What's the best way to send a long string via PHP POST?
Thanks in advance for your help; I'm not too savvy with PHP.
Edit: Table config:
Edit2: Row anomaly with long string > 512 chars:
Edit3: here's my PHP script, if it helps:
<?php
include("connect.php");
$data = $_GET['testdata'];
$result = mysql_query("INSERT INTO test (testdata) VALUES ('$data')");
if ($result) // Check result
{
    echo $data;
}
else echo "Error " . mysql_error(); // the script uses the mysql_* API, not mysqli
mysql_close(); ?>
POST is definitely the method you want to use, and your best bet with that will be with cURL. Something like this should work:
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, "http://www.mysite.com/myscript.php" );
curl_setopt( $ch, CURLOPT_POST, TRUE );
curl_setopt( $ch, CURLOPT_POSTFIELDS, $my_really_long_string );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, TRUE ); // so curl_exec() returns the response instead of printing it
$data = curl_exec( $ch );
You'll need to modify the above to include additional cURL options as per your environment, but something like this is what you'd be looking for.
You'll want to make sure that your DB field is long enough to hold the really long string as well.
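For completeness, a sketch of the receiving side, assuming the POST body is sent URL-encoded as "testdata=" . urlencode($my_really_long_string) so PHP populates $_POST:

// myscript.php: read the long string from $_POST instead of $_GET
$data = isset($_POST['testdata']) ? $_POST['testdata'] : '';
echo strlen($data); // verify the full ~15k characters arrived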
Answer 1: Yes, the maximum length of a URL is restricted. See more:
What is the maximum possible length of a query string?
Answer 2: You can send your string as an ordinary POST variable ($_POST). Just check the php.ini settings that cap input size and execution (post_max_size, max_input_vars).
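To see what your shared host currently allows, you can read the relevant limits at runtime; a small sketch (these two directives are the usual suspects for large POST bodies):

// Inspect the php.ini limits that govern large POST payloads
echo ini_get('post_max_size') . "\n";   // maximum size of the whole POST body
echo ini_get('max_input_vars') . "\n";  // maximum number of input variables accepted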
This is just a bang-head-on-wall situation. This pattern works perfectly in JavaScript, and I have no idea what to do.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://yugioh.wikia.com/wiki/List_of_Yu-Gi-Oh!_BAM_cards');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$chHtml = curl_exec($ch);
curl_close($ch);
$patt = '/<table class="wikitable sortable card-list">[\s\S]*?<\/table/im'; // this is the problem line
preg_match($patt, $chHtml, $matches);
If I make the quantifier greedy
[\s\S]*
it works fine, but then it matches all the way to the last </table>.
There is nothing wrong with the pattern; the problem is that you need a larger backtracking limit than the default.
Explaining:
In regex problems like this, always check for errors using preg_last_error().
If you use it on the specific response from the site you mentioned, you will see that you are getting a PREG_BACKTRACK_LIMIT_ERROR. This is a resource problem: smaller texts do not raise the error.
Solution:
To overcome this limit, you can raise it with the following at the start of your script:
ini_set('pcre.backtrack_limit', 10000000);
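Putting the two pieces together, a sketch reusing $patt and $chHtml from the question:

ini_set('pcre.backtrack_limit', 10000000); // raise the backtracking budget first

if (preg_match($patt, $chHtml, $matches) === false) {
    // preg_match() returns false on error; ask PCRE which limit was hit
    if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "Backtrack limit is still too low for this input\n";
    }
}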
If I download a file from a website using:
$html = file_get_html($url);
Then how can I know the size, in kilobytes, of the HTML string? I want to know because I want to skip files over 100Kb.
If you do file_get_contents, you've already gotten the whole file.
If you mean "skip processing", rather than "skip retrieval", you can just get the length of the string: strlen($html). For kilobytes, divide that by 1024.
This is imprecise because the string may contain UTF-8 characters over one byte in length, and very small files will actually occupy a FS block instead of their byte length, but it's probably good enough for the arbitrary-threshold cutoff you're looking for.
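As a sketch of that cutoff (the 100 KB threshold is the one from the question):

$html = file_get_contents($url);
if (strlen($html) > 100 * 1024) { // over ~100 KB
    echo "Skipping: too large\n";
} else {
    // process $html as usual
}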
To skip fetching large files, you want to use the cURL library.
<?php
function get_content_length($url) {
    // Issue a HEAD request and capture the raw response headers
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    $hraw = explode("\r\n", curl_exec($ch));
    curl_close($ch);
    $hdrs = array();
    foreach ($hraw as $hdr) {
        $a = explode(": ", trim($hdr), 2);
        if (count($a) == 2) { // skip the status line and blank lines
            $hdrs[$a[0]] = $a[1];
        }
    }
    return (isset($hdrs['Content-Length'])) ? $hdrs['Content-Length'] : FALSE;
}
$url="http://www.example.com/";
if (get_content_length($url) < 100000) {
$html = file_get_contents($url);
print "Yes.\n";
} else {
print "No.\n";
}
?>
There may be a more elegant way to pull this information out of curl, but this is what came to mind fastest. YMMV.
Note that setting the CURLOPT options this way makes curl use a "HEAD" rather than "GET" request, so we're not actually fetching this URL twice.
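For what it's worth, the more elegant route is probably curl_getinfo(), which lets curl do the header parsing itself; a sketch:

// Alternative: let curl report the length after a HEAD request
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1); // HEAD rather than GET
curl_exec($ch);
$length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if the server sent no Content-Length
curl_close($ch);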
The definition of what a string is differs between PHP and the intuitive meaning:
"Hällo" (mind the umlaut) looks like a 5-character string, but to PHP it is really a 6-byte array (assuming UTF-8). PHP doesn't have a notion of a string representing text; it just sees a sequence of bytes (the PHP euphemism is "binary safe").
So strlen("Hällo") will be 6 (UTF-8).
That said, if you want to skip above 100Kb you probably won't mind if it is 99.5k characters translating to 100k bytes.
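To illustrate, assuming the script file itself is saved as UTF-8:

echo strlen("Hällo");             // 6: bytes ("ä" occupies 2 bytes in UTF-8)
echo mb_strlen("Hällo", "UTF-8"); // 5: characters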
file_get_html() returns an object to you, so the size of the underlying string is no longer available at that point. Get the string first, and the object later:
$html = file_get_contents($url);
echo strlen($html); // size in bytes
$html = str_get_html($html);
You can use mb_strlen() to force the '8bit' encoding, in which case 1 character = 1 byte.
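That is, as a one-liner:

echo mb_strlen($html, '8bit'); // counts bytes, not characters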
I read some URL with fsockopen() and fread(), and I get this kind of data:
<li
10
></li>
<li
9f
>asd</li>
d
<li
92
Which is totally messed up O_O
--
While using the file_get_contents() function I get this kind of data:
<li></li>
<li>asd</li>
Which is correct! So what the HELL is wrong? I tried on my Windows server and my Linux server; both behave the same, and they don't even have the same PHP version.
--
My PHP code is:
$fp = @fsockopen($hostname, 80, $errno, $errstr, 30);
if (!$fp) {
    return false;
} else {
    // Send a minimal HTTP/1.1 request by hand
    $out  = "GET /$path HTTP/1.1\r\n";
    $out .= "Host: $hostname\r\n";
    $out .= "Accept-language: en\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    $data = "";
    while (!feof($fp)) {
        $data .= fread($fp, 1024);
    }
    fclose($fp);
}
Any help/tips are appreciated; I've been wondering about this the whole day now :/
Oh, and I can't use fopen() or file_get_contents() because the server where my script runs doesn't have fopen wrappers enabled >__<
I really want to know how to fix this, just out of curiosity, and I don't think I can use any extra libraries on this server anyway.
About your "strange data" problem: this might be because the server you are requesting data from is transferring it in chunked mode.
You can take a look at the HTTP headers when calling the same URL in your browser; one of those headers might look like this:
Transfer-Encoding: chunked
Quoting Wikipedia's article on the matter:

Each non-empty chunk starts with the number of octets of the data it embeds (size written in hexadecimal) followed by a CRLF (carriage return and line feed), and the data itself. The chunk is then closed with a CRLF. In some implementations, white space characters (0x20) are padded between chunk-size and the CRLF.

The last chunk is a single line, simply made of the chunk-size (0), some optional padding white spaces and the terminating CRLF. It is not followed by any data, but optional trailers can be sent using the same syntax as the message headers.

The message is finally closed by a final CRLF combination.
This looks close to what you are getting... So I'm guessing this is the problem.
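For what it's worth, a minimal sketch of decoding such a body by hand (the function name decode_chunked is mine; it assumes $raw holds only the body, after the blank line ending the response headers, and it ignores chunk extensions and trailers):

// Each chunk is "<hex size>\r\n<data>\r\n"; a zero-size chunk ends the message
function decode_chunked($raw) {
    $decoded = '';
    $offset = 0;
    while (($pos = strpos($raw, "\r\n", $offset)) !== false) {
        $size = hexdec(trim(substr($raw, $offset, $pos - $offset)));
        if ($size === 0) {
            break; // final zero-length chunk
        }
        $decoded .= substr($raw, $pos + 2, $size);
        $offset = $pos + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $decoded;
}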
As far as I remember, curl knows how to deal with that; so the easy way would be to use curl instead of fsockopen and the like.
And using curl is often a better idea than using sockets: it will deal with many problems you might encounter, like this one ;-)
Another idea, if you don't have curl enabled on your server, would be to use some already existing library based on fsockopen, hoping it would take care of those kinds of things for you already.
For instance, I've worked with Snoopy a couple of times; maybe it already knows how to deal with that?
(Not sure: you'll have to test for yourself, or take a look at the documentation to find out.)
Still, if you want to deal with the mysteries of the HTTP protocol by yourself... Well, I wish you luck!
You probably want to use cURL.
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// grab URL and pass it to the browser
$output = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
?>
With fsockopen(), you get the raw TCP stream, not the decoded HTTP content. I assume you also see the HTTP headers, right? If the response uses chunked encoding, you will get all the chunk headers as well.
This is a known issue. Someone posted a solution here on how to remove chunk headers.