How big is a string in PHP?

If I download a file from a website using:
$html = file_get_html($url);
Then how can I know the size, in kilobytes, of the HTML string? I want to know because I want to skip files over 100Kb.

If you do file_get_contents, you've already gotten the whole file.
If you mean "skip processing", rather than "skip retrieval", you can just get the length of the string: strlen($html). For kilobytes, divide that by 1024.
This is imprecise because the string may contain multi-byte UTF-8 characters (so byte count differs from character count), and very small files will actually occupy a whole filesystem block on disk rather than just their byte length, but it's probably good enough for the arbitrary-threshold cutoff you're looking for.
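A minimal sketch of that check, assuming $html already holds the downloaded string:
<?php
$kilobytes = strlen($html) / 1024; // strlen() counts bytes
if ($kilobytes > 100) {
    echo "Skipping: document is larger than 100 KB\n";
} else {
    // ... process $html ...
}
?>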

To skip fetching large files, you want to use the cURL library.
<?php
function get_content_length($url) {
    // Issue a HEAD request so only the headers are transferred
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    $hraw = explode("\r\n", curl_exec($ch));
    curl_close($ch);

    // Split each "Name: value" header line into an associative array;
    // skip lines (like the status line) that have no ": " separator
    $hdrs = array();
    foreach ($hraw as $hdr) {
        $a = explode(": ", trim($hdr), 2);
        if (count($a) == 2) {
            $hdrs[$a[0]] = $a[1];
        }
    }
    return isset($hdrs['Content-Length']) ? $hdrs['Content-Length'] : FALSE;
}

$url = "http://www.example.com/";
if (get_content_length($url) < 100000) {
    $html = file_get_contents($url);
    print "Yes.\n";
} else {
    print "No.\n";
}
?>
There may be a more elegant way to pull this information out of curl, but this is what came to mind fastest. YMMV.
Note that setting the CURLOPT options this way makes curl use a "HEAD" rather than "GET" request, so we're not actually fetching this URL twice.
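One possibly more elegant variant (a sketch on my part, not part of the original answer) is to let curl parse the headers itself and ask for the length via curl_getinfo(), which returns -1 when the server sends no Content-Length:
<?php
function get_content_length_via_info($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_NOBODY, 1); // HEAD request, no body transferred
    curl_exec($ch);
    $length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    return ($length >= 0) ? $length : FALSE;
}
?>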

The definition of what a string is differs between PHP and the intuitive meaning:
"Hällo" (mind the umlaut) looks like a 5-character string, but to PHP it is really a 6-byte array (assuming UTF-8). PHP doesn't have a notion of a string representing text; it just sees a sequence of bytes (the PHP euphemism is "binary safe").
So strlen("Hällo") will be 6 (UTF8).
That said, if you want to skip above 100Kb you probably won't mind if it is 99.5k characters translating to 100k bytes.
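A quick illustration of the difference, assuming the source file is saved as UTF-8:
<?php
$s = "Hällo";
echo strlen($s);             // 6 - counts bytes ("ä" is two bytes in UTF-8)
echo mb_strlen($s, 'UTF-8'); // 5 - counts characters
?>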

file_get_html() returns an object, so the size of the underlying string is lost at that point. Get the string first, then the object:
$html = file_get_contents($url);
echo strlen($html); // size in bytes
$html = str_get_html($html);

You can use mb_strlen() with the '8bit' encoding to force a raw byte count, so that 1 character = 1 byte.
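For example (a one-liner, assuming $html holds the string):
echo mb_strlen($html, '8bit'); // byte count, equivalent to strlen($html)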

Related

Get Zipfile as Bytes in php?

Using php, how can I read a zip file and get its bytes, for example something like
$contents = file_get_contents('myzipfile.zip');
echo $contents;
// outputs: 504b 0304 1400 0000 0800 1bae 2f46 20e0
Thank you!
file_get_contents gets the raw bytes, your echo outputs those raw bytes. If you expect to output a hexadecimal notation of the raw byte contents instead, use bin2hex:
echo bin2hex($contents);
If you want that arbitrarily grouped with a space every two bytes, you can do something along these lines:
echo join(' ', str_split(bin2hex($contents), 4));
(Note that this is all rather inefficient, modifying the entire, possibly many megabyte large file in memory. I'm expecting this is just for debugging purposes, so won't go out of my way to write super efficient code.)
file_get_contents() will return the exact contents of the file, so the format depends on the file type.
If you are looking for the byte size of the file, you can get any available file information with the core SPL library's SplFileInfo class:
$info = new SplFileInfo('myzipfile.zip');
$bytes = $info->getSize();
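For example, to skip files above a size threshold before reading them into memory (a sketch combining the two answers, not part of the originals):
$info = new SplFileInfo('myzipfile.zip');
if ($info->getSize() <= 100 * 1024) { // only read files up to 100 KB
    $contents = file_get_contents('myzipfile.zip');
    echo join(' ', str_split(bin2hex($contents), 4));
}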

How do I post a long string into a PHP page?

This question has two parts:
Part I - restriction?
I'm able to store data to my DB with this:
www.mysite.com/myscript.php?testdata=abc123
This works for a short string (e.g. 'abc123') and the page echoes what was written to the DB; however, if the [testdata=] string is longer than 512 chars, when I check the database it shows a row has been added but it's blank, and my echo statement in the script doesn't display the input string either.
N.B. I'm on a shared server and have emailed my host to see if it's a restriction.
Part II - best practice?
If I can get past the above hurdle, I want to use a string that's ~15k chars long, created in a desktop app that concatenates the [testdata=] string from various parameters. What's the best way to send a long string via PHP POST?
Thanks in advance for your help, I'm not too savvy with PHP.
Edit: Table config:
Edit2: Row anomaly with long string > 512 chars:
Edit3: here's my PHP script, if it helps:
<?
include("connect.php");
$data = $_GET['testdata'];
$result = mysql_query("INSERT INTO test (testdata) VALUES ('$data')");
if ($result) { // Check result
    echo $data;
} else {
    echo "Error " . $mysqli->error;
}
mysql_close();
?>
POST is definitely the method you want to use, and your best bet with that will be with cURL. Something like this should work:
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, "http://www.mysite.com/myscript.php" );
curl_setopt( $ch, CURLOPT_POST, TRUE );
curl_setopt( $ch, CURLOPT_POSTFIELDS, $my_really_long_string );
$data = curl_exec( $ch );
You'll need to modify the above to include additional cURL options as per your environment, but something like this is what you'd be looking for.
You'll want to make sure that your DB field is long enough to hold the really long string as well.
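Note that when CURLOPT_POSTFIELDS is given a bare string, it is sent as the raw request body; if you want the value to show up in $_POST on the receiving page, send it as a named field. A sketch (the field name 'testdata' just mirrors the question):
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, "http://www.mysite.com/myscript.php" );
curl_setopt( $ch, CURLOPT_POST, TRUE );
curl_setopt( $ch, CURLOPT_POSTFIELDS, array('testdata' => $my_really_long_string) );
$data = curl_exec( $ch );
curl_close( $ch );
// myscript.php can then read the value from $_POST['testdata'] instead of $_GET['testdata'].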
Answer 1: Yes, the maximum length of a URL is restricted. See:
What is the maximum possible length of a query string?
Answer 2: You can send your string as a regular POST variable ($_POST). Just check the php.ini settings that limit input size and execution (e.g. post_max_size).

Parsing a large file using PHP on a Linux server

I am a PHP programmer and currently I am working with files. I have to parse them and insert the data into a MySQL database. Since it is a large amount of data, PHP is unable to load or parse the file; I am getting a memory allocation error even though I have increased memory_limit up to 1500MB.
FATAL: emalloc(): Unable to allocate 456185835 bytes
My text file contains text and XML data. I have to parse the XML data out of the text file.
eg: <ajax>some text goes here</ajax> non relativ text <ajax>other content</ajax>
In the above example I have to parse the content inside the tags. If anyone can give some advice on separating each tag into an individual file (e.g. 1.txt, 2.txt), that would be great (Perl, C, shell scripting, etc.).
Cough... a 1500 MB memory limit is a sure sign you have gone off the rails.
Where are you getting your file? I assume (given the size) that this is a local file. If you are trying to load the file into a string using file_get_contents(), it is worth noting that the docs are wrong and that said function does not in fact use memory-mapped I/O (cf. bug 52802). So this is not going to work for you.
What you might try is instead falling back to more C-like (but still PHP) constructs, in particular fopen(), fseek(), and fread(). If the file is of a known structure with newlines, you might also consider fgets().
These should allow you to read in bytes in chunks into a reasonable size buffer from which you can do your processing. Since it looks like you are processing tagged strings, you will have to play the usual games of keeping multiple buffers around in which you can accumulate data until processable. This is fairly standard stuff covered in most introductions to, e.g., stream processing in C.
Note that in PHP (or any other language for that matter), you are also going to have to potentially consider issues of string encoding because, in general, it is no longer the case that 1 byte == 1 character (cf. Unicode).
As you insinuate, PHP may well not be the best language for this task (though it certainly can do it). But your problem isn't really a language-specific one; you are running into a fundamental limitation of handling large files without memory-mapping.
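A minimal sketch of that kind of chunked reading (the file name, buffer size, and regular expression are placeholders, not from the original answer):
<?php
// Read the file in fixed-size chunks and pull out complete <ajax>...</ajax> blocks
$fh = fopen('bigfile.txt', 'r');
$buffer = '';
while (!feof($fh)) {
    $buffer .= fread($fh, 8192); // accumulate one chunk at a time
    while (preg_match('#<ajax>(.*?)</ajax>#s', $buffer, $m, PREG_OFFSET_CAPTURE)) {
        $content = $m[1][0]; // contents of one complete tag
        // ... process $content or write it to its own file here ...
        // drop everything up to and including the tag we just handled
        $buffer = substr($buffer, $m[0][1] + strlen($m[0][0]));
    }
}
fclose($fh);
?>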
You can actually parse XML with PHP a small block at a time, so you don't actually require much RAM at all:
set_time_limit(0);
define('__BUFFER_SIZE__', 131072);
define('__XML_FILE__', 'pf_1360591.xml');

function elementStart($p, $n, $a) {
    //handle opening of elements
}

function elementEnd($p, $n) {
    //handle closing of elements
}

function elementData($p, $d) {
    //handle cdata in elements
}

$xml = xml_parser_create();
xml_parser_set_option($xml, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_parser_set_option($xml, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($xml, XML_OPTION_SKIP_WHITE, 1);
xml_set_element_handler($xml, 'elementStart', 'elementEnd');
xml_set_character_data_handler($xml, 'elementData');

$f = fopen(__XML_FILE__, 'r');
if ($f) {
    while (!feof($f)) {
        $content = fread($f, __BUFFER_SIZE__);
        xml_parse($xml, $content, feof($f));
        unset($content);
    }
    fclose($f);
}

Using file_get_contents to download a portion of data?

Is it possible to use file_get_contents() to download a portion of data. For example, if I'm downloading a text file that is 2MB, and I only want the first 5 bytes, is this possible?
Sure. The additional arguments allow you to specify a portion of the file. See example #3 on the manual page:
<?php
// Read 14 characters starting from the 21st character
$section = file_get_contents('./people.txt', NULL, NULL, 20, 14);
var_dump($section);
?>
Here, the last two arguments limit the amount of data returned to just the portion of interest.
Note: The offset argument is a little unpredictable with remote files, as stated also on the manual page:
Seeking (offset) is not supported with remote files. Attempting to seek on non-local files may work with small offsets, but this is unpredictable because it works on the buffered stream.
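For a remote file, one workaround (a sketch; the URL is a placeholder) is to open the URL as a stream and read only as many bytes as you need, since limiting the length works even where seeking does not:
$fp = fopen('http://example.com/textfile.txt', 'r');
$firstBytes = stream_get_contents($fp, 5); // read at most 5 bytes
fclose($fp);
echo $firstBytes;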
function ranger($url, $bytes) {
    // The Range end is inclusive, so the first $bytes bytes are 0..$bytes-1
    $headers = array(
        "Range: bytes=0-" . ($bytes - 1)
    );
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}
$url = "http://example.com/textfile.txt";
$raw = ranger($url, 5);
echo $raw;
Keep in mind that the Range header must be supported by the server. With file_get_contents I think it is impossible; even if it were, you should use cURL.

PHP fsockopen() / fread() returns messed up data

I read some URL with fsockopen() and fread(), and I get this kind of data:
<li
10
></li>
<li
9f
>asd</li>
d
<li
92
Which is totally messed up O_O
--
While using the file_get_contents() function I get this kind of data:
<li></li>
<li>asd</li>
Which is correct! So, what the HELL is wrong? I tried on my Windows server and Linux server; both behave the same, and they don't even have the same PHP version.
--
My PHP code is:
$fp = @fsockopen($hostname, 80, $errno, $errstr, 30);
if (!$fp) {
    return false;
} else {
    $out  = "GET /$path HTTP/1.1\r\n";
    $out .= "Host: $hostname\r\n";
    $out .= "Accept-language: en\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    $data = "";
    while (!feof($fp)) {
        $data .= fread($fp, 1024);
    }
    fclose($fp);
}
Any help/tips are appreciated; I've been wondering about this the whole day now :/
Oh, and I can't use fopen() or file_get_contents() because the server where my script runs doesn't have fopen wrappers enabled >_<
I really want to know how to fix this, just out of curiosity, and I don't think I can use any extra libraries on this server anyway.
About your "strange data" problem, this might be because the server you are requesting data from is transferring it in chunked mode.
You can take a look at the HTTP headers when calling the same URL in your browser; one of those headers might look like this:
Transfer-encoding: chunked
Quoting Wikipedia's article on the matter:
Each non-empty chunk starts with the number of octets of the data it embeds (size written in hexadecimal) followed by a CRLF (carriage return and line feed), and the data itself. The chunk is then closed with a CRLF. In some implementations, white space characters (0x20) are padded between chunk-size and the CRLF.
The last chunk is a single line, simply made of the chunk-size (0), some optional padding white spaces and the terminating CRLF. It is not followed by any data, but optional trailers can be sent using the same syntax as the message headers.
The message is finally closed by a final CRLF combination.
This looks close to what you are getting... So I'm guessing this is the problem.
As far as I remember, curl knows how to deal with that -- so the easy way would be to use curl instead of fsockopen and the like.
And using curl is often a better idea than using sockets: it will deal with many problems you might encounter; like this one ;-)
Another idea, if you don't have curl enabled on your server, would be to use some already existing library based on fsockopen -- hoping it would take care of those kinds of things for you already.
For instance, I've worked with Snoopy a couple of times; maybe it already knows how to deal with that?
(Not sure : you'll have to test by yourself -- or take a look at the documentation to know if this is OK)
Still, if you want to deal with the mysteries of the HTTP protocol by yourself... Well, I wish you luck !
You probably want to use cURL.
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// grab URL and pass it to the browser
$output = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
?>
With fsockopen(), you get the raw TCP data, not the HTTP contents. I assume you also see the HTTP headers, right? If it's in chunked encoding, you will get all the chunk headers.
This is a known issue. Someone posted a solution here on how to remove chunk headers.
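If you do stick with raw sockets, a minimal sketch of stripping those chunk sizes yourself might look like this (it assumes $data contains only the chunked body, with the response headers already removed; the function name is just for illustration):
function dechunk($body) {
    $result = '';
    $offset = 0;
    while ($offset < strlen($body)) {
        // each chunk starts with its size in hex, terminated by CRLF
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            break;
        }
        $parts = explode(';', substr($body, $offset, $lineEnd - $offset));
        $size = hexdec(trim($parts[0])); // ignore optional chunk extensions
        if ($size === 0) {
            break; // last chunk reached
        }
        $result .= substr($body, $lineEnd + 2, $size);
        $offset = $lineEnd + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $result;
}
$html = dechunk($data);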
