Compress small string by converting its base in PHP? - php

I was wondering if there would be some way to compress a small ASCII string (~100 characters) by combining some of the native PHP compression and base converting functions to produce an even smaller string (~60 characters).
For example, could I take a string, gzcompress it, convert it to a number, and then change the base to a system with more values?
The goal is to have a smaller string that is ASCII (perhaps UTF-8) compatible for display.

You could try a dictionary compression scheme such as LZW, or a Golomb code, but the compression ratio depends on the data. Without the exact data it's not possible to answer the question.

base64_encode(gzcompress($input));
That should do it, but I don't think this will make your original string much smaller.
http://php.net/manual/en/function.base64-encode.php
http://php.net/manual/en/function.gzcompress.php
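
A minimal sketch of the one-liner above with its inverse, so the size before and after can be measured on real data. The helper names packForDisplay and unpackFromDisplay are hypothetical, the sample string is made up and deliberately repetitive, and the zlib extension is assumed to be loaded. Short strings with little repetition may well come out longer than they went in, because gzcompress adds a header and checksum and Base64 adds a third on top.

```php
<?php
// Hypothetical helper names; requires the zlib extension.

/** Compress a string and encode the result as printable ASCII. */
function packForDisplay(string $input): string
{
    // Level 9 asks zlib for its best compression; the result is raw binary.
    $compressed = gzcompress($input, 9);
    // Base64 maps every 3 bytes to 4 ASCII characters.
    return base64_encode($compressed);
}

/** Reverse packForDisplay(). */
function unpackFromDisplay(string $packed): string
{
    return gzuncompress(base64_decode($packed, true));
}

$sample = str_repeat('status=ok;', 10); // 100 highly repetitive characters
$packed = packForDisplay($sample);

printf("original: %d chars, packed: %d chars\n", strlen($sample), strlen($packed));
```

Running the same measurement on representative input is the quickest way to see whether the ~60 character target is realistic.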

Related

Optimizing conversion to ascii

I want to convert all characters in a string to ascii codes, and right now I do this with ord() function - but it is pretty slow. Is there another, faster way to do this?
I have ~100GB of text on which I'd have to use this conversion, so this little difference matters a lot.
I've been thinking about creating a map of ascii characters and using it instead, but I'm not sure if it will be faster, and I couldn't find ascii map anywhere.
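
One possible alternative to calling ord() per character, sketched under the assumption that the text is processed in chunks that fit in memory: unpack('C*', ...) decodes every byte of a string in a single call, and count_chars() gives a per-byte histogram if only frequencies are needed. The function names byteCodes and byteHistogram are hypothetical, and whether this is actually faster for a given workload would need benchmarking.

```php
<?php
// Sketch only: unpack('C*', ...) decodes every byte as an unsigned char in one call.
function byteCodes(string $text): array
{
    // array_values() re-indexes from 0, because unpack() starts its keys at 1.
    return array_values(unpack('C*', $text));
}

// count_chars() in mode 1 returns only the byte values that occur, keyed by code.
function byteHistogram(string $text): array
{
    return count_chars($text, 1);
}

$codes = byteCodes('PHP');
echo implode(',', $codes), "\n"; // prints "80,72,80"
```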

Are AJAX Posts 8 bit Clean? / Relation to Base64 / An alternative? / Where is it?

Base64 only uses 6 bits per character (2^6 = 64) to create textual data from image files. This causes an inefficiency.
According to a Wikipedia entry on Base64, this inefficiency is to protect against 8-bit dirty things like email.
Is Ajax posting 8-bit clean? If so, is there an alternative to using Base64?
php.net (as does Wikipedia) claims a 33% inefficiency for base64_encode.
Kind of. All JavaScript strings are UTF-16, not byte strings. If you're sending the data with send, then it will be encoded into UTF-8 before it is sent. As such, you can convert the bytes into Unicode code points, which will then be encoded into UTF-8. When it reaches the server, you'll have to decode the UTF-8 and then convert the code points back into bytes.
For 7-bit data, this will not expand the size of the data at all. For 8-bit data with the most significant bit always set, it will double the size of the data. For 8-bit data with the most significant bit set half of the time, it will increase the size of your data by 50%, which is worse than the Base64 33.3% increase.
On the other hand, using XMLHttpRequest Level 2 allows you to send binary data by passing send an ArrayBuffer, Blob, or FormData. However, XMLHttpRequest Level 2 is only supported in newer browsers.
I think AJAX posting is the same as a generic POST request in that respect; that's why we need 'multipart/form-data' for sending files' content, for example. Usually the data gets URL-encoded, but Base64 is perhaps a better way, as it's (generally) more efficient.
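
The size estimates in this answer can be checked with a short calculation, shown here in PHP as a sketch. The function name utf8SizeOfBytes is hypothetical; it counts how many bytes result when each input byte is treated as a code point from 0 to 255 and then UTF-8 encoded, which takes one byte below 0x80 and two bytes from 0x80 upward. The three 300-byte test strings are made up for illustration.

```php
<?php
/**
 * Bytes needed when each input byte is treated as a code point (0-255)
 * and the result is UTF-8 encoded: 1 byte below 0x80, 2 bytes otherwise.
 */
function utf8SizeOfBytes(string $bytes): int
{
    $size = 0;
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $size += ord($bytes[$i]) < 0x80 ? 1 : 2;
    }
    return $size;
}

$sevenBit = str_repeat("\x41", 300);     // high bit never set
$highBit  = str_repeat("\xC8", 300);     // high bit always set
$mixed    = str_repeat("\x41\xC8", 150); // high bit set half of the time

foreach (['7-bit' => $sevenBit, 'high bit' => $highBit, 'mixed' => $mixed] as $label => $data) {
    printf("%-8s utf8=%d base64=%d\n", $label, utf8SizeOfBytes($data), strlen(base64_encode($data)));
}
```

For 300 input bytes the UTF-8 route gives 300, 600 and 450 bytes respectively, while Base64 gives 400 in every case.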
UPDATE: It might be helpful to look at this the other way. You need some stream of values, which might use all 8 bits, to safely pass 7-bit filtering. The perfect solution would be '7-to-8' encoding, so that every 7 bytes become 8 'safe' characters. But this is not applicable, as some of these 7-bit characters are actually used to specify some additional (meta) information about the stream...
Now you have a dilemma: either use the next integer (6 bit - that is base64) - or try to invent a scheme with 'non-integer' divider. Such schemes exist (check Ascii85, for example), but they are rarely used.

Best way to compact a string in PHP that can be decoded to its original form

What would be the best way to compact a string in PHP that can be decoded to its original form? The base64_encode function works for numbers but it yields a longer result for strings that contain special characters.
Gzencode and gzdecode use the GZIP compression algorithm and are very efficient on plain text strings. Just be aware that the output may (will) contain binary characters not suitable for display and possibly not suitable for database storage either.
(Edit: since gzdecode doesn't ship with PHP versions before 5.4, consider gzdeflate and gzinflate. Gzdeflate compresses a string and gzinflate decompresses it.)
Take your pick: Compression and Archive Extensions
well of course a base64-encoding makes a string longer as it is mapping all possible bytes onto a smaller set of numbers and alphabetic chars.
I guess convert_uuencode wouldn't increase the size of your binary string as much as base64 b/c the target set is larger.
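
A small sketch of the gzdeflate/gzinflate round trip, with both printable encodings measured side by side. The sample text is made up and the zlib extension is assumed. Note that uuencode also maps 3 bytes to 4 characters and adds a length character and newline per line, so it is worth measuring rather than assuming it is more compact than Base64.

```php
<?php
// Requires the zlib extension. Sample text is made up for illustration.
$text = str_repeat('The quick brown fox jumps over the lazy dog. ', 8);

$deflated = gzdeflate($text, 9);  // raw DEFLATE stream, binary
$restored = gzinflate($deflated); // exact original back

// Two ways to make the binary result printable:
$asBase64   = base64_encode($deflated);
$asUuencode = convert_uuencode($deflated);

printf(
    "plain=%d deflated=%d base64=%d uuencode=%d\n",
    strlen($text),
    strlen($deflated),
    strlen($asBase64),
    strlen($asUuencode)
);
```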

How to store binary data in PHP

People know all about storing binary data in database server as BLOBs. How would one accomplish the same thing in PHP?
In other words, how do I store BLOBs in a PHP variable?
As PHP doesn't have Unicode support you can safely use normal strings as binary storage. Most (all?) functions are null-safe, too, so you shouldn't get any problems because of that either.
PS: Theoretically you could prefix all binary strings with b (e.g. b'binary data'). This is a forward compatibility token to make sure that strings that expect to be handled as binary really will be handled that way even when Unicode support is available.
Easy - store it in a string. You can use all the normal string functions (strlen, substr, etc) - just remember that the PHP string functions work in single byte units, e.g. substr($binstr, 0, 1) gives you the first 8 bits of $binstr
Maybe as an array of bytes. After all binary data is nothing more.
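
A brief sketch of what the answers above describe: a PHP string holding arbitrary bytes, the usual string functions operating on it byte by byte, and unpack() producing the "array of bytes" view. The variable names and byte values are made up for illustration.

```php
<?php
// A PHP string is a byte sequence; NUL and high bytes are stored as-is.
$blob = "\x00\x01\xFE\xFF";

echo strlen($blob), "\n";                 // prints 4 (bytes, not characters)
echo bin2hex(substr($blob, 2, 2)), "\n";  // prints "feff"

// The same data viewed as an array of integers, re-indexed from 0.
$bytes = array_values(unpack('C*', $blob));

// pack() builds binary strings from numbers; 'N' is a 32-bit big-endian integer.
$packed = pack('N', 0x01020304);
echo bin2hex($packed), "\n";              // prints "01020304"
```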

PHP <-> JavaScript communication: Am I stuck with ASCII?

I am passing a lot of data between PHP and JavaScript. I am using JSON and json_encode in php, but the problem here is that I am passing a lot of numbers stored as strings - for example, numbers like 1.2345.
Is there a way to pass the data directly as numbers (floats, integers) and not have to convert it to ASCII and then back?
Thanks,
No. HTTP is a byte stream protocol(*); anything that goes down it has to be packed into bytes. You can certainly use a more compact packed binary representation of values if you like, but it's going to be much more work for your PHP to encode and your JS to decode.
Anyhow, for the common case of small numbers, text representations tend to be very efficient. Your example 1.2345 is actually smaller as a string (6 bytes) than a double-precision float (8 bytes).
JSON was invented precisely to allow non-string types to be transferred over the HTTP connection. It's as seamless as you're going to get. Is there any good reason to care that there was a serialise->string->parse step between the PHP float and the JavaScript Number?
(* exposed to JavaScript as a character protocol, since JS has no byte datatype. By setting the charset of the JSON response to iso-8859-1 you can make it work as if it were pure bytes, but the default utf-8 is usually more suitable.)
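
To illustrate the points above, here is a sketch of how json_encode treats numeric strings versus real numbers, plus the 6-byte versus 8-byte comparison for the example value. The array keys are made up, and the ini_set call is an assumption to make float output predictable on installations where serialize_precision has been changed from its default of -1 (the default since PHP 7.1).

```php
<?php
// Use the shortest round-trippable float representation when serializing.
ini_set('serialize_precision', '-1');

$asStrings = ['price' => '1.2345', 'qty' => '3'];
$asNumbers = ['price' => 1.2345, 'qty' => 3];

echo json_encode($asStrings), "\n";                     // {"price":"1.2345","qty":"3"}
echo json_encode($asNumbers), "\n";                     // {"price":1.2345,"qty":3}
// JSON_NUMERIC_CHECK converts numeric strings to numbers while encoding.
echo json_encode($asStrings, JSON_NUMERIC_CHECK), "\n"; // {"price":1.2345,"qty":3}

// Text versus a packed double for the example value:
echo strlen('1.2345'), ' vs ', strlen(pack('d', 1.2345)), "\n"; // 6 vs 8
```

On the JavaScript side, JSON.parse yields Number values directly for the second and third forms, so no manual string-to-number conversion is needed.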
If you didn't want to use JSON, there are other encoding options. The data returned from an HTTP request is an octet stream (and not a 7-bit clean ASCII stream -- if it were, there would be no way to serve UTF-8 encoded documents or binary files, as simple counterexamples).
Some binary serialization/data protocols are ASN.1, Thrift, Google Protocol Buffers, Avro, or, of course, some custom format. The advantage of JSON is "unified human-readable simplicity".
But in the end -- JSON is JSON.
Perhaps of interest to someone: JavaScript Protocol Buffer Implementation
