Best way to encode data - php

I have a huge amount of data in my database in the format:
lat;lon;speed;sec:lat;lon;speed;sec......
for example:
53.284534;50.227268;67;0:53.285481;50.226627;68;6:53.286429;50.226042;66;12:.......
The format is latitude, longitude, speed, number of seconds from the beginning.
The length of each string is from 1,000 to 100,000 characters.
I tried compressing it with gzcompress() and base64_encode() before putting it in the database.
An initial string of 7,607 characters comes out at 3,444 characters after gzcompress() and base64_encode(),
so the compression is roughly 50%.
Is there a more effective way to compress strings like this?

There is clearly a strong correlation from sample to sample. I would subtract the previous sample from each sample, except of course for the first one. I would encode each difference as an integer of variable length (not as text but in binary). For lat and lon I would multiply by 1,000,000 on the assumption (which you need to verify) that there are never more than six digits after the decimal point. The second and third samples would each require only six bytes.
I would then compress the result with gzip.
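A minimal sketch of that idea, assuming six decimal places and using a zigzag mapping plus a 7-bit variable-length integer for the differences. The function names (zigzag, varint, encodeTrack, decodeTrack) are made up for illustration, and gzcompress() stands in for "gzip":

```php
<?php
// Map signed to unsigned so that small negative deltas stay small.
function zigzag(int $n): int { return $n < 0 ? (-2 * $n) - 1 : 2 * $n; }
function unzigzag(int $z): int { return ($z & 1) ? -(($z + 1) >> 1) : ($z >> 1); }

// 7 bits of payload per byte; the high bit means "more bytes follow".
function varint(int $u): string {
    $out = '';
    while ($u >= 0x80) {
        $out .= chr(($u & 0x7F) | 0x80);
        $u >>= 7;
    }
    return $out . chr($u);
}

function encodeTrack(string $track): string {
    $prev = [0, 0, 0, 0];
    $bin = '';
    foreach (explode(':', $track) as $sample) {
        [$lat, $lon, $speed, $sec] = explode(';', $sample);
        // Assumption: never more than six digits after the decimal point.
        $cur = [(int) round((float) $lat * 1e6), (int) round((float) $lon * 1e6), (int) $speed, (int) $sec];
        foreach ($cur as $i => $v) {
            $bin .= varint(zigzag($v - $prev[$i])); // store the difference only
        }
        $prev = $cur;
    }
    return gzcompress($bin, 9);
}

function decodeTrack(string $blob): string {
    $bin = gzuncompress($blob);
    $values = [];
    $u = 0;
    $shift = 0;
    for ($i = 0, $n = strlen($bin); $i < $n; $i++) {
        $b = ord($bin[$i]);
        $u |= ($b & 0x7F) << $shift;
        if ($b & 0x80) { $shift += 7; continue; }
        $values[] = unzigzag($u);
        $u = 0;
        $shift = 0;
    }
    $prev = [0, 0, 0, 0];
    $samples = [];
    foreach (array_chunk($values, 4) as $delta) {
        foreach ($delta as $i => $d) { $prev[$i] += $d; } // undo the subtraction
        $samples[] = sprintf('%.6f;%.6f;%d;%d', $prev[0] / 1e6, $prev[1] / 1e6, $prev[2], $prev[3]);
    }
    return implode(':', $samples);
}

$track = '53.284534;50.227268;67;0:53.285481;50.226627;68;6:53.286429;50.226042;66;12';
$blob = encodeTrack($track);
$back = decodeTrack($blob);
```

The blob is binary, so it belongs in a BLOB/VARBINARY column; wrapping it in base64_encode() would give back roughly a third of the savings.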

Try just storing them as binary floats. This is very simple and very fast.
Each number would use 4 bytes, and you could use the values directly from within your code.
Or, if you need them to be more precise, multiply each component by a predefined value (which may differ for each component) and store the results as 32-bit integer words.
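As a rough sketch of the scaled-integer variant: pack() with the 'l' code writes signed 32-bit integers in machine byte order. The SCALE constant and the two helper names are assumptions for illustration, not anything from the question:

```php
<?php
// Assumed scale factor: 1,000,000 keeps six decimal places of a coordinate.
const SCALE = 1000000;

// Four signed 32-bit integers, 16 bytes per sample.
function packSample(float $lat, float $lon, int $speed, int $sec): string {
    return pack('l4', (int) round($lat * SCALE), (int) round($lon * SCALE), $speed, $sec);
}

function unpackSample(string $bin): array {
    $v = unpack('l4', $bin); // unnamed fields come back under keys 1..4
    return [$v[1] / SCALE, $v[2] / SCALE, $v[3], $v[4]];
}

$bin = packSample(53.284534, 50.227268, 67, 0);
$sample = unpackSample($bin);
```

A 4-byte float ('f' in pack()) carries only about seven significant digits, so a value like 53.284534 may not survive the round trip exactly; the scaled integer does.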


PHP rounding my numbers?

I'm making an API call whose output is JSON.
The product field is "ProductID":3468490060026049912
I convert it to PHP with json_decode().
Then I output the "Product ID" = float(3.4684900600261E+18)
It gets changed to a float, which is rounded; I insert this figure into MySQL and it stays as the rounded figure.
How do I convert from JSON to PHP without the rounding? I need it correct to 19 digits.
You can use the JSON_BIGINT_AS_STRING flag in the $options parameter of json_decode(). You will have a string, though.
You probably don't need to store these IDs as integers; it's not like you're going to do any maths on them.
Store them as strings, and you won't have any issues with precision, no matter how many digits they are.
The only reason you'd need to store these as integers is if you're using them as your primary ID field in the database and it's doing auto-increment for new records.
This is also the correct way to handle storage for phone numbers, zip codes, and other data that is formatted as a number but is actually just an arbitrary sequence.
The value is too big to be stored as an integer on 32-bit machines (the maximum is about 2*10^9); that's why PHP converts it to float.
On 64-bit machines it can be stored as an integer, but its value is really close to the limit (which is about 9*10^18). Another ID that starts with 9 or is one digit longer doesn't fit and is converted by PHP to float.
But floating point numbers lose the least significant digits.
As a quick workaround you can pass JSON_BIGINT_AS_STRING as the fourth parameter to json_decode(), but it doesn't fix the source of the problem. You can encounter the same problem in the JavaScript or the database code.
You had better make "ProductID" a string everywhere. It is an identifier, anyway. You don't need to do math operations with it, just store it and search for it.
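For what it's worth, the flag can be used roughly like this. The second JSON document with the 23-digit id is made up, just to show a value that overflows on any platform:

```php
<?php
$json = '{"ProductID":3468490060026049912}';

// Arguments: JSON text, associative arrays, nesting depth (512 is the default), flags.
$data = json_decode($json, true, 512, JSON_BIGINT_AS_STRING);

// On 64-bit PHP this particular ID still fits in an int; on 32-bit builds it
// comes back as a string. Casting gives the exact digits either way.
$productId = (string) $data['ProductID'];

// A number beyond the 64-bit range comes back as a string, not a rounded float.
$big = json_decode('{"id":12345678901234567890123}', true, 512, JSON_BIGINT_AS_STRING);
```

Keeping $productId as a string from here on (and using a VARCHAR or BIGINT column in MySQL) avoids the float conversion entirely.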
Fixed by setting ini_set('precision', 19); at the top.

Compress a binary matrix

I have to send a large set of data containing biological samples, which are two-dimensional SQUARE arrays of 1's and 0's.
Say for example:
[
[0,1,0],
[1,0,0],
[0,0,0]
]
So this one is 3x3. Mine are 60-70 and expected to grow to 120 rows/columns (max). I have to send this via Ajax/API and also store it in a database.
I could serialise it as JSON, but I was wondering whether there is a more optimal way to handle things like this, such as a proper compression/decompression scheme?
One way I could think of is:
Join the digits as string
Divide in clusters of 6 digits. 111111 bin = 63 dec ( A-Z, a-z, 0-9,_ = 26 + 26 + 10 + 1)
Convert each cluster to alphanumeric encoding (a-zA-Z0-9_) and join as string
How smart/stupid/optimal is this solution? Is something better already out there?
Converting your data structure to JSON, then passing it through a standard compression algorithm like gzdeflate() is about as simple as you can get, and will result in excellent compression ratios. There's probably no reason to make it any more complicated than that.
(The output of gzdeflate will be binary data. If you need to transfer it over a channel which can't deal with that, you can base64_encode it; the results will still be smaller than the original JSON for a matrix of any meaningful size.)
"Flattening" the matrix into a single string of 1s and 0s (and storing the dimensions of the original matrix alongside the string) before compressing it may give you slightly better compression ratios, but at the expense of making your code more complicated.
Performing alphanumeric encoding on the matrix like you're describing in your question will result in significantly worse compression ratios, as it will make it very difficult for the DEFLATE algorithm to detect any patterns in your data which don't "line up" perfectly with the size of your clusters.
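A small sketch of the first two options, using the 3x3 matrix from the question. The helper names flattenMatrix and restoreMatrix are invented for this example, and a matrix this tiny is only there to show the round trip, not the compression ratio:

```php
<?php
$matrix = [[0, 1, 0], [1, 0, 0], [0, 0, 0]];

// Option 1: JSON, then DEFLATE. The result is raw binary.
$deflated = gzdeflate(json_encode($matrix), 9);
$restored = json_decode(gzinflate($deflated), true);

// Option 2: flatten to one string of 0s and 1s and keep the size alongside it.
function flattenMatrix(array $m): array {
    $bits = implode('', array_map(fn($row) => implode('', $row), $m));
    return ['size' => count($m), 'data' => gzdeflate($bits, 9)];
}

function restoreMatrix(array $packed): array {
    // Cut the bit string back into rows, then each row into integer cells.
    $rows = str_split(gzinflate($packed['data']), $packed['size']);
    return array_map(fn($row) => array_map('intval', str_split($row)), $rows);
}

$roundTrip = restoreMatrix(flattenMatrix($matrix));
```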

Best way to compact a string in PHP that can be decoded to its original form

What would be the best way to compact a string in PHP so that it can be decoded to its original form? base64_encode() works for numbers, but it yields a longer result for strings that contain special characters.
gzencode() and gzdecode() use the GZIP compression algorithm and are very efficient on plain text strings. Just be aware that the output may (will) contain binary characters not suitable for display and possibly not suitable for database storage either.
(Edit: since gzdecode() is only available from PHP 5.4 onwards, consider gzdeflate() and gzinflate() on older versions. gzdeflate() compresses a string and gzinflate() decompresses it.)
Take your pick: Compression and Archive Extensions
Well, of course base64 encoding makes a string longer, as it maps all possible bytes onto a smaller set of numbers and alphabetic characters.
I guess convert_uuencode() wouldn't increase the size of your binary string as much as base64, because the target set is larger.
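To put the two answers together, here is a hedged sketch that compresses first and only then base64-encodes, so the result is text-safe. compactString and expandString are hypothetical names, and the sample text is deliberately repetitive so that it compresses well:

```php
<?php
function compactString(string $s): string {
    return base64_encode(gzdeflate($s, 9)); // compress, then make it text-safe
}

function expandString(string $s): string {
    return gzinflate(base64_decode($s));
}

$text = str_repeat('The quick brown fox jumps over the lazy dog. ', 20);
$packed = compactString($text);
$unpacked = expandString($packed);

// base64 on its own always grows the input: 4 output characters per 3 input bytes.
$plainBase64Length = strlen(base64_encode($text));
```

Short or already random-looking strings may well come out longer than they went in, so it is worth measuring on real data.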

Storing something as unsigned tinyint in php

Is there a way to explicitly store my numbers in PHP as tinyint (1 byte instead of 4)?
Or could I only enforce this by storing them four at a time in an int (using a few binary operations)?
I generate these values by breaking up a string using str_split and interpreting these bytes as ints via unpack( 'C' , .. ).
Currently I store these values in an array as individual integers, but it could save a lot of space if I could somehow store them as tinyints.
PHP has two data types that you may want to use here: integer and string.
PHP doesn't have any other types you could choose from (float wouldn't be a good choice for integers, the other types are not appropriate).
An int is usually 32 or 64 bits, a string is 1 byte per character.* I propose that unless you have a lot of numbers, you won't ever see any problem with 32-bit ints. If you absolutely positively want to save memory** and your numbers have a maximum of 3 digits, you could handle your numbers as strings. There's even the BCMath extension that'll let you operate on string numbers directly without needing to cast them back and forth. It's quite a lot of hassle for possibly very limited gain, though.
Seeing that a MySQL TINYINT is usually used for boolean values though, please be aware PHP does have a boolean type...!
* One byte per one-byte character, that is.
** Since PHP scripts are usually only very temporary, you should only have problems with peak memory usage, not storage space. Getting more RAM may be the more efficient solution than playing with types.
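One way to read the "use a string" suggestion, sketched under the assumption that every value fits in 0..255: keep the raw bytes in a single string and address them by offset, instead of expanding them into an array of ints.

```php
<?php
$values = [0, 17, 255, 42];

// 'C*' packs each value as one unsigned byte, so the string is one byte per number.
$bytes = pack('C*', ...$values);

// Read a value back by its offset.
$third = ord($bytes[2]);

// Overwrite a value in place.
$bytes[1] = chr(200);

// Expand to a normal array only when it is really needed.
$all = array_values(unpack('C*', $bytes));
```

Since the question already starts from a string, skipping str_split() and reading ord($string[$i]) directly gives the same effect.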

Storing a bunch of 3bits long binary data with PHP

My PHP program is working with an array of values ranging from 0 to 7. I'm trying to find the most effective way to store those values in PHP. By most effective I mean using the fewest bits.
It's clear that each value only needs 3 bits of storage space (b000=0 to b111=7). But what is the most efficient way to store those 3-bit values in a binary string?
I don't know in advance how many 3-bit values I'll need to store or restore, but it might be a lot, so 64 bits is clearly not enough.
I was looking into pack() and unpack(): I could store two values in each byte and use pack('C', $twoValues), but I'm still losing 2 bits.
Will it work? Is there a more effective way of storing those values?
Thanks
You didn't ask if it was a good idea (as many suggested, the benefit of that kind of space compression is easily lost in the extra processing), but that's another topic :)
You also don't mention where you're storing the data afterwards. Whatever that storage location/engine is, it may have further conditions and specialized types (e.g. a database has a binary column format, might have a byte column format, may even support bit storage, etc.).
But sticking with the topic, I guess the best 3-bit storage is as a nibble (wasting one bit), and I suppose I'd combine two nibbles into a byte (losing two bits overall). Yes, you're losing two bits (if that's key), but it's simple to combine the two values, so your processing overhead is relatively small. The multiplier is 8 because each value can take eight states (0 to 7):
$byte=$val1*8+$val2;
$val2=$byte%8;$val1=($byte-$val2)/8;
If a byte isn't available, you can combine these up to make 16 (4 stored), 32 (8), 64 (16) bit integers. You can also form an array of these values for larger storage.
I'd consider the above more human readable, but you could also use bit-logic to combine and separate the values:
$combinedbyte=($val1<<3)|$val2;
$val2=$combinedbyte&7;$val1=($combinedbyte&56)>>3;
(This is effectively what the PACK/UNPACK commands do)
Alternatively you could encode into characters. Since the first 32 ASCII codes are control characters, you might as well start at '0' (code 48): two values need 64 codes, and 48 to 111 are all printable.
$char=chr(($val1*8+$val2)+48); //ord('0')=48
$val2=(ord($char)-48)%8;$val1=(ord($char)-48-$val2)/8;
A series of these encoded characters could be stored as an array or concatenated into a string.
NOTE:
In the case of, say, the 64-bit integers above, we're storing 3 bits in 4, so we get 64/4=16 storage locations. This means we're wasting a further 16 bits (1 per location), so you might be tempted to add another 5 values, for a total of 21 (21*3=63 bits, only 1 wasted). That's certainly possible (with integer math, although most PHP instances don't work at 64 bits, or with bit-logic solutions), but it complicates things in the long run and is probably more trouble than it's worth.
The best way is to store them as integers and not get involved with packing things bit by bit. Unless you have an actual engineering reason you need these to be stored as 3-bit values (for example, interfacing with hardware), you're just asking for headaches. Keep in mind that, especially for odd bit sizes, they become pretty difficult to access directly if you do this. And if you are sticking these values in a database, you wouldn't be able to search or index on values packed like this. Store them as integers or, if in a db, perhaps as a short integer or byte.
That kind of technique is only necessary if you will have at least half a billion of these. Think about it, the CPU will have to have data in one register, the mask in another and AND them just to get your value out. Now imagine iterating over a list of these that is long enough to justify that kind of space saving technique. A 50% reduction in space and an order of magnitude slower.
Looking at http://php.net/manual/en/language.types.php, you should store them as integers. However, the question is whether to let one integer value represent many 3-bit values or not. The former is more complex but requires less memory, whereas the latter is the opposite. If you don't have an extreme need to reduce the amount of memory you use, then I would suggest the latter (use one integer for one 3-bit value).
The main problem with storing many 3-bit values in one integer is figuring out how many 3-bit values there are. You could use an array of integers, and then have an extra integer which states the total number of 3-bit values. However, as also stated in the manual, the number of bits used for an integer value is platform-dependent. So you would have to know whether an integer is 32 bits or 64 bits, or else you may try to store too many values and lose data, or you risk using more memory than needed (which would be a bad thing as you're aiming to use as little memory as possible).
I would convert each integer to binary, concatenate all of them, and then split the resulting string into bytes. Each byte will be 0-255 so it can be stored as an individual character.
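A possible way to do that without building an intermediate string of '0' and '1' characters is a small bit accumulator. This is only a sketch: pack3 and unpack3 are made-up names, and the number of values has to be stored separately (as noted above), because the last byte is padded with zero bits.

```php
<?php
// Pack values in the range 0..7 into a binary string, 3 bits each, most significant bits first.
function pack3(array $values): string {
    $out = '';
    $acc = 0;   // bit accumulator
    $bits = 0;  // number of pending bits in the accumulator
    foreach ($values as $v) {
        $acc = ($acc << 3) | ($v & 7);
        $bits += 3;
        while ($bits >= 8) {
            $bits -= 8;
            $out .= chr(($acc >> $bits) & 0xFF); // emit the oldest 8 bits
            $acc &= (1 << $bits) - 1;            // keep only the leftover bits
        }
    }
    if ($bits > 0) {
        $out .= chr(($acc << (8 - $bits)) & 0xFF); // pad the last byte with zeros
    }
    return $out;
}

// $count is needed because the padding bits would otherwise look like extra values.
function unpack3(string $bin, int $count): array {
    $values = [];
    $acc = 0;
    $bits = 0;
    $pos = 0;
    while (count($values) < $count) {
        if ($bits < 3) {
            $acc = ($acc << 8) | ord($bin[$pos++]); // pull in the next byte
            $bits += 8;
        }
        $bits -= 3;
        $values[] = ($acc >> $bits) & 7;
        $acc &= (1 << $bits) - 1;
    }
    return $values;
}

$input = [7, 0, 5, 3, 1, 6, 2, 4, 7];
$packed = pack3($input);
$output = unpack3($packed, count($input));
```

Nine values take 27 bits, so they fit in 4 bytes here, against 9 bytes at one character per value.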
