I have a fairly large (around 50 bytes) structure on an embedded device that needs to be transported to a PHP script and then stored in a database. Most of the elements in the structure are bit fields; the rest are integers and chars.
My plan is to use a union in C (on the embedded device) to view the data as a binary array, base64-encode it, and upload it as a variable in a URL query string.
In PHP I then have one big blob of data that needs to be separated back into flags and integers before being stored in the DB.
That is my task. What would be a suitable method to do this work?
Thanks
If you send the data as binary, then you will have to decide on a byte order and enforce it at each end, regardless of either platform's native byte order. The usual choice is network byte order, which you can enforce on the C side with the POSIX byte-order conversion functions (htons(), htonl()).
Also, you cannot rely on compiler-generated structure bit-fields packing and unpacking identically on both platforms, so for the bit fields you must pack and unpack by shifting and masking yourself, and be careful with bit fields that span byte or word boundaries.
Perhaps a more easily portable solution is to transfer the data in a structured text format such as XML or JSON, or in a proprietary text format of your own design if those are too verbose.
Structures created in C can be decoded in PHP using the unpack() function.
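For example, here is a minimal sketch of the PHP side, assuming a hypothetical layout of a 32-bit big-endian flags word, a 16-bit counter and a status byte (your actual field layout and parameter name will differ):

    // Sketch only: field names, bit positions and the "payload" parameter are assumptions.
    $raw = base64_decode($_GET['payload'], true);
    if ($raw === false || strlen($raw) < 7) {
        die('bad payload');
    }

    // Network (big-endian) byte order: N = uint32, n = uint16, C = uint8
    $fields = unpack('Nflags/ncounter/Cstatus', $raw);

    // Pull individual bit flags out by shifting and masking, mirroring the C side
    $doorOpen  = ($fields['flags'] >> 0) & 0x1;
    $alarmSet  = ($fields['flags'] >> 1) & 0x1;
    $battLevel = ($fields['flags'] >> 2) & 0x0F;   // a 4-bit field

    // ...store $doorOpen, $alarmSet, $battLevel, $fields['counter'] in the DB

Remember that '+' and '/' in base64 need to be URL-encoded (or replaced with a URL-safe alphabet) if the payload travels in a query string.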
I have a LabVIEW program that sends UDP packets and a PHP program that receives them. So the LabVIEW program is the sender and the PHP program is the receiver.
In the LabVIEW program, a float array is type cast to a string using the Type Cast function block and sent as UDP packets. When I receive those packets in PHP, the data I get is not in a readable format.
I have tried converting the string array into a float array using array_map('floatval', $array), but the values still do not come out in a readable format.
Please help me to solve this issue.
The LabVIEW help for Type Cast points you at the document on flattened data, which mentions that the representation is big-endian (most significant byte first). The entry on How LabVIEW Stores Data in Memory shows the actual representation of a single-precision floating-point number (SGL).
Now that you know what LabVIEW is sending, your question becomes how to decode this in PHP - if you can't solve this yourself, I suggest asking a new question.
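As a starting point, here is a minimal sketch of decoding such a payload in PHP, assuming the datagram contains nothing but the flattened big-endian SGL values (the port number and variable names are made up):

    $sock = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
    socket_bind($sock, '0.0.0.0', 9000);
    socket_recvfrom($sock, $buf, 65535, 0, $fromIp, $fromPort);

    // 'G' = big-endian single-precision float (available in PHP 7.1+)
    $values = array_values(unpack('G*', $buf));
    print_r($values);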
If you can change the LabVIEW code, you could alter the format in which the data is sent so as to make it easier to decode at the other end. Possible options there might include:
If network bandwidth is not an issue, use a standard text-based format such as JSON
If JSON is too big but you can afford eight bytes per value, convert to DBL - using a conversion 'bullet' from the numeric palette - before flattening to string, then reorder the bytes of the string to little-endian at the LabVIEW end. From Ton Plomp's comment, that might be correct for your current PHP code.
If you really can't afford more than four bytes per value, but the range of your data values is not too wide, you could scale them to an integer value (U32 or I32) before flattening; again, that might be easier to decode at the other end.
Note that although the data format you get from Type Cast and/or Flatten to String is documented and historically has been stable, I don't think it's absolutely guaranteed not to change between LabVIEW versions.
Also, the unreadable section of data could be header information added by the UDP function. You may be able to parse that data and discard it.
Another thing to try is to read the UDP Rx data back in LabVIEW and compare it to the Tx data, to identify what is going on.
I need to save 250 data files an hour, each with 36,000 small arrays of [date, float, float, float], in Python, in a way that I can read fairly easily with PHP. This needs to run for at least 10 years, on 6 TB of storage.
What is the best way to save these individual files? I am thinking of Python's struct module, but it starts to look like a poor fit once the amount of data gets large.
Example of the data:
a = [["2016:04:03 20:30:00", 3.423, 2.123, -23.243], ["2016:23:.....], ......]
Edit:
Space is more important than unpacking speed and computation, since space is the limiting factor.
So you have 250 data providers of some kind, which are providing 10 samples per second of (float, float, float).
Since you didn't specify what your limitations are, there are more options.
Binary files
You could write each file as a fixed array of 3 * 36,000 floats with struct; at 4 bytes each, that gets you 432,000 bytes per file. You can encode the hour in the directory name and the ID of the data provider in the file name.
If your data isn't too random, a decent compression algorithm should shave off enough bytes, but you would probably need some sort of delayed compression if you don't want to risk losing data.
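On the PHP side such a file is easy to read back. A minimal sketch, assuming the Python writer packs each record as three little-endian 32-bit floats (struct format '<fff') and the path encodes the hour and provider ID:

    $raw = file_get_contents('/data/2016-04-03T20/provider_042.bin');
    foreach (str_split($raw, 12) as $i => $rec) {          // 12 bytes = 3 * 4-byte floats
        [$a, $b, $c] = array_values(unpack('g3', $rec));   // 'g' = little-endian float
        // the timestamp is implicit: file start + $i * 0.1 s (10 samples per second)
    }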
numpy
An alternative to packing with struct is numpy.tofile, which stores the array directly to a file. It is fast, but it always stores the data in C format, so you should take care if the endianness of the target machine is different. With numpy.savez_compressed you can store a number of arrays in one npz archive and compress them at the same time.
JSON, XML, CSV
Any of the mentioned formats is a good option. Also worth mentioning is the JSON Lines format, where each line is a JSON-encoded record. This enables streaming writes, because the file stays in a valid format after every write.
They are simple to read, and the syntactic overhead largely goes away with compression. Just don't build them by string concatenation; use a real serializer library.
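Reading JSON Lines back in PHP is a one-record-at-a-time loop; a minimal sketch (the file name and record layout are just examples):

    $fh = fopen('/data/2016-04-03T20/provider_042.jsonl', 'r');
    while (($line = fgets($fh)) !== false) {
        $rec = json_decode($line, true);   // e.g. ["2016:04:03 20:30:00", 3.423, 2.123, -23.243]
        // ...aggregate or store $rec
    }
    fclose($fh);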
(SQL) Database
Seriously, why not use a real database?
Obviously you will need to do something with the data. With 10 samples per second, no human will ever look at all of it, so you will have to do aggregations: minimum, maximum, average, sum, and so on. Databases already have all of this, and in combination with their other features they can save you a ton of time that you would otherwise spend writing oh so many scripts and abstractions over files. Not to mention how cumbersome the file management alone becomes.
Databases are extensible and supported by many languages. You save a datetime in the database with Python and read the datetime back with PHP; no hassle over how you are going to encode your data.
Databases support indexes for faster lookup.
My personal favourite is PostgreSQL, which has a number of nice features. It supports the BRIN index, a lightweight index that is perfect for huge datasets with naturally ordered fields such as timestamps. If you're low on disk, you can extend it with cstore_fdw, a column-oriented data store that supports compression. And if you still want to use flat files, you can write a foreign data wrapper (also possible in Python) and still use SQL to access the data.
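To illustrate the database route, here is a minimal PHP/PDO sketch; the table and column names are made up, and the Python side would do the inserts:

    $db = new PDO('pgsql:host=localhost;dbname=telemetry', 'user', 'pass');

    // Hourly aggregates for one provider: the database does the heavy lifting
    $stmt = $db->prepare(
        "SELECT date_trunc('hour', ts) AS hour,
                avg(v1) AS avg_v1, min(v2) AS min_v2, max(v3) AS max_v3
           FROM samples
          WHERE provider_id = :id AND ts >= :from AND ts < :to
          GROUP BY 1 ORDER BY 1"
    );
    $stmt->execute([':id' => 42, ':from' => '2016-04-03', ':to' => '2016-04-04']);
    foreach ($stmt as $row) {
        // $row['hour'], $row['avg_v1'], ...
    }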
Unless you're consuming the files in the same language, avoid language specific formats and structures. Always.
If you're going between 2 or more languages, use a common, plain text data format like JSON or XML that can be easily (often natively) parsed by most languages and tools.
If you follow this advice and you're storing plain text, then use compression on the stored files; that's how you conserve space. Typical well-structured JSON tends to compress really well (assuming simple text content).
Once again, choose a compression format like gzip that's widely supported by languages and their core libraries. PHP, for example, has a native gzopen() function, and Python has lib\gzip.py in its standard library.
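A minimal sketch of that in PHP (the file name and sample data are made up): write with gzencode(), read back with the gz* stream functions; Python can read the same file with its gzip module.

    $records = [["2016:04:03 20:30:00", 3.423, 2.123, -23.243]];   // example data
    file_put_contents('samples.json.gz', gzencode(json_encode($records), 9));

    $zh = gzopen('samples.json.gz', 'rb');     // reads the gzip stream transparently
    $json = '';
    while (!gzeof($zh)) {
        $json .= gzread($zh, 8192);
    }
    gzclose($zh);
    $records = json_decode($json, true);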
I doubt it is possible without extremely efficient compression.
6 TB / 10 years / 365 days / 24 hours / 250 files ≈ 270 KB per file.
That is in the ideal case; in the real world, the filesystem cluster size matters as well.
If you have 36,000 “small arrays” to fit into each file, that leaves only about 7 bytes per array, which is not even enough to store a proper datetime object on its own.
One idea that comes to mind if you want to save space: store only the values and discard the timestamps. Produce files containing only the data, and make sure you create a kind of index (a formula) that, given a timestamp (year/month/day/hour/min/sec...), yields the position of the data inside the file (and, of course, which file to look in). If you think it through, you will find that with a “smart” naming scheme for the files you can avoid storing the year/month/day/hour at all, since part of the index can be the file name itself. It all depends on how you implement your “index” system, but pushed to the extreme you can forget about timestamps entirely and focus only on the data.
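A minimal sketch of that “index by formula” idea in PHP, assuming 10 samples per second, 12-byte records (three little-endian floats) and one file per provider per hour; the naming scheme and function name are made up:

    function locate(int $providerId, int $ts): array {
        $hourStart = $ts - ($ts % 3600);
        $file = sprintf('/data/%s/provider_%03d.bin',
                        gmdate('Y-m-d\TH', $hourStart), $providerId);
        $offset = ($ts - $hourStart) * 10 * 12;   // seconds into the hour * 10 samples/s * 12 bytes
        return [$file, $offset];
    }

    [$file, $offset] = locate(42, strtotime('2016-04-03 20:30:00 UTC'));
    $fh = fopen($file, 'rb');
    fseek($fh, $offset);
    $sample = array_values(unpack('g3', fread($fh, 12)));   // [v1, v2, v3]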
Regarding the data format, as mentioned above, I would definitely go for a language-independent format such as XML or JSON... Who knows which languages and possibilities you will have in ten years ;)
I need to store:
array(1,2,3,4,5);
into a MySQL BLOB.
How can I convert this array into binary data?
It depends mostly on how you are using this information. IDs are usually used to identify a resource, and thus must be unique, not null, and indexable.
By those standards, do not store it as a BLOB.
Mostly because searching by content is slower than searching on a native column type. Also, SQL databases organize the content of a table to make queries faster.
If what you need is just to store the information and use another ID to identify the resource (and the values can easily be represented as strings or numbers), then do not use a BLOB. Text storage usually costs one byte per character, so a number stored as text takes more space than the same value stored natively. For example, 1902334123 (random keyboard smash) takes 10 bytes on disk as a string, while a 32-bit signed integer can hold it in 4 bytes.
Finally, if all you need is to store several data units, a plain VARCHAR read back as a string could solve your problem just as well.
You can convert the array to JSON and store that in the DB:
$json = json_encode($array);
and when you read it back from the DB:
$array = json_decode($json, true);
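If you really do want raw binary for the BLOB instead of JSON, PHP's pack()/unpack() can convert the array to a binary string and back; a minimal sketch, assuming the values fit in unsigned 32-bit integers:

    $array = [1, 2, 3, 4, 5];

    $blob = pack('N*', ...$array);                  // 20 bytes of big-endian uint32s
    // ...store $blob in the BLOB column...

    $restored = array_values(unpack('N*', $blob));  // [1, 2, 3, 4, 5]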
People know all about storing binary data in a database server as BLOBs. How would one accomplish the same thing in PHP?
In other words, how do I store a blob in a PHP variable?
As PHP doesn't have native Unicode support, you can safely use normal strings as binary storage. Most (all?) of the string functions are binary-safe and cope with embedded NUL bytes, too, so you shouldn't run into any problems because of that either.
PS: Theoretically you could prefix all binary strings with b (e.g. b'binary data'). This is a forward-compatibility token to make sure that strings which are expected to be handled as binary really will be, even once Unicode support becomes available.
Easy: store it in a string. You can use all the normal string functions (strlen, substr, etc.); just remember that the PHP string functions work in single-byte units, e.g. substr($binstr, 0, 1) gives you the first 8 bits (one byte) of $binstr.
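For example, a minimal sketch of treating an ordinary PHP string as a byte buffer (the file name is just an example):

    $blob = file_get_contents('photo.jpg');   // any binary data

    echo strlen($blob);                  // size in bytes
    $firstByte = ord($blob[0]);          // first byte as an integer (0-255)
    echo bin2hex(substr($blob, 0, 4));   // e.g. "ffd8ffe0" for a JFIF/JPEG file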
Maybe as an array of bytes. After all, binary data is nothing more than that.
I am passing a lot of data between PHP and JavaScript. I am using JSON and json_encode in PHP, but the problem here is that I am passing a lot of numbers stored as strings, for example numbers like 1.2345.
Is there a way to pass the data directly as numbers (floats, integers) and not have to convert it to ASCII and then back?
Thanks,
No. HTTP is a byte stream protocol(*); anything that goes down it has to be packed into bytes. You can certainly use a more compact packed binary representation of values if you like, but it's going to be much more work for your PHP to encode and your JS to decode.
Anyhow, for the common case of small numbers, text representations tend to be very efficient. Your example 1.2345 is actually smaller as a string (6 bytes) than a double-precision float (8 bytes).
JSON was invented precisely to allow non-string types to be transferred over the HTTP connection. It's as seamless as you're going to get. Is there any good reason to care that there was a serialise->string->parse step between the PHP float and the JavaScript Number?
(* exposed to JavaScript as a character protocol, since JS has no byte datatype. By setting the charset of the JSON response to iso-8859-1 you can make it work as if it were pure bytes, but the default utf-8 is usually more suitable.)
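If the underlying problem is that your PHP values are strings to begin with (for example, fresh out of a database), cast them before encoding so the JSON carries real numbers; a minimal sketch:

    $row = ['price' => '1.2345', 'qty' => '42'];            // strings, e.g. from MySQL

    echo json_encode($row);                                  // {"price":"1.2345","qty":"42"}
    echo json_encode(['price' => (float)$row['price'],
                      'qty'   => (int)$row['qty']]);         // {"price":1.2345,"qty":42}

    // Or let PHP detect numeric strings automatically:
    echo json_encode($row, JSON_NUMERIC_CHECK);              // {"price":1.2345,"qty":42}

JSON.parse on the JavaScript side then gives you Number values directly, with no extra conversion step.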
If you didn't want to use JSON, there are other encoding options. The data returned from an HTTP request is an octet stream (and not a 7-bit clean ASCII stream; if it were, there would be no way to serve UTF-8 encoded documents or binary files, to give simple counterexamples).
Some binary serialization/data protocols are ASN.1, Thrift, Google Protocol Buffers, Avro, or, of course, some custom format. The advantage of JSON is "unified human-readable simplicity".
But in the end, JSON is JSON.
Perhaps of interest to someone: JavaScript Protocol Buffer Implementation