Distinguishing empty strings from nulls in a CSV input - PHP

I have database outputs like the following:
$row = '(one,"pika chu",,"")'
If I send this string as a parameter to str_getcsv, it will output ['one', 'pika chu', '', '']. The third element, despite being absent, has been turned into an empty string. This is very annoying since I must distinguish empty values (no values) from empty strings. The output I would expect is ['one', 'pika chu', null, ''].
The inputs I get are from a PostgreSQL database and are represented as composite values.
For example, if a table is pokemon_id => int4, name => text, then a query will output strings like '(1, "pika chu")'. A unique constraint on the name field, for example, will allow the following two records to exist: (100, '') and (101, null).
When fetched, they are formatted as raw values like:
'98,whatever'
'99,"pika chu"'
'100,""'
'101,'
'102,","'
I need to read those strings and this example must output the following arrays:
['98', 'whatever']
['99', 'pika chu']
['100', '']
['101', null]
['102', ',']
Is there a way to do that in PHP?
Update 1: @deceze kindly sent me this link stating there are no NULLs in CSV (TL;DR: basically because there were no nulls in XML; this problem has been tackled since then). How to parse CSV with NULLs then?
Update 2: It was suggested that I write a dedicated parser in PHP using preg_match_* functions. I am a bit reluctant to go that way because of 1) the performance impact compared to str_getcsv and 2) the fact that preg_match used to segfault if the string passed was over 8 kB (which can happen in a CSV context).
Update 3: I looked at the str_getcsv source code to see if it was possible to propose a patch adding parsing options, as some other languages have. I now understand PHP’s underlying philosophy better. @daniel-vérité raised the idea of implementing a state machine to parse CSV strings. Even though the input can have thousands of lines that weigh dozens of kilobytes with embedded CSV structures, it might be the best way.
Thank you for your help.
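The state machine idea from Update 3 could look roughly like the sketch below. It is only an illustration, not the approach the asker finally used: parse_csv_with_nulls is a hypothetical name, and the code assumes a single-character separator, double quotes as the enclosure, and a doubled quote ("") as the only escape inside a quoted field (backslash escapes, which PostgreSQL composite output can also produce, are not handled). The trick is to remember whether a field contained a quoted section: an empty field that was never quoted becomes null, a quoted empty field stays ''.

```php
<?php
// Hypothetical helper (not part of PHP): parses one CSV line and returns
// null for unquoted empty fields and '' for quoted empty fields.
function parse_csv_with_nulls(string $line, string $sep = ',', string $quote = '"'): array
{
    $fields = [];
    $buffer = '';
    $inQuotes = false;   // state: currently inside a quoted section
    $wasQuoted = false;  // the current field contained a quoted section
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $char = $line[$i];
        if ($inQuotes) {
            if ($char === $quote) {
                if ($i + 1 < $length && $line[$i + 1] === $quote) {
                    $buffer .= $quote; // a doubled quote is an escaped quote
                    $i++;
                } else {
                    $inQuotes = false; // closing quote
                }
            } else {
                $buffer .= $char;
            }
        } elseif ($char === $quote) {
            $inQuotes = true;
            $wasQuoted = true;
        } elseif ($char === $sep) {
            // End of field: empty and never quoted means "no value".
            $fields[] = ($buffer === '' && !$wasQuoted) ? null : $buffer;
            $buffer = '';
            $wasQuoted = false;
        } else {
            $buffer .= $char;
        }
    }
    // Flush the last field.
    $fields[] = ($buffer === '' && !$wasQuoted) ? null : $buffer;

    return $fields;
}
```

With the sample rows from the question, parse_csv_with_nulls('101,') gives ['101', null], while parse_csv_with_nulls('100,""') gives ['100', '']. It walks the string once, so the cost grows linearly with the input size.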

Related

"encapsulate" a whole part in a html query?

It might sound odd, but I want to send "hierarchical" HTML queries, i.e. queries that contain queries and sub-queries (to a PHP-based system).
The idea is that the first parse_str() will just convert the "outer part" into an array, leaving all the "inner part" untouched (unlike %26, which is converted to "&" everywhere).
So what I am looking for is a kind of "escape begin / escape end" character pair that makes the HTML parser leave everything inside the escape untouched.
Therefore, the "first" parse would deliver an array of queries (and values, if these are not "escaped").
Basically, my ideal query would look like this, where "{" and "}" are the escape begin/end chars:
"key1=abc&query[]={this_is_a_query}&query[]={and_yet_another}"
where {this_is_a_query} would be: "k1=abc&k2=100" and {and_yet_another} would be "k1=xyz&k2=200".
So, fully written:
"key1=abc&query[]={k1=abc&k2=100}&query[]={k1=xyz&k2=200}"
As a result, I would like to get an assoc array that holds "parsable" values that are queries themselves:
key1=>abc
query[0] => "k1=abc&k2=100"
query[1] => "k1=xyz&k2=200"
I know that I can do that with "%26", but that only works in the "first hierarchy", but not for "queries/in-queries/in-queries" (and so forth)
What I want to achieve is kind of a "batch query" that allows for running multiple programs with one single call.
I hope my description above is understandable?
Sorry, it looks like I did not express myself well. To clarify, I will try to give another example. Imagine parse_str() treated "{}" as characters enclosing what it should not touch:
received string:
step[]={scene[]={dim=10&item=kitchenlamp}&scene[]={item=sprinkler&state=on}}&step[]=delay=20&step[]={scene[]={item=sprinkler&state=off}}
first parse_str would return:
step[0]=>scene[]={dim=10&item=kitchenlamp}&scene[]={item=sprinkler&state=on}
step[1]=>delay=20
step[2]=>scene[]={item=sprinkler&state=off}
My function would now iterate over steps 0..1..2 and hand the values over to the next function, which also uses parse_str to acquire its parameters, and so forth.
The sub-function of step 1 would itself get an array and loop over it, passing the parameters to the "scene" function, which itself would dismantle the parameters of what is to be done.
step 2 would be a direct execution ... wait 10 seconds
step 3 would again get an array of scenes that it would hand over to the scene function.
I hope it's clearer now what direction I am going in.
Note especially that the same "keys" appear in different parts of the "action chain string".
The reason I want it this way is that 1) the sending device has no function similar to http_build_query, and 2) the parameters will be entered by users (not programmers) in an INI-like file.
One way of doing it would be to urlencode the whole "part":
$part="key1=abc&query[]={k1=abc&k2=100}&query[]={k1=xyz&k2=200}";
$href="https://somepage.com?part=".urlencode($part);
// this will result in
// https://somepage.com?part=key1%3Dabc%26query%5B%5D%3D%7Bk1%3Dabc%26k2%3D100%7D%26query%5B%5D%3D%7Bk1%3Dxyz%26k2%3D200%7D
see a little demo here: https://rextester.com/NLBOA61240
Alternatively you could also json_encode() it:
$part=["key1"=>"abc","query"=>[["k1"=>"abc","k2"=>100],["k1"=>"xyz","k2"=>200]]];
$href="https://somepage.com?part=".urlencode(json_encode($part));
On the receiving end you can then easily json_decode() the string you get in $part.
See here: https://rextester.com/UZGV97528
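If the brace syntax from the question has to stay (because users type it into an INI-like file), a small brace-aware splitter could stand in for parse_str() at each level. The sketch below is only an illustration under some assumptions: parse_braced_query is a hypothetical name, braces are assumed to be balanced, values are not URL-decoded, and only the "key=value" and "key[]=value" forms from the question are handled.

```php
<?php
// Hypothetical helper: a brace-aware stand-in for parse_str() that handles
// one nesting level. Anything inside {...} is left untouched.
function parse_braced_query(string $query): array
{
    // Pass 1: split on '&', but only when we are outside all braces.
    $parts = [];
    $depth = 0;
    $current = '';
    foreach (str_split($query) as $char) {
        if ($char === '{') {
            $depth++;
        } elseif ($char === '}') {
            $depth--;
        }
        if ($char === '&' && $depth === 0) {
            $parts[] = $current;
            $current = '';
        } else {
            $current .= $char;
        }
    }
    $parts[] = $current;

    // Pass 2: split each part on the first '=' and collect the values.
    $result = [];
    foreach ($parts as $part) {
        [$key, $value] = array_pad(explode('=', $part, 2), 2, '');
        // Strip one pair of enclosing braces; inner braces stay intact.
        if (strlen($value) >= 2 && $value[0] === '{' && substr($value, -1) === '}') {
            $value = substr($value, 1, -1);
        }
        if (substr($key, -2) === '[]') {
            $result[substr($key, 0, -2)][] = $value; // key[] appends
        } else {
            $result[$key] = $value;
        }
    }

    return $result;
}
```

Each returned value is itself a query string, so the next function in the chain can call parse_braced_query() on it again to peel off the next level.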

I need to add fields to a JSON string in PHP, but I'm having problems

So... I need to save a large-ish amount of data from a platform with an excruciatingly limited amount of memory.
Because of this, I'm basically storing the data on my webserver, using a php script to just write JSON to a flat file, because I'm lazy af.
I could go to the trouble of having it store the data in my mysql server, but frankly the flat file thing should have been trivial, and I've run up against a problem. There are several quick and dirty workarounds that would fix it, but I've been trying to fix it the "right" way. (I know, I know, the right way would be to just store the data in mysql, but I actually need to be able to take the json file this produces and send it back to the platform that needs the data, in a ridiculously roundabout fashion, so it made sense to just have the php save it as a flat file in the first place. And it's already working, aside from this one issue, so I hate to reimplement.)
See... Because of the low memory on the platform I'm sending the json to my server from... I'm sending things one field at a time. Each call to the php script is only setting ONE field.
So basically what I'm doing is loading the file from disk if it exists, and running it through json_decode to get my storage object, and then the php file gets a key argument and a value argument, and if the key is something like "object1,object2", it explodes that, gets the length of the resulting array, and then stores the value in $data->$key[0]->$key[1].
Then it's saved back to disk with fwrite($file, json_encode($data));
This is all working perfectly. Except when $value is a simple string. If it's an array, it works perfectly. If it's a number, it works fine. If it's a string, I get null from json_decode. I have tried every way I can think of to force quotes on to the ends of the $value variable in the hopes of getting json_decode to recognize it. Nothing works.
I've tried setting $data->$key[0]->$key[1] = $value in cases where value is a string, and not an array or number. No dice, php just complains that I'm trying to set an object that doesn't exist. It's fine if I'm using the output of json_decode to set the field, but it simply will not accept a string on its own.
So I have no idea.
Does anyone know how I can either get json_decode to not choke on a string that's just a string, or add a new field to an existing php object without using the output of json_decode?
I'm sure there's something obvious I'm missing. It should be clear I'm no php guru. I've never really used arrays and objects in php, so their vagaries are not something I'm familiar with.
Solutions I'm already aware of, but would prefer to avoid, are: I could have the platform that's sending the post requests wrap single, non-numeric values with square braces, creating a single item array, but this shouldn't be necessary, as far as I'm aware, so doing this bothers me (And ends up costing me something like half a kilobyte of storage that shouldn't need to be used).
I could also change some of my json from objects to arrays in order to get php to let me add items more readily, but it seems like there should be a solution that doesn't require that, so I'd really prefer not to...
I skimmed through your post.
And I know this works for stdClass:
$yourClass->newField = $string;
Is this what you wanted?
OK so... ultimately, as succinctly as possible, the problem was this:
Assuming we have this JSON in $data:
{
"key1":
{
"key2":["somedata","someotherdata"]
}
}
And we want it to be:
{
"key1":
{
"key2":["somedata","someotherdata"],
"key3":"key3data"
}
}
The php script has received "key=key1,key3&value=key3data" as its post data, and is initialized thusly:
$key = $_POST["key"];
$key = explode(",", $key);
$value = $_POST["value"];
...which provides us with an array ($key) representing the nested json key we want to set as a field, and a variable ($value) holding the value we want to set it to.
Approach #1:
$data->$key[0]->$key[1] = json_decode($value);
...fails. It creates this JSON when we re-encode $data:
{
"key1":
{
"key2":["somedata","someotherdata"],
"key3":null
}
}
Approach #2:
$data->$key[0]->$key[1] = $value;
...also fails. It fails to insert the field into $data at all.
But then I realized... the problem with #2 is that it won't let me set the nonexistent field, and the problem with approach #1 is that it sets the field wrong.
So all I have to do is brute force it thusly:
$data->$key[0]->$key[1] = json_decode($value);
if (json_decode($value) == NULL)
{
$data->$key[0]->$key[1] = $value;
}
This works! Since Approach #1 has created the field (Albeit with the incorrect value), PHP now allows me to set the value of that field without complaint.
It's a very brute force sort of means of fixing the problem, and I'm sure there are better ones, if I understood PHP objects better. But this works, so at least I have my code working.
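For what it's worth, here is a sketch of how the same thing might be done without the double assignment. Two things seem to be going on. First, json_decode() returns null for a bare word like key3data because that is not valid JSON (a JSON string needs its quotes). Second, since PHP 7, $data->$key[0] is read as ($data->$key)[0], so curly braces are needed to say "use $key[0] as the property name". The helper name decode_or_string is made up for this example, and the sample data is the one from above. Note also that a loose == NULL check treats decoded values such as 0 or an empty array as failures, which json_last_error() avoids.

```php
<?php
// Hypothetical helper: decode $raw as JSON, falling back to the raw string
// when it is not valid JSON (for example a bare word like key3data).
function decode_or_string(string $raw)
{
    $decoded = json_decode($raw);
    return json_last_error() === JSON_ERROR_NONE ? $decoded : $raw;
}

// Sample data from the post above.
$data = json_decode('{"key1":{"key2":["somedata","someotherdata"]}}');
$key = explode(',', 'key1,key3');   // separator first, then the string
$value = 'key3data';

// Create the parent object if it does not exist yet.
if (!isset($data->{$key[0]})) {
    $data->{$key[0]} = new stdClass();
}

// The braces make PHP evaluate $key[0] and $key[1] first and then use the
// results as property names.
$data->{$key[0]}->{$key[1]} = decode_or_string($value);

$json = json_encode($data);
```

Arrays and numbers sent as JSON still decode to their real types, and plain strings are stored as-is, so the sender does not need to wrap anything in square brackets.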

PHP - Exploding on character(s) that can NEVER be user-defined... How?

OK, I am trying to find a character or group of characters that I can explode on. Since the text is user-defined, I need to be able to explode on a value that can never appear within the text.
How can I do this?
An example of what I'm trying to do...
$value = 'text|0||#fd9||right';
Ok,
text is something that should never change in here.
0, again not changeable
#fd9 is a user-defined string that can be anything that the user inputs...
and right sets the orientation (either left or right).
So, the problem I'm facing is this: How to explode("||", $value) so that if there is a || within the user-defined part... Example:
$value = 'text|0||Just some || text in here||right';
So, if the user places the || in the user-defined part of the string, then this messes things up. How can I do this no matter what the user inputs into the string, so that it returns the following array:
array('text|0', 'Just some || text in here', 'right');
Should I be using different character(s) to explode from? If so, what can I use that the user will not be able to input into the string, or how can I check for this, and fix it? I probably shouldn't be using || in this case, but what can I use to fix this?
Also, the value will be coming from a string at first, and then from the database afterwards (once saved).
Any Ideas?
The problem of how to represent arbitrary data types as strings always runs up against exactly the problem you're describing and it has been solved in many ways already. This process is called serialization and there are many serialization formats, anything from PHP's native serialize to JSON to XML. All these formats specify how to present complex data structures as strings, including escaping rules for how to use characters that have a special meaning in the serialization format in the serialized values themselves.
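As a minimal sketch of that idea, using JSON and the values from the question: the fields are kept in an array and the serialization format takes care of escaping, so no delimiter has to be chosen at all. The variable names are made up for the example.

```php
<?php
// The three fields from the question; the middle one is user input.
$fields = ['text|0', 'Just some || text in here', 'right'];

// Serialize to one string for storage (e.g. in a database column)...
$stored = json_encode($fields);

// ...and get the original array back later, whatever the user typed.
$restored = json_decode($stored, true);
```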
From the comments:
Ok, well, basically, it's straightforward. I already outlined 13 of the other parameters and how they work in Dream Portal located here: http://dream-portal.net/topic_122.0.html so you can see how they fit in. I'm working on a fieldset parameter that basically uses all of these parameters and then some, to combine multiple parameters into one. Anyway, I hope that link helps. For an idea of what an XML file looks like for a module, see http://dream-portal.net/topic_98.0.html and look at the info.xml section; pay attention to the <param> tags in there, 2 of them at the bottom.
It seems to me that a more sensible use of XML would make this a lot easier. I haven't read the whole thing in detail, but an XML element like
<param name="test_param" type="select">0:opt1;opt2;opt3</param>
would make much more sense written as
<select name="test_param">
<option default>opt1</option>
<option>opt2</option>
<option>opt3</option>
</select>
Each unique configuration option can have its own unique element namespace with custom sub-elements depending on the type of parameter you need to represent. Then there's no need to invent a custom mini-format for each possible parameter. It also allows you to create a formal XML schema (whether this will do you any good or not is a different topic, but at least you're using XML as it was meant to be used).
You can encode any user input to base64 and then use it with explode or however you wish.
print base64_encode("abcdefghijklmnopqrstuvwxyz1234567890`~!##$%^&*()_+-=[];,./?>:}{<");
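A short sketch of how this could fit the question's format: the Base64 alphabet only contains letters, digits, "+", "/" and "=", so the encoded user text can never contain "|". The variable names are assumptions for the example.

```php
<?php
$userText = 'Just some || text in here';

// Encode only the user-defined part before joining with the delimiter.
$value = implode('||', ['text|0', base64_encode($userText), 'right']);

// Later: split on the delimiter and decode the user-defined part again.
$parts = explode('||', $value);
$parts[1] = base64_decode($parts[1]);
```

The stored string is no longer human-readable and grows by roughly a third, which may or may not matter here.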
Serialized arrays are also not a bad idea at all. It's probably better than using a comma-separated string and explode. Drupal makes good use of serialized arrays.
Take a look at the PHP manual on how to use it:
serialize()
unserialize()
EDIT: New Solution
Is it a guarantee that text doesn't contain || itself?
If it doesn't, you can use substr() in combination with strpos() and strrpos() instead of explode
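Something along these lines, assuming the first and last fields are fixed values that never contain "||" themselves: the first "||" ends the first field, the last "||" starts the last field, and everything in between is the user text.

```php
<?php
$value = 'text|0||Just some || text in here||right';

$first = strpos($value, '||');   // end of the first field
$last = strrpos($value, '||');   // start of the last delimiter

$parts = [
    substr($value, 0, $first),
    substr($value, $first + 2, $last - $first - 2), // user text, may contain ||
    substr($value, $last + 2),
];
```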
Here's what I usually do to get around this problem.
1) capture user's text and save it in a var $user_text;
2) run an str_replace() on $user_text to replace the characters you want to split by:
//replace with some random string the user would hopefully never enter
$modified = str_replace('||','{%^#',$user_text);
3) now you can safely explode your text using ||
4) now run an str_replace on each part of the explode, to set it back to the original user entered text
foreach($parts as &$part) {
    $part = str_replace('{%^#','||',$part);
}
unset($part); // break the reference left over from the loop
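Put together, the four steps might look like this. It is only a sketch with made-up variable names, and it shares the weakness of the approach: it breaks if the user ever types the placeholder itself, or if the user text ends with a single "|" right before the delimiter.

```php
<?php
$placeholder = '{%^#'; // assumed never to appear in user input
$userText = 'Just some || text in here';

// Steps 1 and 2: hide the delimiter inside the user text.
$modified = str_replace('||', $placeholder, $userText);
$value = 'text|0||' . $modified . '||right';

// Step 3: the delimiter now only appears between fields.
$parts = explode('||', $value);

// Step 4: restore the original user text in every part.
foreach ($parts as &$part) {
    $part = str_replace($placeholder, '||', $part);
}
unset($part); // break the reference left over from the loop
```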

What to use as array_import/var_import for sort-of exported array?

I have a string. It's a user submitted string. (And you should never ever trust user submitted anything.)
If certain (not unsafe) characters exist in the string, it's supposed to become a multi dimensional array/tree. First I tried splits, regex and loops. Too difficult. I've found a very easy solution with a few simple str_replace's and the result is a string that looks like an array definition. Eg:
array('body', array('div', array('x'), array(), array('')), array(array('oele')))
It's a silly array, but it's very easily created. Now that string has to become that array. I'm using eval() for that and I don't like it. Since it's user submitted (and must be able to contain just about anything), there could be any sort of function calls in that string.
So the million dollar question: is there some kind of var_import, or array_import that creates an array from a string and does nothing else (like mysterious, dangerous calls to exec etc)?
Yes, I have tried php.net and neither of the above _import functions exist.
What I'm looking for is the exact opposite of var_export, because the string I have as input looks exactly like the string var_export would output.
Any other suggestions to make it safer than eval are also welcome! But I'm not abandoning the current method (it's just too simple).
Using
array('body', array('div', array('x'), array(), array('')), array(array('oele')))
as input, I replaced some chars to make it a valid JSON string and imported that via json_decode.
Works perfectly. If some illegal chars are present, json_decode will trip over them (and not execute any dangerous code).
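The replacement can be as small as the sketch below. It is a naive illustration of the idea, not necessarily the exact replacements used above: it assumes the values themselves contain no parentheses, quotes or backslashes (those would need proper escaping first), and the variable names are made up.

```php
<?php
$input = "array('body', array('div', array('x'), array(), array('')), array(array('oele')))";

// Turn the array(...) notation into JSON: array( becomes [, ) becomes ],
// and single quotes become double quotes. strtr() does all replacements
// in one pass without re-scanning what it already replaced.
$json = strtr($input, ['array(' => '[', ')' => ']', "'" => '"']);

// json_decode() only parses data; it never executes anything.
// It returns null if the string is not valid JSON.
$result = json_decode($json, true);
```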

Unify variable types of array elements

After hours of debugging, I found an error in one of my scripts. For saving different event types in a database, I have an array of unique data for each event that can be used to identify the event.
So I basically have some code like
$key = md5(json_encode($data));
to generate a unique key for each event.
Now, in some cases, a value in the $data array is an integer, sometimes a string (depending on where it comes from - database or URL). That causes the outputs of json_encode() to be different from each other, though - once including quotes, once not.
Does anybody know a way to "unify" the variable types in the $data array? That would probably mean converting all strings that only contain an integer value to integer. Anything else I have to take care of when using json_encode()?
Use array_walk_recursive combined with a function of your own, something like maybe_intval, which performs the conversion you describe on a single element.
EDIT: Having read the documentation for array_walk_recursive more closely, you'll actually want to write your own recursive function:
function to_json($obj){
    if(is_object($obj))
        $obj=(array)$obj;
    if(is_array($obj))
        return array_map('to_json',$obj);
    return "$obj"; // or return is_int($obj)?intval($obj):$obj;
}
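Going the other way (strings that hold an integer become integers, as the question suggests) could look like the sketch below. normalize_scalars is a hypothetical name; the regular expression only accepts canonical integers such as 0, 42 or -7, so values like "007" or "1.5" stay strings. Very large numbers beyond PHP_INT_MAX are not handled, and key order still matters to json_encode(), so a ksort() may be needed if the arrays are built in different orders.

```php
<?php
// Hypothetical helper: recursively convert strings that hold a canonical
// integer to real integers so json_encode() output is the same either way.
function normalize_scalars($value)
{
    if (is_array($value)) {
        return array_map('normalize_scalars', $value);
    }
    if (is_string($value) && preg_match('/^(0|-?[1-9][0-9]*)$/', $value)) {
        return (int) $value;
    }
    return $value;
}

$fromDb = ['id' => '42', 'type' => 'click'];  // database returns strings
$fromUrl = ['id' => 42, 'type' => 'click'];   // already cast elsewhere

$keyDb = md5(json_encode(normalize_scalars($fromDb)));
$keyUrl = md5(json_encode(normalize_scalars($fromUrl)));
```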
