PHP json_encode slow for big arrays

I have a json_encode related issue: I need to use a big array (several hundred thousand items), each item with a very simple structure (one key, one string value).
json_decode works fine, but as soon as I want to json_encode it, it's awfully slow.
Since I fully control the data here, I tried writing a super simple JSON encoder, and it's fast.
I'm quite surprised, since my encoding function is crude and does not have any of the internal PHP optimizations that are almost certainly present in json_encode.
Any idea what the problem might be?
I put my encoder function below for reference.
Thanks
protected function simpleJsonEncoder($data) {
    if (is_array($data)) {
        $is_indexed = (array_values($data) === $data);
        $tab_str = [];
        if ($is_indexed) {
            foreach ($data as $item) {
                $str_item = $this->simpleJsonEncoder($item);
                $tab_str[] = $str_item;
            }
            $result = '[' . implode(',', $tab_str) . ']';
        }
        else {
            foreach ($data as $index => $item) {
                $str_item = $this->simpleJsonEncoder($item);
                $tab_str[] = '"' . htmlspecialchars($index, ENT_QUOTES) . '":' . $str_item;
            }
            $result = '{' . implode(',', $tab_str) . '}';
        }
    }
    else {
        $result = '"' . htmlspecialchars($data, ENT_QUOTES) . '"';
    }
    return $result;
}
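One caveat about the encoder above: htmlspecialchars() does HTML escaping, not JSON escaping, so backslashes and control characters pass through untouched and can produce invalid JSON. A minimal JSON-correct string escaper might look like this (a sketch, not a drop-in replacement for the whole function):
function jsonEscapeString($str) {
    // Escape backslash, quote, and the common control characters first.
    $str = strtr($str, array(
        '\\'   => '\\\\',
        '"'    => '\\"',
        "\x08" => '\\b',
        "\f"   => '\\f',
        "\n"   => '\\n',
        "\r"   => '\\r',
        "\t"   => '\\t',
    ));
    // Any remaining control characters must become \u00XX escapes.
    $str = preg_replace_callback('/[\x00-\x1f]/', function ($m) {
        return sprintf('\\u%04x', ord($m[0]));
    }, $str);
    return '"' . $str . '"';
}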

For posterity: I've been trying to find alternatives to json_encode for syncing large amounts of data. serialize is much quicker, but it obviously returns a much larger string. I stumbled across this page and tried out this function; the md5 hash is different from json_encode's and the time difference is negligible. From everything I've read recently, json_encode has been optimized somewhere along the line.
I'm on PHP 7.3 and time is in seconds (big object)
"user_func_hash": "xxx",
"user_func_time": 45.33081293106079,
"json_encode_hash": "yyy",
"json_encode_time": 45.759231090545654


Possible to override a foreach variable parameter within itself?

I have made a small script which uses the Twitch API. The API only allows a maximum of 100 results per query. I would like to have this query carry on until there are no more results.
My theory behind this is to run a foreach or while loop and increment the offset by 1 each time.
My problem, however, is that I cannot change the foreach parameters within itself.
Is there any way of executing this efficiently without causing an infinite loop?
Here is my current code:
<?php
$newcurrentFollower = 0;
$offset = 0;
$i = 100;
$json = json_decode(file_get_contents("https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=25&offset=" . $offset));
foreach ($json->follows as $follow)
{
    echo $follow->user->name . ' (' . $newcurrentFollower . ')' . "<br>";
    $newcurrentFollower++;
    $offset++;
    $json = json_decode(file_get_contents("https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=25&offset=" . $offset));
}
?>
Using a while loop:
while ($i < $total)
{
    $json = json_decode(file_get_contents("https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=25&offset=" . $offset));
    echo $json->follows->user->name . ' (' . $newcurrentFollower . ')' . "<br>";
    $newcurrentFollower++;
    $offset++;
    $i++;
}
Ends up echoing this (No names are successfully being grabbed):
Here is the API part for $json->follows:
https://github.com/justintv/Twitch-API/blob/master/v2_resources/channels.md#get-channelschannelfollows
You can use this:
$offset = 0;
$count = 1;
do {
    $response = json_decode(file_get_contents(
        'https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=100&offset=' . $offset
    ));
    foreach ($response->follows as $follow) {
        echo $follow->user->name . ' (' . ($count++) . ')' . "<br>";
    }
    $offset += 100; // advance by the page size (limit), otherwise pages overlap
} while (!empty($response->follows));
You want to use a while loop here, not just a foreach. Basically:
while (the HTTP request returns results)
{
    foreach ($json->follows as $follow)
    {
        do stuff
    }
    increment offset so the next request returns the next batch not already processed
}
The trickiest part is going to be getting the while condition right, so that it returns false when the request gets no more results; the details will depend on what the API actually returns when there are no more results.
Also important: the cleanest way is to have the HTTP request occur as part of the while condition, but if you need to do some complicated computation on the returned JSON to check the condition, you can put an initial HTTP request before the loop and then do another request at the end of each iteration.
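A minimal sketch of that shape, with the request folded into the loop condition (endpoint and page size taken from the question; treating an empty follows list as the stop signal is an assumption about the API's last page):
$offset = 0;
$count = 0;
while (($json = json_decode(file_get_contents(
        'https://api.twitch.tv/kraken/channels/greatbritishbg/follows?limit=100&offset=' . $offset)))
        && !empty($json->follows)) {
    foreach ($json->follows as $follow) {
        echo $follow->user->name . ' (' . (++$count) . ')' . "<br>";
    }
    $offset += count($json->follows); // advance past everything just processed
}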
The problem is that you're only capturing the key, not the value. Place it into a data structure to access the information.
Honestly, I find a recursive function much more effective than an iterative/loop approach: just update a datatable or list before the next call. It's simple, uses cursors, is lightweight, and does the job. It's reusable if you use generics on it, too.
This code is in C#, but I know that with minor changes you'll be able to get it working in PHP with ease.
query = //follower object get request//
private void doProcessFollowers(string query)
{
    HTTPParse followerData = new HTTPParse(); // custom json wrapper. using the basic is fine. Careful with your cons though
    var newRoot = followerData.createFollowersRoot(query); // generates a class populated by json
    if (newRoot[0]._cursor != null)
    {
        populateUserDataTable(newRoot); // update dataset
        doProcessFollowers(newRoot[0]._links.next); // recurse
    }
}
Anyway, this just allows you to roll through the cursors without needing to worry about indexes, unless you specifically want them for whatever reason. If you're working with generics you can just reuse this code without issue; find a generic example below. All you need to do to make it reusable is pass the correct class within the <> of the method call. It can work for any custom class that you use to parse JSON data, which is basically what 'createfollowerroot()' does in the above code, except that one is hard-typed.
Also, I know it's in C# and the topic is PHP; with a few minor changes to syntax you'll get it working easily.
Anyway, hope this helps somebody.
Generic example:
public static List<T> rootSerialize<T>(JsonTextReader reader)
{
    List<T> outputData = new List<T>();
    while (reader.Read())
    {
        JsonSerializer serializer = new JsonSerializer();
        var tempData = serializer.Deserialize<T>(reader);
        outputData.Add(tempData);
    }
    return outputData;
}
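For PHP readers, a rough translation of the cursor idea above might look like this; the _cursor and _links->next field names mirror the C# snippet, and storeFollowers() is a hypothetical stand-in for populateUserDataTable():
function doProcessFollowers($query) {
    $page = json_decode(file_get_contents($query));
    if (!empty($page->_cursor)) {
        storeFollowers($page->follows);          // hypothetical: persist this page's rows
        doProcessFollowers($page->_links->next); // recurse via the cursor link
    }
}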

Adding a query string to a random URL

I want to add a query string to a URL; however, the URL format is unpredictable. The URL can be:
http://example.com/page/ -> http://example.com/page/?myquery=string
http://example.com/page -> http://example.com/page?myquery=string
http://example.com?p=page -> http://example.com?p=page&myquery=string
These are the URLs I'm thinking of, but it's possible that there are other formats that I'm not aware of.
I'm wondering if there is a standard, library or a common way to do this. I'm using PHP.
Edit: I'm using Cbroe's explanation and Passerby's code. There is another function by Hamza, but I guess it's better to use PHP's built-in functions and also have cleaner/shorter code.
function addQuery($url, array $query)
{
    $cache = parse_url($url, PHP_URL_QUERY);
    if (empty($cache)) return $url . "?" . http_build_query($query);
    else return $url . "&" . http_build_query($query);
}
// test
$test = array("http://example.com/page/", "http://example.com/page", "http://example.com/?p=page");
print_r(array_map(function ($v) {
    return addQuery($v, array("myquery" => "string"));
}, $test));
I'm wondering if there is a standard, library or a common way to do this. I'm using PHP.
Depends on how failsafe – and thereby more complex – you want it to be.
The simplest way would be to look for whether there’s a ? in the URL – if so, append &myquery=string, else append ?myquery=string. This should cover most cases of standards-compliant URLs just fine.
If you want it more complex, you could take the URL apart using parse_url and then parse_str, then add the key myquery with value string to the array the second function returns – and then put it all back together again, using http_build_query for the new query string part.
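A sketch of that more thorough approach (addQueryParam() is just an illustrative name, and fragment handling is left out for brevity):
function addQueryParam($url, $key, $value) {
    $parts = parse_url($url);
    $params = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params); // existing query string into an array
    }
    $params[$key] = $value;
    return $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['path']) ? $parts['path'] : '')
        . '?' . http_build_query($params);
}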
Some spaghetti code:
echo addToUrl('http://example.com/page/', 'myquery', 'string').'<br>';
echo addToUrl('http://example.com/page', 'myquery', 'string').'<br>';
echo addToUrl('http://example.com/page/wut/?aaa=2', 'myquery', 'string').'<br>';
echo addToUrl('http://example.com?p=page', 'myquery', 'string');

function addToUrl($url, $var, $val){
    $array = parse_url($url);
    $values = array(); // make sure $values exists even when there is no query string
    if (isset($array['query'])) {
        parse_str($array['query'], $values);
    }
    $values[$var] = $val;
    unset($array['query']);
    $options = '';
    $c = count($values) - 1;
    $i = 0;
    foreach ($values as $k => $v) {
        if ($i == $c) {
            $options .= $k . '=' . $v;
        } else {
            $options .= $k . '=' . $v . '&';
        }
        $i++;
    }
    return $array['scheme'] . '://' . $array['host'] . (isset($array['path']) ? $array['path'] : '') . '?' . $options;
}
Results:
http://example.com/page/?myquery=string
http://example.com/page?myquery=string
http://example.com/page/wut/?aaa=2&myquery=string
http://example.com?p=page&myquery=string
You should try the http_build_query() function; I think that's what you're looking for, and maybe a bit of parse_str(), too.
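For instance, a quick illustration of those two functions together:
parse_str('p=page', $params);    // parse the existing query string into an array
$params['myquery'] = 'string';   // add the new parameter
echo http_build_query($params);  // p=page&myquery=string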

Format CSV into json to be consumed by ui.autocomplete

I have a PHP problem: formatting output from a CSV to make it available as JSON for jQuery ui.autocomplete.
fruit.csv:
apple, bananna, jackfruit,
... etc
jQuery from here:
http://jqueryui.com/demos/autocomplete/#default
$( "#fruits" ).autocomplete({
source: '/path/to/fruit.json'
});
PHP to convert CSV into JSON:
// Callback function to output CSV as json object
function _custom_json_from_csv() {
    $fruit_path = '/path/to/fruit.csv';
    $fruits = array_map("str_getcsv", file($fruit_path));
    drupal_json_output(array(array_values($fruits)));
    exit;
}
// Below are CMS codes for detailed illustration
function drupal_json_output($var = NULL) {
    // We are returning JSON, so tell the browser.
    drupal_add_http_header('Content-Type', 'application/json');
    if (isset($var)) {
        echo drupal_json_encode($var);
    }
}
function drupal_json_encode($var) {
    // The PHP version cannot change within a request.
    static $php530;
    if (!isset($php530)) {
        $php530 = version_compare(PHP_VERSION, '5.3.0', '>=');
    }
    if ($php530) {
        // Encode <, >, ', &, and " using the json_encode() options parameter.
        return json_encode($var, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_AMP | JSON_HEX_QUOT);
    }
    // json_encode() escapes <, >, ', &, and " using its options parameter, but
    // does not support this parameter prior to PHP 5.3.0. Use a helper instead.
    include_once DRUPAL_ROOT . '/includes/json-encode.inc';
    return drupal_json_encode_helper($var);
}
// parts of json-encode.inc - drupal_json_encode_helper(), responsible for json output:
case 'array':
    // Arrays in JSON can't be associative. If the array is empty or if it
    // has sequential whole number keys starting with 0, it's not associative
    // so we can go ahead and convert it as an array.
    if (empty($var) || array_keys($var) === range(0, sizeof($var) - 1)) {
        $output = array();
        foreach ($var as $v) {
            $output[] = drupal_json_encode_helper($v);
        }
        return '[ ' . implode(', ', $output) . ' ]';
    }
    // Otherwise, fall through to convert the array as an object.
case 'object':
    $output = array();
    foreach ($var as $k => $v) {
        $output[] = drupal_json_encode_helper(strval($k)) . ':' . drupal_json_encode_helper($v);
    }
    return '{' . implode(', ', $output) . '}';
If there is a solution to consume the CSV directly from jQuery, that would be great, but I have found no clue about one so far.
My problem is that _custom_json_from_csv() outputs an unexpected format for ui.autocomplete. Note the excessive [[[...]]]:
[[["apple", "bananna", "jackfruit"]]]
While ui.autocomplete wants:
["apple", "bananna", "jackfruit"]
Any direction to format the function as expected by jquery ui.autocomplete?
PS: I don't use the #autocomplete_path form API, and instead use ui.autocomplete, for these reasons:
1) The code is stored in theme settings; no hook_menu is available to a theme, and I want to avoid a module for this need whenever possible.
2) There is a plan somewhere at d.o. to use ui.autocomplete, so consider this adventurous.
3) My previous question from the jQuery viewpoint has led me to instead correct the output of the JSON, rather than making jQuery adapt to the JSON.
4) This is more a PHP issue of mine rather than a Drupal one.
Thanks
UPDATE:
Removing one array wrapper, i.e. changing drupal_json_output(array(array_values($fruits))); to drupal_json_output(array_values($fruits));, successfully removed one [] (what is the name of this?). Obviously that was a leftover from the previous format with the leading group.
[["apple", "bananna", "jackfruit"]]
I need to remove one more []
I think your code is "correct": if you had
apple, bananna, jackfruit
peanut, walnut, almond
carrot, potato, pea
then your function would end up with something like
[[apple, bananna, jackfruit],[peanut, walnut, almond],[carrot, potato, pea]]
which appears sensible.
If you only want one row, why can't you just use the result of
$FileContents = file($fruit_path);
$fruits = str_getcsv($FileContents[0]);
as that will return an array of the values in the first row, not an array of arrays of all the rows.
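Putting that together, the callback could be reduced to something like this (a sketch; the trim/filter step is an assumption to cope with the trailing comma and stray whitespace in the sample CSV):
function _custom_json_from_csv() {
    $fruit_path = '/path/to/fruit.csv';
    $lines = file($fruit_path);
    // Parse only the first row, trim each value, and drop the empty
    // element produced by the trailing comma.
    $fruits = array_values(array_filter(array_map('trim', str_getcsv($lines[0])), 'strlen'));
    drupal_json_output($fruits); // ["apple","bananna","jackfruit"]
    exit;
}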
Maybe the simplest fix is indexing on the JS side: use array[0][0] in JS (for drupal_json_output(array(array_values($fruits))); in PHP) or array[0] in JS (for drupal_json_output(array_values($fruits)); in PHP).

Weird file_get_contents() options

We are currently performing searches on a Database and returning results in JSON format to use on a Google Maps. The file that we call is named getvenues.php and works great on the server. It accepts a number of parameters and returns the results based on the query.
We then have a separate file that checks to see if there's a JSON file on the server which contains the results, matches its age against a setting, and then returns the data either from the cache file, or builds a new cache file if it's too old.
Since there are several thousand possible search options, we only cache single searches (either on a County, Region or Type). The JavaScript always calls our search_cache_builder.php file. If there is more than one search parameter, the file simply gets the contents returned by getvenues.php and serves it up without any caching.
Everything works great except for one particular combination. If a search is run for venue_type=Castle / Fort and venue_name=Leeds Castle, the search_cache_builder.php returns an empty array, even though accessing getvenues.php directly returns the required data.
Here's a sample of getvenues.php working with this data: http://dev.weddingvenues.com/lincweb/getvenues.php?type=Castle%20/%20Fort&venue_name=Leeds%20Castle
And here's what the search_cache_builder.php script returns for an identical search (the address we are sending to is correct): http://www.weddingvenues.com/search_cache_builder.php?type=castle%20/%20fort&venue_name=Leeds%20Castle
Here's the code for the search_cache_builder.php file, which relates to this particular query:
$get_string = '?';
foreach ($_GET as $key => $value)
{
    if ($get_string === '?')
    {
        $get_string .= $key . '=' . $value;
    }
    else
    {
        $get_string .= '&' . $key . '=' . $value;
    }
}
//$get_string = str_replace(' ', '', $get_string);
// Otherwise, we need to serve up the page as is:
$file_url = GET_VEN_URL . $get_string;
echo file_get_contents($file_url);
Can anyone offer an explanation as to why the search_cache_builder.php file is returning an empty array?
You should urlencode() your parameter values.
In fact, while your getvenues.php receives parameters directly from a browser it behaves OK, because they are correctly urlencoded.
I tried what follows on my computer against your service and it works:
define ("GET_VEN_URL", "http://www.weddingvenues.com/getvenues.php");
$get_string = '?';
foreach($_GET as $key => $value)
{
if($get_string === '?')
{
$get_string .= $key . '=' . urlencode($value);
}
else
{
$get_string .= '&' . $key . '=' . urlencode($value);
}
}
$file_url = GET_VEN_URL . $get_string;
echo "<pre>";
echo $file_url;
echo "<pre>";
echo file_get_contents($file_url);
Note that $get_string === '?' is not the culprit here: both operands are strings, so the strict comparison evaluates to true on the first iteration exactly as intended.
I would not recommend using GET parameters (&, ?) as part of the file name.
Besides this, it will quickly hit the length limit for a file name (4k or 5k, I can't recall).
You can sort $_GET, then md5 (or otherwise hash) the array into a fixed-length string.
As long as you ensure the hashing mechanism is consistent, you can easily retrieve the cache file.
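A sketch of that idea (the cache directory is an assumption):
// Deterministic cache key: sort the parameters, then hash the normalized query string.
$params = $_GET;
ksort($params);
$cache_file = '/path/to/cache/venues_' . md5(http_build_query($params)) . '.json';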
First try to urldecode the values, then use http_build_query to generate your get_string; your problem must be the special chars in the values (like /).
Edit:
If you change the order, it works: http://www.weddingvenues.com/search_cache_builder.php?venue_name=Leeds%20Castle&type=castle%20/%20fort
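A minimal version of that suggestion, replacing the manual loop in search_cache_builder.php (http_build_query() urlencodes each value for you):
$file_url = GET_VEN_URL . '?' . http_build_query($_GET);
echo file_get_contents($file_url);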

Building XML with PHP - Performance in mind

When building XML in PHP, is it quicker to build a string and then echo out the string, or to use the XML functions that PHP gives you? Currently I'm doing the following:
UPDATED to better code snippet:
$searchParam = mysql_real_escape_string($_POST['s']);
$search = new Search($searchParam);
if ($search->retResult() > 0) {
    $xmlRes = $search->buildXML();
}
else {
    $xmlRes = '<status>no results</status>';
}
$xml = "<?xml version=\"1.0\"?>";
$xml .= "<results>";
$xml .= $xmlRes;
$xml .= "</results>";
header("content-type: text/xml");
header("content-length: " . strlen($xml));
echo($xml);
class Search {
    private $num;
    private $q;
    function __construct($s) {
        $this->q = mysql_query('select * from foo_table where name = "' . $s . '"');
        $this->num = mysql_num_rows($this->q);
    }
    function retResult() {
        return $this->num;
    }
    function buildXML() {
        $xml = '<status>success</status>';
        $xml .= '<items>';
        while ($row = mysql_fetch_object($this->q)) {
            $xml .= '<item>';
            $desTag = '<info><![CDATA[';
            foreach ($row as $key => $current) {
                if ($key == 'fob') {
                    // do something with current
                    $b = mysql_query('select blah from dddd where id =' . $current);
                    $a = mysql_fetch_array($b);
                    $xml .= '<' . $key . '>' . $a['blah'] . '</' . $key . '>';
                }
                else if ($key == 'this' || $key == 'that') {
                    $desTag .= ' ' . $current; // append to the CDATA block, don't overwrite it
                }
                else {
                    $xml .= '<' . $key . '>' . $current . '</' . $key . '>';
                }
            }
            $desTag .= ']]></info>';
            $xml .= $desTag;
            $xml .= '</item>';
        }
        $xml .= '</items>';
        return $xml;
    }
}
Is there a faster way of building the XML? It gets to about 2000 items and starts to slow down.
Thanks in advance!
Use the XML parser. Remember that when you concatenate a string, you have to reallocate the whole string on every concatenation.
For small strings, plain string building is probably faster, but in your case, definitely use the XML functions.
I see that you're making no attempt to escape the text before concatenating it, which means that sooner or later you're going to generate something that is almost-but-not-quite XML, and which will be rejected by any conforming parser.
Use a library (XMLWriter is probably more performant than others, but I haven't done XML with PHP).
You have a SQL query inside a loop, which is usually quite a bad idea. Even if each query takes half a millisecond to complete, it's still a whole second just to execute those 2000 queries.
What you need to do is post the two queries in a new question so that someone can show you how to turn them into a single query using a JOIN (see the sketch below).
Database stuff usually largely outweighs any kind of micro-optimization: whether you use string concatenation or XMLWriter doesn't matter when you're executing several thousand queries.
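As an illustration only, using the table and column names visible in the snippet above, the per-row lookup could fold into the outer query roughly like this:
// One round trip instead of one query per row (names assumed from the snippet).
$q = mysql_query(
    'select f.*, d.blah as fob_blah
     from foo_table f
     left join dddd d on d.id = f.fob
     where f.name = "' . mysql_real_escape_string($s) . '"'
);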
Try echoing on each iteration (put the echo $xml before the while loop ends, and reset $xml at the beginning); it should be quicker.
That code snippet doesn't make a lot of sense; please post some actual code, reduced for readability.
A faster version of the code you posted would be
$xml = '';
while ($row = mysql_fetch_row($result))
{
    $xml .= '<items><test>' . implode('</test><test>', $row) . '</test></items>';
}
In general, using mysql_fetch_object() is slightly slower than the other options.
Perhaps what you were trying to do was something like this:
$xml = '<items>';
while ($row = mysql_fetch_assoc($result))
{
    $xml .= '<item>';
    foreach ($row as $k => $v)
    {
        // Note: the closing tag must use $k, not $v.
        $xml .= '<' . $k . '>' . htmlspecialchars($v) . '</' . $k . '>';
    }
    $xml .= '</item>';
}
$xml .= '</items>';
As mentioned elsewhere, you have to escape the values unless you're 100% sure there will never be any special character such as "<", ">" or "&". This actually applies to $k as well. In this kind of script, it is also generally more performant to use XML attributes instead of nodes.
With so little information about your goal, all we can do is micro-optimize. Perhaps you should work on the principles behind your script instead? For instance, do you really have to generate 2000 items? Can you cache the result, or cache anything? Can you paginate the result, etc.?
A quick word about using PHP's XML libraries: XMLWriter will generally be slightly slower than using string manipulation; everything else will be noticeably slower than strings.
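For comparison, the same loop written against XMLWriter would look roughly like this; it is somewhat slower than concatenation but handles escaping for you:
$w = new XMLWriter();
$w->openMemory();
$w->startDocument('1.0');
$w->startElement('items');
while ($row = mysql_fetch_assoc($result)) {
    $w->startElement('item');
    foreach ($row as $k => $v) {
        $w->writeElement($k, $v); // text content is escaped by the library
    }
    $w->endElement(); // </item>
}
$w->endElement(); // </items>
echo $w->outputMemory();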
