How to organize JSON for faster info access? - php

I've been experimenting with storing Vehicle info as JSON for a quicker way to access the vehicle images.
I have set up a table in my DB for JSON. The JSON is set up as shown below. All the vehicles are in this one JSON, along with all their image information. I'm not sure if this is the best way of storing the data. I want to be able to quickly search based on the VIN to get only the images associated with that VIN.
My Issues:
The loading speed of dynamically displaying associated vehicle images.
Not getting ONLY the associated images; other vehicle images are showing
Not sure if my JSON format makes sense (or is inefficient), haven't worked too much with JSON
Is there an easier way to set up the SQL table for querying a specific JSON?
Possible Solutions (not limited to one):
Re-format JSON for simple referencing
Re-format Table for easier queries
Edit "code loops" for faster run time
I had previously set up loops on the InventoryPage (a dynamic page that uses $_GET to get the associated VIN) to iterate through my database and find the images associated with that VIN. This worked, but took too long due to the number of iterations required.
There are usually many more entries than this; I trimmed it way back for readability. We have an average of 100 vehicles with 20-60 images per vehicle.
Here is an example of my JSON format:
[{"vin": "JF1GR89658L827860", "images": [{"image":
"https://cdn04.carsforsale.com/3/420970/23594569/1133776572.jpg?
dt=100320180034", "width": 800, "height": 600}, {"image":
"https://cdn04.carsforsale.com/3/420970/23594569/1133776606.jpg?
dt=100320180034", "width": 800, "height": 600}]},
{"vin": "6Y86G433753", "images": [{"image":
"https://cdn04.carsforsale.com/3/420970/23684711/1135715340.jpg?
dt=100620180134", "width": 800, "height": 600}, {"image":
"https://cdn04.carsforsale.com/3/420970/23684711/1135715371.jpg?
dt=100620180134", "width": 800, "height": 600}]}]
Here is the code I currently have to iterate through the JSON and find the associated images; I think it incorrectly displays images from other vehicles:
foreach ($vehicles as $vehicle)
{
    if ($vehicle['vin'] === $vin) {
        $last_element = ',';
        while ($vehicle['images']) {
            echo "{";
            echo "src: '" . $vehicle['images'][0]['image'] . "',";
            echo "h: " . $vehicle['images'][0]['height'] . ",";
            echo "w: " . $vehicle['images'][0]['width'];
            echo "}" . $last_element;
        }
        break;
    }
}
Expected output from above "code loops" (for image slider):
{
    src: "image_link",
    h: vehicle_height,
    w: vehicle_width
}

Don't try to create JSON by hand. Construct an array with all the data, then call json_encode().
$json_array = array();
foreach ($vehicles as $vehicle)
{
    // strict comparison; a single "=" here would assign instead of compare
    if ($vehicle['vin'] === $vin) {
        foreach ($vehicle['images'] as $image) {
            $json_array[] = ['src' => $image['image'], 'h' => $image['height'], 'w' => $image['width']];
        }
        break;
    }
}
echo json_encode($json_array);
You might also consider making the JSON column an object rather than an array, with the VIN as the key. Then you wouldn't need a loop; you could just use $vehicles[$vin]. You could also use MySQL's JSON functions (JSON_SEARCH()/JSON_EXTRACT()) in your query to pull out just that element instead of the whole array.
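Even without changing the stored JSON, you can decode it once on the PHP side and re-key the result by VIN so the lookup is direct. A minimal sketch, assuming $json holds the value already fetched from the JSON column and $vin comes from $_GET (both names are assumptions, not from the original code):
$vehicles = json_decode($json, true);
$byVin    = array_column($vehicles, null, 'vin'); // re-key the rows by their "vin" field

// Only this vehicle's images, or an empty list if the VIN isn't present.
$images = isset($byVin[$vin]) ? $byVin[$vin]['images'] : [];
echo json_encode(array_map(function ($img) {
    return ['src' => $img['image'], 'h' => $img['height'], 'w' => $img['width']];
}, $images));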

This is kind of an opinionated answer, but I really don't see the benefit of using a JSON column for this. The data you have looks like a better fit for traditional related tables. JSON columns are great for storing unstructured data: sets of different objects with different properties. But you have a set of similar objects, each a flat list with exactly the same properties. Image records would fit perfectly in their own table, possibly related to another table holding the other VIN-keyed data, and retrieving related data would be simpler and more efficient that way. Just my two cents, and I know it doesn't really answer the question directly, but it's a bit much for a comment.
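For illustration only, here is a rough sketch of what that relational lookup could look like. The table and column names (vehicle_images, url, width, height) are assumptions, and $pdo is taken to be an existing PDO connection:
// Fetch only the images for one VIN straight from an indexed table.
$stmt = $pdo->prepare('SELECT url, width, height FROM vehicle_images WHERE vin = ?');
$stmt->execute([$vin]);

$images = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $images[] = ['src' => $row['url'], 'h' => (int) $row['height'], 'w' => (int) $row['width']];
}
echo json_encode($images);
With an index on vin, this returns just the rows you need without decoding or scanning a whole JSON blob.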

Related

Convert Value in Large JSON File to CSV

First of all, I appreciate there are lots of answers about dealing with large JSON files. However, I have yet to find one that covers my scenario.
The problem I face is that I have large JSON files (12 MB) that look like this:
{
  "range": "Sheet1!A1:P40571",
  "majorDimension": "ROWS",
  "values": [
    [
      "new_id",
      "qty",
      "total_job_cost",
      "total_job_revenue",
      "total_job_profit",
      "total_job_margin"
    ],
    [
      "34244",
      "5",
      "211.25",
      "297.00",
      "85.75",
      "28.87%"
    ],
    [
      "34244",
      "10",
      "211.25",
      "297.00",
      "85.75",
      "28.87%"
    ],
    ...
  ]
}
And I wish to extract the values array and convert it into a CSV that would look like this:
new_id,total_job_cost,total_job_revenue,total_job_profit,total_job_margin
34244,211.25,297.00,85.75,28.87%
34245,211.25,297.00,85.75,28.87%
...
However, since the values array is so large, when I try to extract it using a PHP library for JSON parsing, my server crashes when it tries to read it.
Any suggestions or tips appreciated. Thanks.
You can read JSON line by line, but not with any of the built-in libraries. I wrote a simple JSON parser for another answer here:
Convert structure to PHP array
I had to make a slight modification to handle "real" JSON. In the switch statement, change this token:
case 'T_ENCAP_STRING':
    if ($mode == 'key') {
        $key .= trim($content, '"');
    } else {
        $value .= unicode_decode($content); // encapsulated strings are always content
    }
    next($lexer_stream); // consume a token
    break;
You can test it here
http://sandbox.onlinephpfunctions.com/code/b2917e4bb8ef847df97edbf0bb8f415a10d13c9f
and find the full (updated) code here
https://github.com/ArtisticPhoenix/MISC/blob/master/JasonDecoder.php
Can't guarantee it will work but it's worth a shot. It should be fairly easy to modify it to read your file.
If the problem is simply to convert the large JSON file to a CSV file, then perhaps a jq solution is admissible. Depending on the computing environment, jq can generally handle large files (GB) breezily, and with a little more effort, it can usually handle even larger files as it has a "streaming parser".
In any case, here is a jq solution to the problem as stated:
jq -r '(.values[] | [.[0,2,3,4,5]]) | @csv' data.json > extract.csv
For the sample input, this produces:
"new_id","total_job_cost","total_job_revenue","total_job_profit","total_job_margin"
"34244","211.25","297.00","85.75","28.87%"
"34244","211.25","297.00","85.75","28.87%"
This is valid CSV, and the use of @csv guarantees the result, if any, will be valid CSV, but if you want the quotation marks removed, there are several options, though whether they are "safe" or not will depend on the data. Here is an alternative jq filter that produces comma-separated values. It uses join(",") instead of @csv:
(.values[] | [.[0,2,3,4,5]]) | join(",")
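Since the question is tagged php, here is a rough sketch of driving that same jq filter from a PHP script; it assumes the jq binary is installed and on the server's PATH:
// Run the jq filter as an external process and write its CSV output to a file.
$cmd = "jq -r '(.values[] | [.[0,2,3,4,5]]) | @csv' " . escapeshellarg('data.json');
$csv = shell_exec($cmd);

if ($csv === null) {
    die("jq did not produce any output\n");
}
file_put_contents('extract.csv', $csv);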

Is it possible to setCellValue directly without the need of a multidimensional array?

I recently started working on a script that had already been written. It works great, but it takes way too long; I believe I know the cause, but I have had no success improving it.
The case is as follows: the script reads an XML file with a lot of info regarding temperatures, all of which is inside the various <Previsao> tags in the XML.
$l = 3;
$q = $CON->Query("SELECT
                      cod_cidade,
                      cidade,
                      cidcompleta
                  FROM
                      listabrasil
                  WHERE
                      cidade LIKE '%Aj%'
                  ORDER BY
                      cidade ASC");
while ($x = $CON->Fetch($q))
{
    $XML = simplexml_load_file('http://ws.somarmeteorologia.com.br/previsao.php?Cidade='.$x['cidade'].'&DiasPrevisao=15');
    print $x['cidade']."\n";
    foreach ($XML->Previsao as $P)
    {
        $Previsao[$x['cidade']]['data'][]     = (string)$P->Data;
        $Previsao[$x['cidade']]['tmin'][]     = (float) $P->Tmin;
        $Previsao[$x['cidade']]['tmax'][]     = (float) $P->Tmax;
        $Previsao[$x['cidade']]['prec'][]     = (float) $P->Prec;
        $Previsao[$x['cidade']]['velvento'][] = (float) $P->VelVento;
        $Previsao[$x['cidade']]['dirvento'][] = (string)$P->DirVento;
    }
}
foreach ($Previsao as $Cid => $Dados)
{
    $col = 1;
    for ($dias = 0; $dias < 15; $dias++)
    {
        $PlanilhaBloomberg->setCellValue($colunas[$col+0].'2', $Dados['data'][$dias]);
        $PlanilhaBloomberg->setCellValue($colunas[$col+0].$l, $Dados['tmin'][$dias].'C');
        $PlanilhaBloomberg->setCellValue($colunas[$col+1].$l, $Dados['tmax'][$dias].'C');
        $PlanilhaBloomberg->setCellValue($colunas[$col+2].$l, $Dados['prec'][$dias].'mm');
        $PlanilhaBloomberg->setCellValue($colunas[$col+3].$l, $Dados['velvento'][$dias].'km/h');
        $PlanilhaBloomberg->setCellValue($colunas[$col+4].$l, $Dados['dirvento'][$dias]);
        print $Dados['data'][$dias]."\n";
        print $Dados['tmin'][$dias]."\n";
        print $Dados['tmax'][$dias]."\n";
        print $Dados['prec'][$dias]."\n";
        print $Dados['velvento'][$dias]."\n";
        print $Dados['dirvento'][$dias]."\n";
        $col = $col + 5;
    }
    $l++;
}
Don't worry about setCellValue, it's just from the PHPExcel library. So, from what I could gather, it's taking so long to execute partly because of the large amount of data it gathers from the XML, but also because it keeps filling the multidimensional array $Previsao ... What I am hoping to achieve (with no success, might I add) is to call setCellValue directly, without the need for a multidimensional array. Do you guys think it's possible, and if it is, would this reduce the exec_time for the script?
Thank you all in advance for the help, and please forgive me if this question is too focused; I'm not sure if that could cause problems.
PHPExcel usually does take a long time to set cell values. There might be a way to tidy up your code, but I don't think it would make a huge difference to the runtime. Try setPreCalculateFormulas(false) before saving your file so that PHPExcel doesn't calculate values on save; that might save some time. Second of all, assuming $PlanilhaBloomberg is $object->getActiveSheet(), you can chain the calls like this:
$PlanilhaBloomberg->setCellValue($colunas[$col+0].'2', $Dados['data'][$dias])
                  ->setCellValue($colunas[$col+0].$l, $Dados['tmin'][$dias].'C')
                  ->setCellValue($colunas[$col+1].$l, $Dados['tmax'][$dias].'C')
                  ->setCellValue($colunas[$col+2].$l, $Dados['prec'][$dias].'mm')
                  ->setCellValue($colunas[$col+3].$l, $Dados['velvento'][$dias].'km/h')
                  ->setCellValue($colunas[$col+4].$l, $Dados['dirvento'][$dias]);
That might help.
Well, you are iterating over the database result, iterating over the XML for each database row in order to build a big array in memory, and then iterating over that array to build the Excel sheet... surely it should be possible to avoid building the big array. Arrays are memory expensive, and a lot of time overhead goes into allocating the memory for that array as well.
It should be possible to populate the Excel sheet directly from the XML loop, avoiding the array building entirely (which saves memory and memory allocation time) and saving an entire loop as well.
It's really just a case of identifying which cell needs populating from which XML value, but I can't quite envisage the array structure to work that out.
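To make the suggestion concrete, here is a rough sketch of writing the cells inside the XML loop itself, with no $Previsao array; it assumes the same $CON, $q, $colunas and $PlanilhaBloomberg variables as in the question and keeps the original column layout:
// Write each forecast value straight into the sheet as it is read from the XML.
$l = 3;
while ($x = $CON->Fetch($q)) {
    $XML = simplexml_load_file('http://ws.somarmeteorologia.com.br/previsao.php?Cidade='.$x['cidade'].'&DiasPrevisao=15');
    $col = 1;
    foreach ($XML->Previsao as $P) {
        $PlanilhaBloomberg->setCellValue($colunas[$col+0].'2', (string)$P->Data);
        $PlanilhaBloomberg->setCellValue($colunas[$col+0].$l, (float)$P->Tmin.'C');
        $PlanilhaBloomberg->setCellValue($colunas[$col+1].$l, (float)$P->Tmax.'C');
        $PlanilhaBloomberg->setCellValue($colunas[$col+2].$l, (float)$P->Prec.'mm');
        $PlanilhaBloomberg->setCellValue($colunas[$col+3].$l, (float)$P->VelVento.'km/h');
        $PlanilhaBloomberg->setCellValue($colunas[$col+4].$l, (string)$P->DirVento);
        $col += 5;
    }
    $l++;
}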

PHP turn data list into an array

So I have the following link, which contains a lot of data (a lot): http://www.gw2spidy.com/api/v0.9/json/all-items/all
The data is a list of ~50,000 sets of data, separated by commas.
Is there any way I can convert these ~50,000 entries of data to an array?
Here is one entry of data for reference.
{
    "data_id": 2,
    "name": "Assassin Pill",
    "rarity": 1,
    "restriction_level": 0,
    "img": "https:\/\/render.guildwars2.com\/file\/ED903431B97968C79AEC7FB21535FC015DBB0BBA\/60981.png",
    "type_id": 3,
    "sub_type_id": 1,
    "price_last_changed": "2014-05-10 20:00:13 UTC",
    "max_offer_unit_price": 0,
    "min_sale_unit_price": 0,
    "offer_availability": 0,
    "sale_availability": 0,
    "sale_price_change_last_hour": 0,
    "offer_price_change_last_hour": 0
}
It is JSON data. Use
$yourarray = json_decode(file_get_contents('http://www.gw2spidy.com/api/v0.9/json/all-items/all'), true);
to convert it into an array.
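Since the response is large, it may also be worth checking that the download and the decode actually succeeded rather than silently working with null; a small hedged sketch:
// Fetch the feed, decode it, and fail loudly if either step goes wrong.
$json = file_get_contents('http://www.gw2spidy.com/api/v0.9/json/all-items/all');
if ($json === false) {
    die("Could not download the feed\n");
}

$yourarray = json_decode($json, true);
if ($yourarray === null) {
    die('JSON decode failed: ' . json_last_error_msg() . "\n");
}

echo count($yourarray) . " top-level entries decoded\n";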

mongodb php $push not inserting array

I've seen a bunch of the other Mongo PHP $push questions up here on SO, but for some reason none of what they're saying is working, so I'm posting my own version.
Basically, I'm trying to follow a guideline set by 10gen where representing something like a news feed or blog/comment post should be done using buckets - making documents that hold a certain number (50) of events (comments, etc.), and then creating multiple documents as the content grows.
What I'm trying to do is push documents ($event) into an array (events), but there seems to be some confusion with PHP when the document doesn't exist (upserting). I tried to do it with an insert, but insert and $push don't play well together.
Here's what I have right now:
$historyDoc = array('_id' => $uID, 'count' => 1,
                    array('$push' => array('events' => $event)));
$query = array('_id' => $uID);

//add user to history
$collection->update($query, $historyDoc,
                    array('safe' => true, 'timeout' => 5000, 'upsert' => true));
Where $event is a properly-formatted array (document) of things (e.g. timestamp, userID, action, name) and $uID is a MongoID object taken from another collection.
The result that I get is this:
{
    "_id": {
        "$oid": "4f77ec39fef97a3965000000"
    },
    "0": {
        "$push": {
            "events": {
                "timestamp": 1333259321,
                "action": "achievement",
                "pts": 0,
                "name": "join"
            }
        }
    },
    "count": 1
}
Which is good, because my document is at least showing up right, but how the hell is there a key with a "$" in it? It's not like I'm failing to escape the $...I've been very intently using single quotes for that exact reason.
Maybe there's something I'm missing, or maybe I broke PHP, but I've been wrestling with this one for a while and there's nothing I can think of. It's holding my whole project up, so....>:/
Your update document is ill-formed, try this:
$historyDoc = array('_id'   => $uID,
                    'count' => 1,
                    '$push' => array('events' => $event));
It's not the most elegant solution, but it looks like it works. Apparently there's a problem with using $push when creating a new document (insert or upsert). (EDIT: It might actually be the combination of atomic and non-atomic operations that's the problem. You can't use atomic operators on _id, so...) However, you can get around it by inserting the document first and then updating/upserting it.
In order to initialize an array in Mongo via PHP, you need to create a document with an empty array as a value, as seen here:
$historyDoc = array('_id'    => $uID.'-0',
                    'count'  => 1,
                    'events' => array());
From there, you can simply take what you were going to put into the first index and upsert it later:
$collection->update($query, $historyDoc,
                    array('safe' => true, 'timeout' => 5000, 'upsert' => true));
$collection->update($query,
                    array('$push' => array('events' => $event)),
                    array('safe' => true, 'timeout' => 5000, 'upsert' => true));
This yields a resulting document of the form:
{
    "_id": "4f77f307fef97aed12000000-0",
    "count": 1,
    "events": [
        {
            "timestamp": 1333261063,
            "action": "achievement",
            "pts": 0,
            "name": "join"
        }
    ]
}
Source: Mongo PHP Manual - Updates

Assistance with building an inverted-index

It's part of an information retrieval thing I'm doing for school. The plan is to create a hashmap of words using the first two letters of each word as a key and any words starting with those two letters saved as a string value. So,
hashmap["ba"] = "bad barley base"
Once I'm done tokenizing a line I take that hashmap, serialize it, and append it to the text file named after the key.
The idea is that if I take my data and spread it over hundreds of files, I'll lessen the time it takes to fulfil a search by lessening the density of each file. The problem I am running into is that when I'm making 100+ files in each run, it happens to choke on creating a few files for whatever reason, and so those entries are empty. Is there any way to make this more efficient? Is it worth continuing this, or should I abandon it?
I'd like to mention I'm using PHP. The two languages I know relatively intimately are PHP and Java. I chose PHP because the front end will be very simple to do and I will be able to add features like autocompletion/suggested search without a problem. I also see no benefit in using Java. Any help is appreciated, thanks.
I would use a single file to get and put the serialized string, and I would use JSON as the serialization format.
Put the data
$string = "bad barley base";
$data = explode(" ",$string);
$hashmap["ba"] = $data;
$jsonContent = json_encode($hashmap);
file_put_contents("a-z.txt",$jsonContent);
Get the data
$jsonContent = file_get_contents("a-z.txt");
$hashmap = json_decode($jsonContent);
foreach ($hashmap as $firstTwoCharacters => $value) {
    if ($firstTwoCharacters == 'ba') {
        $wordCount = count($value);
    }
}
You didn't explain the problem you are trying to solve. I'm guessing you are trying to make a full text search engine, but you don't have document ids in your hashmap so I'm not sure how you are using the hashmap to find matching documents.
Assuming you want a full text search engine, I would look into using a trie for the data structure. You should be able to fit everything in it without it growing too large. Nodes that match a word you want to index would contain the ids of the documents containing that word.
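As a rough illustration of that suggestion (the class and method names here are made up for the sketch, not taken from the question), a trie keyed letter by letter, with document ids stored on the node where each indexed word ends:
// Minimal trie: each node fans out by letter; document ids live on the word's final node.
class TrieNode {
    public $children = []; // letter => TrieNode
    public $docIds   = []; // ids of documents containing the word that ends here
}

class Trie {
    private $root;

    public function __construct() {
        $this->root = new TrieNode();
    }

    public function insert($word, $docId) {
        $node = $this->root;
        foreach (str_split($word) as $ch) {
            if (!isset($node->children[$ch])) {
                $node->children[$ch] = new TrieNode();
            }
            $node = $node->children[$ch];
        }
        $node->docIds[] = $docId;
    }

    public function search($word) {
        $node = $this->root;
        foreach (str_split($word) as $ch) {
            if (!isset($node->children[$ch])) {
                return []; // word was never indexed
            }
            $node = $node->children[$ch];
        }
        return $node->docIds;
    }
}

// Usage: index a couple of words, then look one up.
$trie = new Trie();
$trie->insert('barley', 1);
$trie->insert('base', 2);
print_r($trie->search('barley')); // Array ( [0] => 1 )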
