I have built a script to scrape some records and store them as JSON objects for later use. This is the last step in the scraping process (Area->Location->ReportMeta->ReportDetails); the earlier steps are all working fine with this method of storing the data.
The issue is there are a lot of them, on the order of several hundred thousand. I tried accumulating them all into an array, then encoding and writing that to a file, but it maxes out memory before it gets close to finishing. I could raise the memory limit, but I am looking for a more stable/repeatable/"out of the box" way of doing this. Best practice, if you will.
My first thought was to just write each one to the file as it is scraped. That is working, but I am left with a single file full of individual JSON objects which is nigh unreadable unless I do some special formatting to bring it back in.
I am looking for a better way of doing this, or some advice.
$reports_obj = new Report();
foreach ($reports_array as $report) {
    $report_details = $reports_obj->getReport($report['report_id'], $report['report_type']);
    $fp = fopen('report_details.json', 'a');
    fwrite($fp, json_encode($report_details));
    fclose($fp);
}
This leaves me with a whole bunch of this:
{
"report_id": "12345",
"report_type": "Type A",
"facility_name": "Name here",
"facility_type": "building",
"report_date": "26-February-2018"
}
{
"report_id": "12345",
"report_type": "Type A",
"facility_name": "Name here",
"facility_type": "building",
"report_date": "26-February-2018"
}
{
"report_id": "12345",
"report_type": "Type A",
"facility_name": "Name here",
"facility_type": "building",
"report_date": "26-February-2018"
}
Would it be best to try a find/replace on the large file after the fact to give it proper JSON structure, or is there a better way of storing this? I can't open the file, re-read the data, and then array_push, for instance, as this would ultimately hit the same limitation as accumulating them all into an array to begin with.
As for "why" json? It's just a soft preference. I would like to stay with it if possible.
Maybe you can try this:
$reports_obj = new Report();
foreach ($reports_array as $report) {
    $report_details[] = $reports_obj->getReport($report['report_id'], $report['report_type']);
}
$jsonjson = json_encode($report_details);
$report = "{\"report\":" . $jsonjson . "}";
$fp = fopen('report_details.json', 'a');
fwrite($fp, $report);
fclose($fp);
If you have a sample, maybe I can check?
You should look for a NoSQL database.
If you don't want to or can't for whatever reason, it is better to loop through all the reports, generate the JSON, and write it afterwards, instead of opening and closing the file each time:
$result="";
foreach($reports_array as $report){
$report_details = $reports_obj->getReport($report['report_id'], $report['report_type']);
$result .= json_encode($report_details)."\n\r";
}
$fp = fopen('report_details.json', 'a');
fwrite($fp,$result);
fclose($fp)
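If you want the finished file to be one valid JSON array without ever holding all the records in memory, you can also stream the array syntax yourself: open the file once, write the opening bracket, then each encoded object separated by commas, then the closing bracket. A minimal sketch along those lines, reusing the Report class and $reports_array from the question:

$reports_obj = new Report();

$fp = fopen('report_details.json', 'w'); // open once, overwrite any previous run
fwrite($fp, '[');

$first = true;
foreach ($reports_array as $report) {
    $report_details = $reports_obj->getReport($report['report_id'], $report['report_type']);

    // write a comma before every element except the first
    if (!$first) {
        fwrite($fp, ',');
    }
    fwrite($fp, json_encode($report_details));
    $first = false;
}

fwrite($fp, ']');
fclose($fp);

Only one record is encoded at a time, so memory stays flat, and the result can be read back with a single json_decode (or a streaming parser). The other common convention is newline-delimited JSON: keep the original append approach, add a "\n" after each object, and decode the file line by line when reading it back.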
I already have a PHP script to upload a CSV file: it's a collection of tweets associated with a Twitter account (aka a brand). BTW, thanks T.A.G.S :)
I also have a script to parse this CSV file: I need to extract emojis, hashtags, links, retweets, mentions, and many more details I have to compute for each tweet (it's for my research project on digital affectiveness; I've already stored 280k tweets, with 170k emojis inside).
Each tweet and its metrics are then saved in a database (table TWEETS), as are emojis (table EMOJIS) and account stats (table BRANDS).
I use a class quite similar to this one: CsvImporter > https://gist.github.com/Tazeg/b1db2c634651c574e0f8. I made a loop to parse the lines one by one.
$importer = new CsvImporter($uploadfile, true);
while ($content = $importer->get(1)) {
    $pack = $content[0];
    $data = array();
    foreach ($pack as $key => $value) {
        $data[] = $value;
    }
    $id_str = $data[0];
    $from_user = $data[1];
    ...
After all my computations, I "INSERT INTO TWEETS VALUES(...)", and the same for EMOJIS. After that, I have to perform some other operations:
update the reach for each id_str (if a tweet I saved is a reply to a previous tweet)
save stats to table BRAND
All these operations are scripted in a single file, insert.php, and triggered when I submit my upload form.
But everything falls down if there are too many tweets. My server cannot handle such long operations.
So I wonder if I can ajaxify parts of the process, especially the loop:
upload the file
parse 1 CSV line, save it in SQL, and display an 'OK' message each time a tweet is saved
compute all the other things (reach and brand stats)
I'm not familiar enough with $.ajax(), but I guess there is something to do with beforeSend, success, complete and the other Ajax events. Or maybe I am completely wrong!?
Is there anybody who can help me?
As far as I can tell, you can lighten the load on your server substantially, because $pack is already an array of values and there is no need to do the key/value loop.
You can also write the mapping of values from the CSV row more idiomatically with list(). Unless you know the CSV file is likely to be huge, you should also fetch multiple lines at a time:
$importer = new CsvImporter($uploadfile, true);
// get as many lines as possible at once...
while ($content = $importer->get()) {
    // this loop works whether you get 1 row or many...
    foreach ($content as $pack) {
        list($id_str, $from_user, ...) = $pack;
        // rest of your line processing and SQL inserts here....
    }
}
You could also go on from this and insert multiple lines into your database in a single INSERT statement, which is supported by most SQL databases.
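For example, with a PDO connection and a prepared statement you can batch the rows collected from the CSV into a handful of queries. This is only a sketch: $pdo, $rows (an array of array($id_str, $from_user, $text) entries built up while parsing) and the "text" column are placeholders, not your actual connection or schema.

$chunkSize = 100;
foreach (array_chunk($rows, $chunkSize) as $chunk) {
    // one "(?, ?, ?)" group per row in this chunk
    $placeholders = implode(',', array_fill(0, count($chunk), '(?, ?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO TWEETS (id_str, from_user, text) VALUES $placeholders");

    // flatten the chunk into a single list of bound values
    $values = array();
    foreach ($chunk as $row) {
        $values = array_merge($values, $row);
    }
    $stmt->execute($values);
}

Batching like this cuts the number of round trips to MySQL, which is usually where the time goes on large imports.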
$entries = array();
$f = fopen($filepath, "r");
while (($line = fgetcsv($f, 10000, ",")) !== false) {
    array_push($entries, $line);
}
fclose($f);
Try this, it may help.
Because a provider I use has quite unreliable MySQL servers, which are down at least once per week :-/ impacting one of the sites I made, I want to prevent these outages in the following way:
dump the MySQL table to a file, in case the connection with the SQL server fails,
then read from the file instead of the server, until the server is back.
This will avoid outages from the user experience point of view.
In fact, things are not as easy as they seem, so I am asking for your help, please.
What I did is save the data to a JSON file format.
But this has issues, because a lot of the data in the DB is stored "in the clear", including escaped, complex URLs with long argument strings, which cause problems during the JSON decode process.
CSV and TSV do not work correctly either:
CSV is delimited by commas or semicolons, and those are present in the original content taken from the DB.
The TSV format leaves double quotes that cannot be removed without going through each record's fields to eliminate them.
Then I tried to serialize each record read from the DB, store it, and retrieve it by unserializing it.
But the result is a bit catastrophic: all the records are stored in the file,
yet when I retrieve them, only one is returned, and then something blocks the program from continuing (code below).
require_once('variables.php');
require_once("database.php");

$file = "database.dmp";
$myfile = fopen($file, "w") or die("Unable to open file!");
$sql = mysql_query("SELECT * FROM song ORDER BY ID ASC");
// output data of each row
while ($row = mysql_fetch_assoc($sql)) {
    // store the record into the file
    fwrite($myfile, serialize($row));
}
fclose($myfile);
mysql_close();

// Retrieving section
$myfile = fopen($file, "r") or die("Unable to open file!");
// keep reading until the end of the file
while (!feof($myfile)) {
    $record = fgets($myfile); // get the record
    $row = unserialize($record); // unserialize it
    print_r($row); // show whether the variable has something in it
}
fclose($myfile);
I also tried uuencode and base64_encode, but those were worse choices.
Is there any way to achieve my goal?
Thank you very much in advance for your help
If you have your data layer well decoupled you can consider using SQLite as a fallback storage.
It's just a matter of adding one more abstraction, with the same code accessing the storage and switching the storage target when the primary one is unavailable.
-----EDIT-----
You could also think about some caching strategy (a JSON/HTML file?!) that returns stale data in case of a MySQL outage.
-----EDIT 2-----
If it's not too much effort, please consider playing with PDO. I'm quite sure you'll never look back, and believe me, this will help you structure your db calls with little pain when switching between storages.
Please take the following only as an example; there are much better ways to design this architectural part of the code.
Just a small and basic code to demonstrate you what I mean:
class StoragePersister
{
    private $driver = 'mysql';

    public function setDriver($driver)
    {
        $this->driver = $driver;
    }

    public function persist($data)
    {
        switch ($this->driver) {
            case 'mysql':
                $this->persistToMysql($data);
                break;
            case 'sqlite':
                $this->persistToSqlite($data);
                break;
        }
    }

    public function persistToMysql($data)
    {
        // query to MySQL
    }

    public function persistToSqlite($data)
    {
        // query to SQLite
    }
}
$storage = new StoragePersister;
$storage->setDriver('sqlite'); // switch to sqlite when needed
$storage->persist($somedata); // this calls the right persist method based on the driver you've selected
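To make the PDO suggestion concrete, here is a minimal sketch of how the same query code could fall back from MySQL to a local SQLite file. The DSNs, credentials and the assumption that the SQLite file already holds a song table with the same layout are placeholders, not a definitive implementation:

// hypothetical credentials and database name - adjust to your own setup
try {
    $pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'password');
} catch (PDOException $e) {
    // MySQL unreachable: fall back to a local SQLite file with the same table layout
    $pdo = new PDO('sqlite:' . __DIR__ . '/fallback.sqlite');
}

// the query code does not change, only the connection does
$stmt = $pdo->prepare('SELECT * FROM song ORDER BY ID ASC');
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);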
-----EDIT 3-----
please have a look at the "strategy" design pattern; I think it can help you better understand what I mean.
After the SELECT... you need to build a correct structure for re-inserting the data; then you can serialize it or do whatever you want.
For example:
For each row, you could do this: $sqls[] = "INSERT INTO `song` (field1, field2, ..., fieldN) VALUES (field1_value, field2_value, ..., fieldN_value);";
Then you could serialize this $sqls array, write it to a file, and when you need it, read it back, unserialize it, and run the queries.
Have you thought about caching your queries in a cache like APC? Also, you may want to use mysqli or PDO instead of mysql (the mysql extension is deprecated in the latest versions of PHP).
To answer your question, this is one way of doing it.
var_export will export the variable as valid PHP code
require will put the content of the array into the $rows variable (because of the return statement)
Here is the code :
$file = "database.dmp"; // same cache file name as in the question
$sql = mysql_query("SELECT * FROM song ORDER BY ID ASC");
$content = array();
// collect each row, keyed by its ID
while ($row = mysql_fetch_assoc($sql)) {
    $content[$row['ID']] = $row;
}
mysql_close();

$data = '<?php return ' . var_export($content, true) . ';';
file_put_contents($file, $data);

// Retrieving section
$rows = require $file;
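A quick sketch of how this cache could then cover the original outage scenario. The connection check below is illustrative only (hypothetical credentials; in the real code the connection presumably comes from database.php):

$file = "database.dmp";
$link = @mysql_connect('localhost', 'user', 'password'); // hypothetical credentials

if ($link !== false && mysql_select_db('mydb', $link)) {
    // server is up: query as usual and rebuild the cache file as shown above
    $sql = mysql_query("SELECT * FROM song ORDER BY ID ASC", $link);
    $content = array();
    while ($row = mysql_fetch_assoc($sql)) {
        $content[$row['ID']] = $row;
    }
    file_put_contents($file, '<?php return ' . var_export($content, true) . ';');
    $rows = $content;
} else {
    // server is down: fall back to the last cached copy
    $rows = require $file;
}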
Foreword: beginner at PHP here.
I have a large table on a page that is dynamically created from user-defined settings. It is about 1600 rows and 15 columns. I have also populated an array with this data, where key 0 = all the values of row 1 separated by commas, key 1 = all the values of row 2 separated by commas, and so on.
The Array was populated from a loop as the table was created and has the name/structure:
$CSVOut[$CSVKey]
I have confirmed the array is populated and displaying properly on the first page.
I have a button that calls 'textexport.php' with the following code:
<?php
header("Content-type: text/csv");
header("Content-Disposition: attachment; filename=file.csv");
header("Pragma: no-cache");
header("Expires: 0");

outputCSV(array(
    array("name 1", "age 1", "city 1"),
    array("name 2", "age 2", "city 2"),
    array("name 3", "age 3", "city 3")
));

function outputCSV($data) {
    $output = fopen("php://output", "w");
    foreach ($data as $row) {
        fputcsv($output, $row); // here you can change delimiter/enclosure
    }
    fclose($output);
}
?>
The sample array in TestExport.php accurately exports a CSV of the hardcoded array. Over several days I have searched for and tried multiple ways to get the array $CSVOut from my first page to this page and use the function to populate my CSV, but it is not coming together for me.
I've tried POST methods (I've been able to successfully POST other user variables on this page, so I kind of understand how that works) and tried SESSIONS (but I don't fully grasp this concept yet).
I have a couple questions:
1) How bad of a method is this to try and export a dynamically created html table as a CSV?
2) What would be the best (easiest?) method?
3) Anybody have some guidance/example on how to do this?
4) Is it possible to do it in reverse - just call this function on the first page with the populated array?
For everything else in this project I've been able to find existing examples to adapt for my project but I haven't been able to really get this one right. Any help would be much appreciated.
If you don't want to (or don't have time to) bother with back-end data storage and user sessions, the best solution may simply be to use JavaScript/jQuery to slice and dice the dynamic output the user has created into a CSV-formatted string that they can copy and paste out. Simply grabbing the table they have made and processing each row should suffice; using jQuery will make it easier.
var rows = $('#example tbody').find('tr');
var output_string = '';
rows.each(function() {
    var column_data = jQuery.map($(this).find('td'), function(datum) {
        return $(datum).text().replace(/\s/gi, '');
    });
    output_string += column_data.join(',') + '\n';
});
$('#output').val(output_string);
The above code will grab the body portion of an HTML table (ignoring any table headings you have defined) and turn each table row into a line of comma-separated values terminated with a newline.
It then dumps the created output into a textarea.
JSFiddle Example
This is not as fancy as temporarily saving the content to Mongo/MySQL and serving it back to the user with the click of a button - but it can be done quickly and without having to sort out data storage, sessions, authentication, etc. However those are fairly common parts of a full web stack so I suggest you take some time to familiarize yourself with them at some point. Here are some useful links.
WAMP or XAMPP are full-service PHP, MySQL and Apache stacks which are great for getting up and running with development without wasting a lot of time fiddling with server configs.
And here is a basic explanation/tutorial of sessions in PHP. Also consider a little light reading; there are many security pitfalls that beginning PHP developers can fall into, and this book covers the most avoidable ones.
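Since the question also asks about sessions, here is a minimal sketch of that route. It assumes both pages share a session and that $CSVOut holds one comma-separated string per row, as described in the question; the session key name is made up for illustration.

// --- first page, after $CSVOut has been filled ---
session_start();                 // must run before the page sends any output
$_SESSION['csv_out'] = $CSVOut;  // one comma-separated row string per entry

// --- textexport.php ---
session_start();
header("Content-type: text/csv");
header("Content-Disposition: attachment; filename=file.csv");
header("Pragma: no-cache");
header("Expires: 0");

$output = fopen("php://output", "w");
foreach ($_SESSION['csv_out'] as $line) {
    // the rows were stored as comma-separated strings, so split them back up for fputcsv
    fputcsv($output, explode(',', $line));
}
fclose($output);

The main caveat is that session_start() has to be called before either page produces output, and splitting on commas only works as long as none of the cell values themselves contain commas.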
I have a one-to-one key structure of zip codes to users, located in a .html file, that looks like...
...
'80135': 'user1',
'80136': 'user1',
'80137': '',
'80138': '',
'80202': 'user2',
'80203': 'user2',
'80204': '',
'80205': '',
'80206': '',
'80207': '',
...
I would like to take a bulk list of zip codes for user3 and fill in or overwrite the old user. So, for example, if user3 has zip codes (80202, 80203, 80204), then my previous block of code would change to...
...
'80135': 'user1',
'80136': 'user1',
'80137': '',
'80138': '',
'80202': 'user3',
'80203': 'user3',
'80204': 'user3',
'80205': '',
'80206': '',
'80207': '',
...
The reason for using a text editor is to complete my set now, but ideally it would be nice to have a client application that our non-programmer team can use to update and make changes as they please, so a script for this would be nice for future plans.
I am passing the content into my site via...
var list_load = {<?php include_once('list.html'); ?>};
Because I suspect some of you might have an alternative idea about how to store this information: this list is very long, about 35,000 lines of code, so any idea that involves completely changing my code should also consider a process to migrate the data.
Based on the information you provided, I'm going to assume your example code is a subset of a JSON object. I'm also going to assume that you only have a one-to-one relationship of zip codes to users in your object, given that you did not explicitly state otherwise and that the existing object structure would not allow for a one-to-many relationship. Given all these assumptions, the solution to your immediate problem is to load the JSON into PHP, make your changes there, and then overwrite the entire file with the updated JSON object.
$json = file_get_contents('list.html');
$array = json_decode($json, true);

$oldUser = 'user2';
$newUser = 'user3';
$listZipCodes = array("80204"); // list of zip codes to update

// Update the JSON based on user ...
foreach ($array as $zipCode => $user) {
    if ($user === $oldUser) {
        $array[$zipCode] = $newUser;
    }
}

// Update the JSON based on zip code ...
foreach (array_keys($array) as $zipCode) {
    if (in_array($zipCode, $listZipCodes)) {
        $array[$zipCode] = $newUser;
    }
}

$json = json_encode($array);
file_put_contents('list.html', $json);
Again, this is all assuming that list.html looks like this...
{
"80135": "user1",
"80136": "user1",
"80137": "",
"80138": "",
"80202": "user2",
"80203": "user2",
"80204": "",
"80205": "",
"80206": "",
"80207": ""
}
Remember, it has to be valid JSON object notation in your list.html in order for PHP to be able to parse it correctly. Your existing example is not going to work in PHP because you're using single quotes instead of double quotes, which the JSON spec requires. So you have to make sure that part of your list.html file is valid.
Beyond that, I highly discourage you from taking this approach, because it causes a number of serious problems that can't easily be solved. For example, you cannot ensure your data won't be corrupted, as any two PHP scripts may attempt to overwrite the file at the same time (no data integrity). Also, you can't easily make this scale without it costing you a lot of unnecessary CPU and memory if the list gets large enough. Additionally, you have no way to control who may edit the file directly, and thus no way to control data flow to the underlying application code that tries to use that data.
A better solution is to use a database; that way you can control both the data and user privileges, plus you can provide a PHP front-end for non-programmers to edit/modify/read the data.
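For instance, with a small table keyed on the zip code, the bulk assignment from the question becomes a single statement. The table name, column names and connection details below are hypothetical; the point is only to illustrate the shape of the solution.

// assumes a table like: CREATE TABLE zip_owner (zip CHAR(5) PRIMARY KEY, user VARCHAR(50))
// hypothetical connection details
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'password');

$zips = array('80202', '80203', '80204');
$placeholders = implode(',', array_fill(0, count($zips), '?'));

$stmt = $pdo->prepare("UPDATE zip_owner SET user = ? WHERE zip IN ($placeholders)");
$stmt->execute(array_merge(array('user3'), $zips));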
I am currently using a PHP script to create a cache of JSON files. We have a PHP file on our server that queries a very large database, and returns result sets in JSON format (so we can use it on the site in jQuery and other frameworks).
The script for presenting the raw JSON from our database works great, and I made a simple cache building script that also works to a degree. However, we have noticed some odd things happening with the resulting cache file.
PHP seems to be adding slashes to the quote marks, as well as adding superfluous " to the beginning and end of the JSON.
Here is a sample of the JSON we're passing in (note that it's not complete):
[
{
"id":1580,
"name":"Fydell House",
"address1":"South Square",
"address2":null,
"towncity":"Boston",
"county":"Lincolnshire",
"postcode":"PE21 6HU",
"addressVerbose":"South Square
Boston
Lincolnshire
PE21 6HU
",
"addressVerboseLinked":"",
"longitude":-0.022778,
"latitude":52.975806,
"londonBorough":null,
"telno":"01205 351 520",
"faxno":null,
"email":null,
"webaddress":null,
"mainimg":null,
"venueType":"Historic Building",
"description":null,
"excerpt":" ",
"images":null,
"creationDate":943920000,
"londonfeatured":false,
"unusual":false,
"featured":false,
"active":true,
"trial":false,
"modifiedDate":1234709308,
"hits":"1579",
"allowReviews":false,
"amenities":null,
"imagealt":"Lincolnshire wedding reception venue in Boston, Fydell House",
"imagetitle":"Lincolnshire wedding venues in Boston",
"car_directions":null,
"pub_directions":null,
"additional_info":null,
"listedBy":0,
"listedByName":null,
"region":null
}
]
And the PHP code that outputs the JSON file and stores it on our server:
// Function to output the contents from the live data, and create a cache file:
function create_cache_file($url, $filename)
{
$url = str_replace(' ', '%20', $url);
$json_string = file_get_contents($url);
$file = fopen($filename, 'w');
// If there is a problem creating the file:
if(!$file)
{
die('error creating the file!');
}
else
{
fwrite($file, json_encode($json_string));
fclose($file);
echo $json_string;
}
}
And this is what the file looks like after it's been processed with PHP and stored on our server:
"[{\"id\":437,\"name\":\"Lanteglos Country House Hotel\",\"address1\":\"Lanteglos-by-Camelford\",\"address2\":null,\"towncity\":\"Camelford\",\"county\":\"Cornwall\",\"postcode\":\"PL32 9RF\",\"addressVerbose\":\"Lanteglos-by-Camelford<br \\\/>Camelford<br \\\/>Cornwall<br \\\/>PL32 9RF<br \\\/>\",\"addressVerboseLinked\":\"\",\"longitude\":-4.695491,\"latitude\":50.612462,\"londonBorough\":null,\"telno\":\"01840 213 551\",\"faxno\":null,\"email\":null,\"webaddress\":null,\"mainimg\":null,\"venueType\":\"Hotel\",\"description\":null,\"excerpt\":\" \",\"images\":null,\"creationDate\":943920000,\"londonfeatured\":false,\"unusual\":false,\"featured\":false,\"active\":true,\"trial\":false,\"modifiedDate\":1234662248,\"hits\":\"1145\",\"allowReviews\":false,\"amenities\":null,\"imagealt\":\"Cornwall wedding reception venue in Camelford, Lanteglos Country House Hotel\",\"imagetitle\":\"Cornwall wedding venues in Camelford\",\"car_directions\":null,\"pub_directions\":null,\"additional_info\":null,\"listedBy\":0,\"listedByName\":null,\"region\":null},{\"id\":438,\"name\":\"Rosehill Public House\",\"address1\":\"Little Petherick\",\"address2\":null,\"towncity\":\"Wadebridge\",\"county\":\"Cornwall\",\"postcode\":\"PL27 7QT\",\"addressVerbose\":\"Little Petherick<br \\\/>Wadebridge<br \\\/>Cornwall<br \\\/>PL27 7QT<br \\\/>\",\"addressVerboseLinked\":\"\",\"longitude\":-4.94093,\"latitude\":50.51259,\"londonBorough\":null,\"telno\":\"01841 540 777\",\"faxno\":null,\"email\":null,\"webaddress\":null,\"mainimg\":null,\"venueType\":\"Inn \\\/ Pub\",\"description\":null,\"excerpt\":\" \",\"images\":null,\"creationDate\":943920000,\"londonfeatured\":false,\"unusual\":false,\"featured\":false,\"active\":true,\"trial\":false,\"modifiedDate\":1234752874,\"hits\":\"818\",\"allowReviews\":false,\"amenities\":null,\"imagealt\":\"Cornwall wedding reception venue in Wadebridge, Rosehill Public House\",\"imagetitle\":\"Cornwall wedding venues in Wadebridge\",\"car_directions\":null,\"pub_directions\":null,\"additional_info\":null,\"listedBy\":0,\"listedByName\":null,\"region\":null}]"
This is causing some serious problems elsewhere on the site when we try to decode the JSON. While we can use stripslashes to remove the slashes, some of the other areas are also causing parse errors (that weren't present using the raw data served up by the first PHP file).
Can anyone suggest why PHP is adding slashes and erroneous quote marks around the string? Ideally, we would like a work-around at the point of creating the JSON file on the server, not at the point of reading from it...
I take it that file_get_contents($url) returns you a JSON encoded file? Then your problem is that you're JSON encoding it again. "[{\"id\":437,\"nam... is the proper JSON encoded representation of the string [{"id":1580,"nam....
If you json_decoded it when reading from the cache file, you'd get the original JSON string back. Or just don't json_encode it.
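In other words, the minimal fix (keeping the rest of create_cache_file exactly as in the question) is to write the string out untouched, so that whatever reads the cache can json_decode it just like the live endpoint:

$json_string = file_get_contents($url); // this is already a JSON string
fwrite($file, $json_string);            // write it as-is; no json_encode() needed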