Open and modify a JSON file using PHP

I'm going to be using a JSON file to contain a list of links to a few posts, and it can be updated at any time. However, I'm stumped about which mode to use with PHP's fopen() function. This will be a flat-file database, primarily for me to learn to work with files, PHP, and JSON before moving on to a proper relational database (that, and it's not a huge collection of pages, so I'm not worried about needing SQL or anything like that yet...).
The process I'm using is that once a blog post is typed up, it will create a directory, save a new index.php file to it with all of the stuff that lets me view the page, and then, where I'm currently stuck, update a JSON file with the Title, Author, Date, and link to the newly created page.
Based on the PHP manual, there are three modes I might want to use: r+, w+, or a+.
The process I am looking to use is to take the JSON file and place the data into an array. Update the array, then save it back to the file.
a+ places the pointer at the end of the file and writes are always appended, so I'm assuming this is the worst choice for this situation since I wouldn't add a new JSON entry at the end of the file (I'm tempted to actually insert any new data at the beginning of the JSON object instead of at the end).
w+ mentions read and write, but also truncating the file - does this happen upon saving data to the file, or does this happen the moment the file is opened? If I used this mode on an existing JSON file, would I then be reading a blank file before I can even modify the array and re-save it to the object?
r+ mentions placing the pointer at the beginning of the file - does saving data overwrite what's there or will it insert the data BEFORE what's existing there? If it inserts, how would I manually clear the file and then save the newly-modified array to the JSON object?
Which of those modes are best suited for what I'm looking to do? Is there a better way of doing this, anyway?

If you're always reading or writing an entire file, you don't have to work with file handles at all - PHP provides a pair of functions file_get_contents($file_name) and file_put_contents($file_name, $content) which are much simpler to work with.
File handles with their various modes are most useful when you're working with parts of files. For instance, if you are using CSV, you can read or write one line at a time, without having the full set of data in memory at once. Or, with binary file formats, you might know the location in the file you want to read from, and can "seek" the file handle to that location.
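For this use case, the whole read-modify-write cycle fits in a few lines. A minimal sketch, assuming the JSON file holds an array of post entries (the file name and field names below are just placeholders):

```php
<?php
$file = 'posts.json'; // hypothetical path

// Read and decode the existing file (fall back to an empty array if it doesn't exist yet)
$posts = file_exists($file)
    ? json_decode(file_get_contents($file), true)
    : [];

// Prepend the new entry so the newest post comes first
array_unshift($posts, [
    'title'  => 'My new post',
    'author' => 'Me',
    'date'   => date('Y-m-d'),
    'link'   => '/posts/my-new-post/',
]);

// Re-encode and overwrite the whole file in one call
file_put_contents($file, json_encode($posts, JSON_PRETTY_PRINT));
```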

You should probably read the entire file first (e.g. with file_get_contents()), and then open it with w+ to write the new data. (Edit: or rather, as the other answer points out, use file_put_contents(), which is always simpler when you are only making one write operation.)
r+ will overwrite as much of the file as you are writing, but won't erase anything beyond that. If your data always increases in size, this behaves the same as overwriting the file entirely, but even if that's true now, it's an assumption that will likely corrupt your data in the future.
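If you do want to practice with a file handle anyway, the usual pattern with r+ is read, rewind, truncate, then write, so no stale bytes are left behind when the new JSON is shorter than the old one. A rough sketch (the file name is a placeholder):

```php
<?php
$handle = fopen('posts.json', 'r+'); // open for reading and writing, pointer at start

$raw   = stream_get_contents($handle);  // read the whole file
$posts = json_decode($raw, true) ?: [];

$posts[] = ['title' => 'Another post']; // modify the array

rewind($handle);                        // back to the start of the file
ftruncate($handle, 0);                  // discard the old contents entirely
fwrite($handle, json_encode($posts));   // write the fresh JSON
fclose($handle);
```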

PHP - parsing contents of excel/pdf file already retrieved and stored in variable, without having to save contents to a file on disk [closed]

Here is the scenario:
I have a variable in PHP that holds the raw contents of an Excel file (it could also be a PDF), and I want to parse that variable for a certain value. I am looking for a keyword near the end of the file's contents and need to extract some data near it, so I can get it into a PHP variable and output it to my webpage. From what I can tell, the contents are binary; when viewed as ASCII they show up as readable text mixed with diamond characters (with a question mark), bordered rectangles, and other extraneous characters.
Here are the requirements:
I don't want to parse the contents of the file by first storing or saving it on disk. I want to parse the contents of the retrieved file directly while it is in a PHP variable.
Here is my question:
How do I go about this? Should I rely upon PHPExcel to read this content if possible? If not, what php libraries can accomplish this task?
Should I rely upon PHPExcel to read this content if possible?
It is not possible (see below).
If not, what PHP libraries can accomplish this task?
None that I know of.
How do I go about this?
An Excel file (rather, an Excel 2007+ XLSX file - Excel 97 XLS files are a wholly different can of worms) is a ZIP archive containing XML and other files in a tree structure. So your first step is to decompress a ZIP archive held in a string; PHPExcel relies on the ZipArchive class, which does not support reading from a string and also bypasses most stream hacks. A similar problem - actually exactly the same problem - is described in this question.
You could think of using stream wrapping to decode the file from a string, and the first part - the reading - would work. The writing of the files would not. And you cannot modify the ZipArchive class so that it writes to a memory object, because it is a native class.
So you can employ a slight variation, from one of the answers above (the one by toster-cx). You need to decode the ZIP structure yourself, and thus get the offset in the ZIP file where the file you need begins. This will either be /xl/worksheets/sheet1.xml or /xl/sharedStrings.xml, depending on whether the string has been inlined by Excel, or not. This also assumes that the format is the newer XLSX. Once you have that, you can extract the data from the string and decompress it, then search it for the token.
Of course, a more efficient use of the time would be to determine exactly why you don't want to use temporary files. Maybe that problem can be solved another way.
Speed problem
Actually, reading/writing an Excel file is not so terrible, because in this case you don't need to do that. You can almost certainly consider it a Zip file, and open it using ZipArchive and getStream() to directly access the internal sub-file you're interested in. This operation will be quite fast, also because you can run the search from the getStream() read cycle. You do need to write the file once, but nothing more.
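As a rough sketch of that approach (the temporary file path, worksheet entry, and keyword are assumptions):

```php
<?php
// Assumes the uploaded workbook has already been written to a temporary file on disk.
$zip = new ZipArchive();
if ($zip->open('/tmp/upload.xlsx') !== true) {
    die('Could not open the XLSX archive');
}

// In an XLSX workbook, cell text is usually stored in the shared strings part.
$stream = $zip->getStream('xl/sharedStrings.xml');
if ($stream === false) {
    die('Entry not found in the archive');
}

$needle = 'INVOICE_TOTAL'; // hypothetical keyword
$found  = false;
$tail   = '';

while (!feof($stream)) {
    // Keep a small tail from the previous chunk so a match cannot straddle a boundary.
    $chunk = $tail . fread($stream, 8192);
    if (strpos($chunk, $needle) !== false) {
        $found = true;
        break;
    }
    $tail = substr($chunk, -strlen($needle));
}

fclose($stream);
$zip->close();

echo $found ? "Keyword found\n" : "Keyword not found\n";
```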
In fact, chances are that you can write the file while it is being uploaded (what do you use for Web upload? The plupload JS library has a very nice hook to capture very large files one chunk at a time). You still need a temporary area on disk to store the data, but in this case the time expenditure will be dedicated exclusively to decompressing and reading the XML sub-file - the same thing you'd have needed to do with a string object.
It is also (perhaps, depending on several factors, mainly the platform and operating system) possible to offload this part of the work to a secondary process running in the background, so that the user sees the page reload immediately, while the information appears after a while. This part, however, is pretty tricky and can rapidly turn into a maintenance nightmare (yeah, I do have first-hand experience with this. In my case it was tiled image conversion).
Cheating
OK, fact is I love cheating; it's so efficient. You say that you control the XLSX and PDF being created? Well! It turns out that in both cases, you can add hidden metadata to the file. And those metadata are much more easily read than you might think.
For example, you can add zip archive comments to an XLSX file, since it is a Zip file. Actually you could add a fake file with zero length to the archive, call it INVOICE_TOTAL_12345.xml, and that would mean that the invoice total is 12345. The advantage is that the file names are stored in the clear inside the XLSX file, so you can just use preg_match and look for INVOICE_TOTAL_([0-9]+)\.xml and retrieve your total.
Same goes for PDF. You can store keywords in a PDF. Just add a keyword attribute named "InvoiceTotal" (check the PDF to see how that turns out). But there is also a PDF ID inside the PDF, and that ID will be at the very end of the PDF. It will be something like /ID [<ec144ea3ecbb9ab8c22b413fec06fe29><ec144ea3ecbb9ab8c22b413fec06fe29>]^, but just use a known sequence such as deadbeef and ec144ea3ecbb9ab8c22deadbeef12345 will, again, mean the total is 12345. The ID before the known sequence will be random, so the overall ID will still be random and valid.
In both cases you could now just look for a known token in the string, exactly as requested.
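A minimal sketch of the XLSX variant, assuming the raw file contents are already in $raw and the fake zero-length entry name follows the pattern above:

```php
<?php
// $raw holds the raw bytes of the XLSX file (already in memory, never written to disk).
// ZIP entry names are stored uncompressed, so a plain regex over the raw bytes works.
if (preg_match('/INVOICE_TOTAL_([0-9]+)\.xml/', $raw, $matches)) {
    $invoiceTotal = (int) $matches[1];
    echo "Invoice total: {$invoiceTotal}\n";
} else {
    echo "Token not found\n";
}
```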

Overwriting files properly

I am trying to manage caching on a heavily used webpage written in PHP. I have marked some cacheable sections of PHP code, which I want to execute only once, pre-caching their output when the administrator makes changes in the CMS. For this, I use this method:
I have a file (for example "index-source.php") with some marked areas of PHP code, each of which can be interpreted on its own. When the admin changes some settings, these marked parts are executed and replaced with their results (for example, the MySQL queries that read menu items from the DB are replaced with the generated HTML menu). The resulting file is saved as a new "index.php", which still contains some PHP code that can't be optimized by caching.
Now to my problem
Assume this server is heavily loaded, with, for example, 100 requests per second, each of which require()s index.php. If I use file_put_contents() to overwrite this index.php with the new pre-cached version, is there any risk that some requests will be interrupted because the file is locked or not fully overwritten? Basically I want to update my PHP file and ensure that PHP will include either the complete old or the complete new version of that file, or wait a few milliseconds until the file is overwritten. I don't want the require to fail or to load a partially overwritten file.
Is that possible? Thanks
file_put_contents is not what you want.
Have a look at this project, and dive into the source to get a feel for what challenges you may have to face as well as the solution chosen.
https://github.com/PHPSocialNetwork/phpfastcache
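One common pattern for this kind of atomic replacement (not taken from phpfastcache, just a general sketch) is to write the new version to a temporary file on the same filesystem and then rename() it over the old one; on POSIX systems rename() is atomic, so a concurrent require() sees either the old file or the new one, never a partial write:

```php
<?php
function atomicReplace(string $target, string $contents): void
{
    // Write to a temp file in the same directory so rename() stays on one filesystem.
    $tmp = tempnam(dirname($target), 'cache_');
    file_put_contents($tmp, $contents);

    // rename() atomically swaps the file into place; readers never see a half-written file.
    rename($tmp, $target);

    // Invalidate any opcode cache entry for the old file, if OPcache is enabled.
    if (function_exists('opcache_invalidate')) {
        opcache_invalidate($target, true);
    }
}

// $newPrecachedSource is assumed to hold the freshly generated index.php source.
atomicReplace(__DIR__ . '/index.php', $newPrecachedSource);
```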

Best practices to export CSV in PHP: output buffer vs temporary file

Scenario
I execute a SELECT on a database that returns any number of rows, maybe few or many (one million+). Those rows need to go into a .csv file with the first row being the header.
Doubt
I know two ways of exporting CSV files with PHP: using the output buffer php://output, or creating a temporary file, serving it to the user, then deleting it.
Which way is better, knowing it may be a small file or a very big one? Consider PHP memory limit (in php.ini), request time out, etc.
Using a temporary file is the only good option when you may have a large file.
You can redirect a second request (if the file already exists) directly to your file and let the web server serve it without executing PHP.
If the client disconnects while downloading the file through the API, in most cases they will simply start the download again.
On top of that, you will have access logs on your web server to check who accessed this file and how many times.
It depends on the situation.
Use an output buffer when you know the file is not ridiculously large and when it is a download that doesn't occur too often.
When you have something large that will be downloaded a large number of times (simultaneously), writing it to a file might be better to lighten the load on your database and site.
I'd think the answer is pretty obvious: write directly to php://output. It's the same as echo ...; the output will be sent to the client more or less directly. It may or may not get buffered for a bit, but unless you have explicit output buffering activated or your web server has a ridiculously large buffer, it'll send it right through. "Sending a file" (presumably via readfile) would pass the data through the same output buffer, but would be much more complicated and error prone.
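A minimal sketch of streaming straight to php://output with fputcsv, assuming an existing PDO connection and a placeholder table/column set:

```php
<?php
// $pdo is assumed to be an existing PDO connection; table and column names are placeholders.
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');

$out = fopen('php://output', 'w');

// Header row first.
fputcsv($out, ['id', 'name', 'created_at']);

// Fetch rows one at a time instead of loading the whole result set with fetchAll().
$stmt = $pdo->query('SELECT id, name, created_at FROM posts');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($out, $row);
}

fclose($out);
```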

Creating php file with php code

I wonder if I can create a PHP file with PHP code (my project is to write a PHP-based online poker game). It would be nice if I could add/delete tables from PHP, but in order to do that I need some code to generate the PHP file associated with each specific table (the code for every table would be the same; I just need something that lets me create a .php file for the lifetime of a table). Also, can you tell me how to delete it from PHP afterwards? Thanks ahead.
You can do this simply by having the code written to a file with
file_put_contents($filename, $content).
This will create a new file if it doesn't exist. If it exists it will truncate the file and write the new content.
To read dynamic code you could use eval($codeString). This would interpret $codeString as code. NOT RECOMMENDED, because if there is ANY user input involved in $codeString, it is a huge security risk! (Read the WARNING below!)
To get the contents of a file, you could use $fileContents = file_get_contents($filename).
If you want to write to files by appending text and so on, you need to dig deeper into the PHP filesystem functions. A nice place to start is W3Schools: http://www.w3schools.com/php/php_ref_filesystem.asp
You should look at three major functions of writing to files:
fopen();
fwrite();
fclose();
Warning!
Executing dynamically generated code, whether it comes from files or just strings, can be really dangerous, especially if it gets any kind of user or dynamic input. This is because PHP files are capable of editing, creating, and deleting a lot on your server. I would recommend that you find an alternative solution!
You can use file_put_contents() or touch().
file_put_contents() will create the file if it does not exist and will write the provided data to it.
touch() will create an empty file.
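A rough sketch of creating a per-table file and removing it when the table is closed (the file names, table id, and template are assumptions):

```php
<?php
// Create a .php file for a new table by copying a shared template.
$tableId   = 42; // hypothetical table id
$tableFile = __DIR__ . "/tables/table_{$tableId}.php";

$template = file_get_contents(__DIR__ . '/table-template.php'); // assumed template file
file_put_contents($tableFile, $template);

// ... later, when the table's lifetime ends, delete the file again:
if (file_exists($tableFile)) {
    unlink($tableFile);
}
```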

Parse large JSON file [duplicate]

I'm working on a cron script that hits an API, receives a JSON file (a large array of objects), and stores it locally. Once that is complete, another script needs to parse the downloaded JSON file and insert each object into a MySQL database.
I'm currently using file_get_contents() along with json_decode(). This will attempt to read the whole file into memory before trying to process it. This would be fine except for the fact that my JSON files will usually range from 250MB-1GB+. I know I can increase my PHP memory limit, but that doesn't seem to be the greatest answer in my mind. I'm aware that I can use fopen() and fgets() to read the file in line by line, but I need to read the file in one JSON object at a time.
Is there a way to read in the file per object, or is there another similar approach?
Try this lib: https://github.com/shevron/ext-jsonreader
The existing ext/json which is shipped with PHP is very convenient and simple to use - but it is inefficient when working with large amounts of JSON data, as it requires reading the entire JSON data into memory (e.g. using file_get_contents()) and then converting it into a PHP variable at once - for large data sets, this takes up a lot of memory.
JSONReader is designed for memory efficiency - it works on streams and can read JSON data from any PHP stream without loading the entire data into memory. It also allows the developer to extract specific values from a JSON stream without decoding and loading all data into memory.
This really depends on what the JSON files contain.
If loading the file into memory in one shot is not an option, your only other option, as you alluded to, is fopen()/fgets().
Reading line by line is possible, and if these JSON objects have a consistent structure, you can easily detect when a JSON object in the file starts and ends.
Once you collect a whole object, you insert it into the DB, then go on to the next one.
There isn't much more to it. The algorithm to detect the beginning and end of a JSON object may get complicated depending on your data source, but I have done something like this before with a far more complex structure (XML) and it worked fine.
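A rough sketch of that idea, assuming the file is a large JSON array of objects and that braces never appear inside string values (a real parser would need to handle those cases, plus broken lines at chunk boundaries):

```php
<?php
$handle = fopen('big-dump.json', 'r'); // hypothetical file name

$buffer = '';
$depth  = 0;

while (($line = fgets($handle)) !== false) {
    for ($i = 0, $len = strlen($line); $i < $len; $i++) {
        $char = $line[$i];

        if ($char === '{') {
            $depth++;
        }

        if ($depth > 0) {
            $buffer .= $char;
        }

        if ($char === '}') {
            $depth--;
            if ($depth === 0) {
                // One complete top-level object collected; decode and store it.
                $object = json_decode($buffer, true);
                // insertIntoDatabase($object); // hypothetical insert helper
                $buffer = '';
            }
        }
    }
}

fclose($handle);
```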
Best possible solution:
Use some sort of delimiter (pagination, timestamp, object ID, etc.) that allows you to read the data in smaller chunks over multiple requests. This solution assumes that you have some sort of control over how these JSON files are generated. I'm basing my assumption on:
This would be fine except for the fact that my JSON files will usually range from 250MB-1GB+.
Reading in and processing 1GB of JSON data is simply ridiculous. A better approach is most definitely needed.
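For instance, if the API can be asked for pages of results (the endpoint and page_size parameter here are purely hypothetical), the cron script could fetch and insert one manageable chunk at a time:

```php
<?php
// Hypothetical paginated endpoint; parameter names are assumptions.
$baseUrl = 'https://api.example.com/objects?page_size=1000&page=';
$page    = 1;

do {
    $json  = file_get_contents($baseUrl . $page); // requires allow_url_fopen
    $batch = json_decode($json, true);

    foreach ($batch as $object) {
        // insertIntoDatabase($object); // hypothetical insert helper
    }

    $page++;
} while (!empty($batch)); // stop once a page comes back empty
```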
