I am importing a CSV file with more than 5,000 records in it. What I am currently doing is getting all the file content as an array and saving the rows to the database one by one. But in case of a script failure, the whole process will run again, and if I start checking them one by one against the database it will use lots of queries, so I thought to keep the imported values in the session temporarily.
Is it good practice to keep that many records in the session? Or is there another way to do this?
Thank you.
If you have to do this task in stages (and there are a couple of suggestions here to improve the way you do things in a single pass), don't hold the CSV file in $_SESSION... that's pointless overhead, because you already have the CSV file on disk anyway, and it just adds a lot of serialization/unserialization overhead to the process as the session data is written.
You're processing the CSV records one at a time, so keep a count of how many you've successfully processed in $_SESSION. If the script times out or barfs, then restart and read how many you've already processed so you know where in the file to restart.
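A minimal sketch of that restart pattern, assuming the file is read with fgetcsv(); the "import_offset" key and the file name are placeholder names:

session_start();

// Rows already imported on a previous (possibly failed) run.
$offset = isset($_SESSION['import_offset']) ? $_SESSION['import_offset'] : 0;

$handle = fopen('data.csv', 'r');

// Skip the rows that were already processed.
for ($i = 0; $i < $offset; $i++) {
    fgetcsv($handle);
}

while (($row = fgetcsv($handle)) !== false) {
    // ... insert $row into the database here ...
    $_SESSION['import_offset'] = ++$offset;   // progress marker, written out at shutdown
}

fclose($handle);
unset($_SESSION['import_offset']);            // finished, clear the marker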
What can be the maximum size of $_SESSION?
The session is loaded into memory at run time - so it's limited by the memory_limit in php.ini
Is it good practice to keep that much of records in the session
No - for the reasons you describe - it will also have a big impact on performance.
Or is there any other way to do this ?
It depends what you are trying to achieve. Most databases can import CSV files directly or come with tools which will do it faster and more efficiently than PHP code.
C.
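To illustrate the direct-import route mentioned above: with MySQL a single LOAD DATA statement can pull the whole file in. A rough sketch; the DSN, credentials, file path and table name are placeholders, and LOAD DATA LOCAL has to be enabled on both client and server:

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', array(
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,   // allow LOAD DATA LOCAL from the client
));

$pdo->exec("
    LOAD DATA LOCAL INFILE '/path/to/data.csv'
    INTO TABLE my_table
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
");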
It's not a good idea IMHO, since session data will be serialized/unserialized for every page request, even requests unrelated to the action you are performing.
I suggest using the following solution:
Keep the CSV file lying around somewhere
begin a transaction
run the inserts
commit after all inserts are done
end of transaction
Link: MySQL Transaction Syntax
If something fails the inserts will be rolled back so you know you can safely redo the inserts without having to worry about duplicate data.
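A rough PDO sketch of those steps, assuming an InnoDB table; the connection details, table and column names are made up for the example:

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$rows = array_map('str_getcsv', file('/path/to/data.csv'));

try {
    $pdo->beginTransaction();
    $stmt = $pdo->prepare('INSERT INTO my_table (col1, col2) VALUES (?, ?)');
    foreach ($rows as $row) {
        $stmt->execute($row);
    }
    $pdo->commit();       // all inserts succeeded
} catch (Exception $e) {
    $pdo->rollBack();     // nothing was written, safe to rerun the import
    throw $e;
}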
To answer the actual question (Somebody just asked a duplicate, but deleted it in favour of this question)
The default session data handler stores its data in temporary files. In theory, those files can be as large as the file system allows.
However, as @symcbean points out, session data is automatically loaded into the script's memory when the session is initialized. This severely limits the maximum amount you should store in session data. Also, loading lots of data has a massive impact on performance.
If you have huge amounts of data you need to store connected to a session, I would recommend using temporary files that you name by the current session ID. You can then deal with those files as needed, and as possible within the limits of the script's memory_limit.
If you are using PostgreSQL, you can use a single query to insert them all using pg_copy_from, or you can use pg_put_line as shown in the example (copy from stdin), which I found very useful when importing tons of data.
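A hedged sketch of the pg_copy_from() approach; the connection string, table name and tab-delimited row format are assumptions:

$conn = pg_connect('host=localhost dbname=test user=me password=secret');

$rows = array();
$fh = fopen('/path/to/data.csv', 'r');
while (($fields = fgetcsv($fh)) !== false) {
    $rows[] = implode("\t", $fields);   // pg_copy_from() expects one delimited line per row
}
fclose($fh);

pg_copy_from($conn, 'mytable', $rows, "\t");   // bulk-load everything in one call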
If you use MySQL, you'll have to do multiple inserts. Remember to use transactions, so that if a query fails everything is rolled back and you can start over. Note that 5,000 rows is not that large! You should however be aware of the max_execution_time constraint, which will kill your script after a number of seconds.
As far as the SESSION is concerned, I believe that you are limited by the maximum amount of memory a script can use (memory_limit in php.ini). Session data is saved in files, so you should also consider the disk space usage if many clients are connected.
It depends on the operating system's file size limits. Whatever the session size, the default per-request memory limit is 128 MB.
Related
I need to hold a semi-static large object in cache so I don't need to request it every time from database. Something like $_SESSION, but not tied to a session, because the data are common to all users.
I can cache that data client-side once I've got it, but I would like to avoid disturbing the database with SELECT queries for large data that (almost) never changes.
Also, I cannot add modules (like APC cache) in this environment.
I could store my data in a file, say JSON, which I read with PHP instead of querying the DB, but accessing the filesystem is also costly if PHP needs to do it many times per second AND the file size is not tiny.
Is there a built-in way in PHP to store objects in memory, common to all PHP instances?
EDIT: Could I use $_SESSION as storage space, forcing the session_id to always be the same? Is it dangerous? I don't use sessions for the application itself. I tried and it works.
Most operating systems will keep the result of reading from disk in their cache.
This means that the disk will not be hit each time. File-based storage is actually pretty quick for multiple reads of the same file, as it's really just coming straight from memory.
As long as "pretty large" still means it fits in memory, this approach should be fine.
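For the shared, rarely-changing data in the question, that file-based approach might look roughly like this; the file path, the one-hour TTL and the loadFromDatabase() helper are hypothetical:

$cacheFile = '/tmp/shared_data.json';
$ttl = 3600;   // regenerate at most once per hour

if (!file_exists($cacheFile) || filemtime($cacheFile) < time() - $ttl) {
    $data = loadFromDatabase();   // your expensive query, named here for illustration
    file_put_contents($cacheFile, json_encode($data), LOCK_EX);
} else {
    // Served from the OS page cache on repeated reads, so this is cheap.
    $data = json_decode(file_get_contents($cacheFile), true);
}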
I always was sure it is better and faster to use flat files to store real-time visit/click counter data: open the file in append mode, lock it, write the data and then close it. Then read this file from a crontab once every five minutes, store the contents in the DB and truncate the file for new data.
But today my friend told me that this is the wrong way. It would be better to have a permanent MySQL connection and write the data straight to the DB on every click. First, the DB can store the results in a memory table. Second, even if we store to a table located on disk, that file is kept open permanently, so there is no need to find it on disk and open it again and again on every query.
What do you think about it?
UPD: We are talking about high-traffic sites, about a million per day.
Your friend is right. Write to a file and then have a cronjob send it to the database every 5 minutes? That sounds very convoluted. I can't imagine a good reason for not writing directly to the DB.
Also, when you write to a file in the way you described, the operations are serialized. A user will have to wait for the other one to release the lock before writing. That simply won't scale if you ever need it. The same will happen with a DB if you always write to the same row, but you can have multiple rows for the same value, write to a random one and sum them when you need the total.
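A rough illustration of that multi-row idea, assuming a table clicks(slot, hits) pre-filled with 16 slot rows; the names and the slot count are placeholders:

$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');

// Each hit updates one of 16 rows at random, so concurrent writers
// rarely contend for the same row lock.
$slot = mt_rand(0, 15);
$pdo->prepare('UPDATE clicks SET hits = hits + 1 WHERE slot = ?')->execute(array($slot));

// Whoever needs the real total just sums the slots.
$total = $pdo->query('SELECT SUM(hits) FROM clicks')->fetchColumn();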
It doesn't make much sense to use a memory table in this case. If your data doesn't need to be persisted, it's much simpler to use a memcache you probably already have somewhere and simply increment the value for the key.
If you use a database WITHOUT transactions, you will get the same underlying performance as using files with more reliability and less coding.
It could be true that writing to a database is heavy - e.g. the DB could be on a different server so you have network traffic, or it could be a transactional DB in which case every write has at least 2 writes (potentially more if indexes are involved), but if you're aware of all this stuff then you can use a DB, take advantage of decades of work by others and make your programming task easy.
My web application lets the user import an Excel file and writes the data from the file into the MySQL database.
The problem is, when the Excel file has lots of entries, even 1,000 rows, I get an error saying PHP ran out of memory. This occurs while reading the file.
I have assigned 1024MB to PHP in the php.ini file.
My question is, how do I go about importing such large data in PHP?
I am using CodeIgniter.
For reading the Excel file, I am using this library.
SOLVED: I used CSV instead of XLS, and I could import 10,000 rows of data within seconds.
Thank you all for your help.
As others have said, 1000 records is not much. Make sure you process the records one at a time, or a few at a time, and that the variables you use for each iteration go out of scope after you're finished with that row or you're reusing the variables.
If you can avoid the need to process Excel files by exporting them to CSV, that's even better, because then you won't need such a library (which might or might not have its own memory issues).
Don't be afraid of increasing memory usage if you need to and that solves the problem; buying memory is sometimes the cheapest option. And don't let the 1 GB scare you: it is a lot for such a simple task, but if you have the memory and that's all you need to do, then it's good enough for the moment.
And as a plus, if you are using an old version of PHP, try updating to PHP 5.4 which handles memory much better than its predecessors.
Instead of inserting one row at a time in a loop, insert 100 rows at a time.
You can always run
INSERT INTO myTable (col1, col2) VALUES
(val1, val2), (val3, val4), (val5, val6) ......
This way the number of network round trips will be reduced, thus reducing resource usage.
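Building such a multi-row statement from PHP could look roughly like this; $rows, the table and the two columns are illustrative:

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

foreach (array_chunk($rows, 100) as $batch) {
    // One placeholder group "(?, ?)" per row in the batch.
    $placeholders = implode(', ', array_fill(0, count($batch), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO myTable (col1, col2) VALUES $placeholders");

    $values = array();
    foreach ($batch as $row) {
        $values[] = $row[0];
        $values[] = $row[1];
    }
    $stmt->execute($values);   // one round trip for 100 rows
}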
I'm designing my own session handler for my web app; the built-in PHP sessions are too limited when trying to control how long the session should last.
Anyway, my first tests were like this: a session_id stored in a MySQL row and also in a cookie, and on the same MySQL row the rest of my session vars.
On every request to the server I make a query, get these vars and put them in an array to use the necessary ones at runtime.
Last night I was thinking whether I could write the vars to a file on the server once, at the login stage, and later just include that file instead of making a MySQL query on every request.
So, my question is: which is less resource-consuming, doing this in MySQL or in a file?
I know, I know, I already read several threads on stackoverflow about this issue, but I have something different from all those cases (I hope I didn't miss something):
I need to keep track of the time that has passed since the last time the user used the app, so on every call to the server not only do I request the entire database row, I also update a timestamp on that same row.
So, on both cases I need to write to the session on every request...
FYI: the entire app runs on one server, so the multiple-servers scenario that argues against files does not apply.
It's easier to work with when it's done in a database and I've been using sessions in database mostly for scalability.
You may use MySQL, since with a well-configured server it can keep the session data in memory; you can even use MEMORY tables to speed things up if all the sessions fit within memory. If you get near your memory limit it's easy to switch to a normal table.
I'd say MySQL wins over files for performance for medium to large sites and also for customization/options. For smaller websites I think that it doesn't make that much of a difference, but you will use more of the hard drive when using files.
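For what it's worth, a bare-bones sketch of a MySQL-backed handler built on PHP's SessionHandlerInterface (PHP 5.4+); the sessions(id, data, last_access) table, the connection and the use of REPLACE INTO are assumptions, not a finished implementation:

class DbSessionHandler implements SessionHandlerInterface
{
    private $pdo;

    public function __construct(PDO $pdo) { $this->pdo = $pdo; }

    public function open($savePath, $name) { return true; }
    public function close() { return true; }

    public function read($id)
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute(array($id));
        return (string) $stmt->fetchColumn();   // empty string if the row doesn't exist
    }

    public function write($id, $data)
    {
        // Also refreshes last_access, which covers the "time since last use" requirement.
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, last_access) VALUES (?, ?, NOW())'
        );
        return $stmt->execute(array($id, $data));
    }

    public function destroy($id)
    {
        return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute(array($id));
    }

    public function gc($maxlifetime)
    {
        return $this->pdo->prepare(
            'DELETE FROM sessions WHERE last_access < NOW() - INTERVAL ? SECOND'
        )->execute(array($maxlifetime));
    }
}

session_set_save_handler(new DbSessionHandler($pdo), true);
session_start();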
I'm using the session array to cache chunks of information retrieved from the db:
$result = mysql_query('select * from table');
array_push($_SESSION['data'], new Data(mysql_fetch_assoc($result)));
My question is, is there a limit/a sizeable amount of information that can/should be passed around in a session? Is it ill advised or significantly performance hindering to do this?
By default, $_SESSION data is stored on disk in the /tmp directory of your server. As long as you have enough room in there AND you aren't hitting your PHP memory limit, you're fine.
However, if you're attempting to cache a query that is the SAME for a large number of users, you might want to use something like APC or memcache that isn't tied to the individual user. Otherwise, you're essentially going to cache the same result once for each user, and not leverage a cache across all users.
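For example, with the APCu extension the result could be cached once for everybody instead of once per session; the cache key and the 10-minute TTL are arbitrary choices:

$data = apcu_fetch('table_cache', $found);

if (!$found) {
    // Only one user pays the cost of the query; everyone else reads the cache.
    $result = mysql_query('select * from table');
    $data = array();
    while ($row = mysql_fetch_assoc($result)) {
        $data[] = $row;
    }
    apcu_store('table_cache', $data, 600);
}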
I think the answer would depend on where you are storing your data and how fast you can transfer it there.
If the data is 44 MB and you are on a 1000BASE-T network, you can expect it to take about a second to actually transfer there, and another second to transfer back.
If you use local memory, then you have a finite amount of memory on the machine.
If you use disk, then you have load/save times (disk is slow).
But also keep in mind, PHP has a finite amount of memory it allows a script to use. I think the default setting is 8 MB.
If you are talking about large blocks of data, you may want to consider Redis, Tokyo Cabinet or other key/value stores. Or even a backend interface to manipulate the data/cache it for you without transferring it through PHP.
Because Session data is stored in a file (or database record) on your server, it shouldn't matter too much how much data you store in it. I would just advise against huge objects.
You might want to look at APC or memcached to cache the results instead, as it is not a per-user cache, and it uses the memory instead of files.
The session is serialized and written to disk by default, so depending on the size and the number of users things can become slow. However both things can be changed (read the session manual at http://php.net/session for all the details), for example using memcache for in-memory storage of the data. The best thing is to try it out in an environment as similar as possible to the live system and check the resulting load and throughput.
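If you go the memcache route, switching the session backend is mostly configuration; this assumes the memcached extension is installed, and the host/port are placeholders:

// Store session data in memcached instead of files.
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'localhost:11211');
session_start();

$_SESSION['data'][] = 'now kept in memory rather than on disk';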
Mmm, tricky. I think you could save it in the session. The real question is: do you want all that information serialized and unserialized every time a client makes a request?
I think it would be OK to save it there if you will use all that information on every page of your website, but that is improbable. It would be better to save that information in a directory like /temptables/sometable/, with each file named after the session. You can use session_id() to get it, and load and save the information on the pages that need it with:
$info = unserialize(file_get_contents('/temptables/sometable/'.session_id().'.ser'));
and saving with:
file_put_contents('/temptables/sometable/'.session_id().'.ser', serialize($info));
But you need a cron job to clean that directory of old files. You can do that by taking the session ID from the filename and checking whether the session is still alive, either by asking for some variable like 'itsalive' after session_start(), or by doing something like file_exists(session_save_path().'/sess_'.$session_id) to check whether you should delete the temporary file.
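The cleanup cron could be as simple as this sketch, assuming the default file-based session handler; the directory matches the example above and the paths are placeholders:

// Delete per-session temp files whose owning session file no longer exists.
foreach (glob('/temptables/sometable/*.ser') as $file) {
    $sid = basename($file, '.ser');
    if (!file_exists(session_save_path() . '/sess_' . $sid)) {
        unlink($file);   // the session is gone, so drop its temporary data
    }
}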