Speed: MySQL vs File Output - php

I have a PHP script that will execute for about 1 hour each time, and during its runtime it will need to store a steady stream of comments about what it is doing for me to view later. Each comment includes a timestamp and a short description, such as "2/25/2010 6:40:29 PM: Updated the price of item 255".
So which is faster, outputting that to a .txt file, or inserting it into a MySQL database? Also, should I use the timestamp from PHP's date(), or should I create a time object in MySQL?
The second part of my question: since the program is going to run for about an hour, should I connect to MySQL, insert the data, and close the connection each time I log a comment, or should I connect once, insert data for the runtime of the program, and close the connection when the program exits, about an hour after creating the initial connection?
Thank you in advance for all your advice.

It depends on your need for the data at the end of the day. Do you need to be able to audit the data beyond scrolling through a file? If you don't need to browse the data or store it in perpetuity, then a flat file will most likely be faster than MySQL, since you are just appending to the end of a file.
If you need the data to be more useful, you'll want to store it in MySQL. I would suggest that you structure your table like:
id int
timestamp datetime default now()
desc varchar
That way you don't have to create a timestamp in PHP at all; just let MySQL do the work, and you'll be able to run more complex queries against your table. Another consideration is the volume of data going into this table, as that will also affect your final decision.
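A rough DDL sketch of that structure (table and column names are illustrative; note that desc is a reserved word in MySQL, so a longer column name is used, and DATETIME DEFAULT CURRENT_TIMESTAMP needs MySQL 5.6+, otherwise use TIMESTAMP):

    CREATE TABLE script_log (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        logged_at   DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- MySQL fills this in
        description VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB;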

If you're simply logging information for viewing later, writing to file will be quicker. Writing to the database still has to write somewhere, and you get the added overhead of the database engine.
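For the file route, a minimal sketch (file path and message are illustrative) is a single locked append per comment:

    <?php
    // Append one timestamped line per comment; LOCK_EX prevents interleaved writes.
    function log_comment($message, $file = '/tmp/script.log')
    {
        $line = date('n/j/Y g:i:s A') . ': ' . $message . PHP_EOL;
        file_put_contents($file, $line, FILE_APPEND | LOCK_EX);
    }

    log_comment('Updated the price of item 255');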

In my experience, it's much faster overall to write the .txt file than to use MySQL to write the log. See, if you write comments into the DB, then you have to write more code to get those comments out of the DB later, instead of just using cat or more or vi or similar to see the comments.
If you choose the DB route: It's perfectly OK to keep a connection open for your hour, but you have to be able to handle "server went away" in case you haven't written to the DB in a while.
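A rough sketch of that, assuming mysqli, illustrative credentials, and a log table like the one sketched above: ping the connection before each insert and reconnect if the idle link was dropped.

    <?php
    // Hypothetical credentials and table name; the point is the ping-and-reconnect check.
    function get_db()
    {
        static $db = null;
        if ($db === null || !@$db->ping()) {   // "MySQL server has gone away"?
            $db = new mysqli('localhost', 'user', 'pass', 'mydb');
        }
        return $db;
    }

    function log_comment($message)
    {
        $stmt = get_db()->prepare('INSERT INTO script_log (description) VALUES (?)');
        $stmt->bind_param('s', $message);
        $stmt->execute();
        $stmt->close();
    }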
-- pete

Related

Speed up insert in MariaDB

Please, can somebody give me some support?
My problem is:
I have a table with 8 fields and about 510,000 records. In a web form, the user selects an Excel file and it is read with SimpleXLSX. The file has about 340,000 lines. With PHP and the SimpleXLSX library this file is loaded into memory, then a for loop reads it line by line, takes one value from each line, and searches for that value in the table; if the value already exists in the table it is not inserted, otherwise the values that were read are stored in the table.
This process takes days to finish.
Can somebody suggest a way to speed up the process?
Thanks a lot.
If you have many users who may use the web form at the same time: switch from SimpleXLSX to js-xlsx, do all of the parsing in the browser, and only write to the database on the server.
If you have few users (which I think is your case): the step "and search this value in the table" is what costs the most time, because you compare one value at a time between memory and the database and then add it or not.
Instead, read all of the existing values from the database into memory (use a hash list for the comparison), compare everything there, add new values to memory and mark them as new, and only at the end write the new entries back to the database.
Because your database and your XLSX have roughly the same number of rows, the per-row database lookups buy you almost nothing: just forget the database during the comparison; doing it all in memory with a hash list is fastest.
Of course, you can still let this run against the database if you use Barmar's idea: don't insert rows one at a time, insert them in batches.
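A rough sketch of that in-memory approach, ending with the batched insert. The PDO credentials, the items table, and its code/name columns are illustrative; $rows stands for whatever SimpleXLSX parsed out of the upload.

    <?php
    // $rows: the lines parsed out of the uploaded spreadsheet, assumed here to be
    // arrays with a 'code' and a 'name' element (illustrative columns).
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

    // 1. Load every existing key into a hash: array keys give O(1) lookups.
    $existing = array_flip($pdo->query('SELECT code FROM items')->fetchAll(PDO::FETCH_COLUMN));

    // 2. Keep only the rows that are not already in the table (or earlier in the file).
    $new = [];
    foreach ($rows as $row) {
        if (!isset($existing[$row['code']])) {
            $existing[$row['code']] = true;
            $new[] = $row;
        }
    }

    // 3. Insert the new rows in batches instead of one at a time.
    foreach (array_chunk($new, 1000) as $chunk) {
        $placeholders = implode(',', array_fill(0, count($chunk), '(?, ?)'));
        $stmt = $pdo->prepare("INSERT INTO items (code, name) VALUES $placeholders");
        $params = [];
        foreach ($chunk as $row) {
            $params[] = $row['code'];
            $params[] = $row['name'];
        }
        $stmt->execute($params);
    }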
Focus first on getting the data into the database quickly. Do not try to do all the work during the INSERT; use SQL queries afterwards to further clean up the data.
Do the minimum in XLS needed to get the data into the database. Use some programming language if you need to massage the data a lot; neither XLS nor SQL is the right place for complex string manipulations.
If practical, use LOAD DATA ... XML to get the data loaded; it is very fast.
SQL is excellent for handling entire tables at once; it is terrible at handling one row at a time. (Hence, my recommendation of putting the data into a staging table, not directly into the target table.)
If you want to discuss further, we need more details about the conversions involved.
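A minimal sketch of the staging-table idea (table and column names are illustrative): bulk-load into a bare copy of the target, then copy over only the rows that are new, as one set-based statement.

    CREATE TABLE staging LIKE items;

    -- ... bulk load the spreadsheet rows into `staging` here
    --     (LOAD DATA / batched INSERTs) ...

    INSERT INTO items (code, name)
    SELECT s.code, s.name
    FROM   staging AS s
    LEFT JOIN items AS i ON i.code = s.code
    WHERE  i.code IS NULL;

    DROP TABLE staging;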

What will be better: click counter on mysql or on flat file?

I was always sure it is better and faster to use flat files to store real-time visit/click counter data: open the file in append mode, lock it, write the data, and close it. Then read this file with a cron job once every five minutes, store the contents in the DB, and truncate the file for new data.
But today my friend told me that this is the wrong way. It would be better to have a permanent MySQL connection and write the data directly to the DB on every click. First, the DB can store the results in a memory table. Second, even if we store to a table located on disk, that file is kept open permanently by the server, so there is no need to find it on disk and open it again and again on every query.
What do you think about it?
UPD: We are talking about high-traffic sites, about a million hits per day.
Your friend is right. Write to a file and then have a cron job send it to the database every 5 minutes? That sounds very convoluted. I can't imagine a good reason for not writing directly to the DB.
Also, when you write to a file the way you described, the operations are serialized: a user has to wait for another one to release the lock before writing. That simply won't scale if you ever need it to. The same will happen with a DB if you always write to the same row, but you can have multiple rows for the same counter, write to a random one, and sum them when you need the total.
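A minimal sketch of that sharded-counter idea in MySQL (names are illustrative; the slot is picked at random in PHP, e.g. mt_rand(0, 9)):

    CREATE TABLE click_counts (
        item_id INT     NOT NULL,
        slot    TINYINT NOT NULL,
        clicks  BIGINT  NOT NULL DEFAULT 0,
        PRIMARY KEY (item_id, slot)
    );

    -- On each click, bump one random slot:
    INSERT INTO click_counts (item_id, slot, clicks)
    VALUES (255, 3, 1)
    ON DUPLICATE KEY UPDATE clicks = clicks + 1;

    -- When the total is needed:
    SELECT SUM(clicks) FROM click_counts WHERE item_id = 255;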
It doesn't make much sense to use a memory table in this case. If your data doesn't need to be persisted, it's much simpler to use a memcache you probably already have somewhere and simply increment the value for the key.
If you use a database WITHOUT transactions, you will get about the same underlying performance as using files, with more reliability and less coding.
It could be true that writing to a database is heavy - e.g. the DB could be on a different server so you have network traffic, or it could be a transactional DB in which case every write is at least 2 writes (potentially more if indexes are involved) - but if you're aware of all this, you can use a DB, take advantage of decades of work by others, and make your programming task easier.

A proper way to check which elements from an array are not in a table

I have a function that downloads lists of links in PHP (let's say about 100,000, but not all at once). I would like to download data from those links only if it hasn't been downloaded yet, so I need to check which of them are not in the MySQL database. The database contains about 40,000 records for now. What is the proper way to do this? I can't keep all those links in an array and compare them with the MySQL results, because that takes too much memory, and I am downloading information from those links multi-threaded (with forks): if the parent takes 10 MB of RAM, 30 forks take 300 MB, and so on. I tried querying the database for each link separately, but after a short time I get disconnected from the MySQL server, and when I try to connect again (I ping the connection to check whether it is still alive) and select the database, it closes the connection with the error "MySQL server has gone away". How is this supposed to be done?
You can "save" links in text file only for this check, its a lot faster to use this to compare if link is downloaded or not.
Have a look at this mytxt
This is not exactly an answer to your question, but it might be worth considering saving all of the found results in an associative array with the link as the key. That way duplicates will simply overwrite previous versions.
The advantage of this approach is that you will not "waste" any time on checking, but the disadvantage could be, especially if you are handling many columns, that you spend too much time downloading redundant information.
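A rough sketch combining the associative-array idea with a chunked check against the database, so neither side ever holds the whole list at once. The PDO credentials and the downloads table with its url column are illustrative.

    <?php
    // $links: the list of URLs gathered so far.
    $links = array_fill_keys($links, true);          // dedupe: the link itself is the key

    $pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $todo = [];

    // Ask MySQL in chunks which links already exist.
    foreach (array_chunk(array_keys($links), 500) as $chunk) {
        $in   = implode(',', array_fill(0, count($chunk), '?'));
        $stmt = $pdo->prepare("SELECT url FROM downloads WHERE url IN ($in)");
        $stmt->execute($chunk);
        $known = array_fill_keys($stmt->fetchAll(PDO::FETCH_COLUMN), true);

        foreach ($chunk as $url) {
            if (!isset($known[$url])) {
                $todo[] = $url;                      // not in the table yet: download it
            }
        }
    }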

Insert a row every given time else update previous row (Postgresql, PHP)

I have multiple devices (eleven, to be specific) which send information every second. This information is received by an Apache server, parsed by a PHP script, stored in the database, and finally displayed in a GUI.
What I am doing right now is check whether a row for the current day exists; if it doesn't, I create a new one, otherwise I update it.
The reason I do it like that is that I need to poll the information from the database and display it in a C++ application to make it look sort of real-time. If I created a row every time a device sent information, processing and reading the data would take a significant amount of time as well as system resources (memory, CPU, etc.), making the display of the data not quite real-time.
I wrote a report generation tool which takes the information for every day (from 00:00:00 to 23:59:59) and puts it in an Excel spreadsheet.
My questions are basically:
Is it possible to do the insertion/updating part directly in the database server, or do I have to do the logic in the PHP script?
Is there a better (more efficient) way to store the information without a decrease in performance on the display device?
Regarding the report generation: if I want to sample intervals, let's say starting yesterday at 15:50:00 and ending today at 12:45:00, it cannot be done with my current data structure, so what do I need to consider in order to design a data structure that would allow me to create such queries?
The components I use:
- Apache 2.4.4
- PostgreSQL 9.2.3-2
- PHP 5.4.13
My recommendation: just store all the information your devices are sending. With proper indexes and queries you can process and retrieve information from the DB really fast.
For your questions:
Yes, it is possible to build any logic you desire inside the Postgres DB using SQL, PL/pgSQL, PL/PHP, PL/Java, PL/Python and many other languages built into Postgres.
As I said before - proper indexing can do magic.
If you cannot get the desired query speed with the full table, you can create a small table with one row for every device and keep in it the last known values to show them in sort-of real time.
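A minimal sketch of that small per-device table (names and values are illustrative): seed one row per device once, then refresh it on each incoming reading.

    CREATE TABLE device_latest (
        device_id  INT PRIMARY KEY,
        reading    NUMERIC,
        updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );

    -- Refresh on each incoming reading:
    UPDATE device_latest
       SET reading = 42.0, updated_at = now()
     WHERE device_id = 7;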
1) The technique is called upsert. In PG 9.1+ it can be done with a writable CTE (http://www.depesz.com/2011/03/16/waiting-for-9-1-writable-cte/).
2) If you really want it to be real-time you should be sending the data directly to the application; storing it in memory or in a plain-text file will also be faster if you only care about the last few values. But PG does have LISTEN/NOTIFY channels, so your lag will probably be just 100-200 ms, and that shouldn't be much given you're only displaying it.
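A rough sketch of the writable-CTE upsert from that link, against an illustrative daily table (INSERT ... ON CONFLICT only arrived later, in PostgreSQL 9.5):

    -- Try the UPDATE first; if it matched nothing, INSERT today's row instead.
    WITH upsert AS (
        UPDATE device_daily
           SET reading = 42.0, updated_at = now()
         WHERE device_id = 7 AND day = current_date
     RETURNING *
    )
    INSERT INTO device_daily (device_id, day, reading)
    SELECT 7, current_date, 42.0
    WHERE  NOT EXISTS (SELECT 1 FROM upsert);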
I think you are overestimating the memory and system requirements given the process you have described. Adding a row of data every second (or 11 per second) is not a resource hog. In fact, it is likely more time-consuming to UPDATE a row than to ADD a new one. Also, if you add a TIMESTAMP to your table, sort operations are lightning fast. Just add some garbage-collection handling (deletion of old data) as a cron job once a day or so and you are golden.
However to answer your questions:
Is it possible to do the insertion/updating part directly in the database server, or do I have to do the logic in the PHP script?
Writing logic from within the database engine is usually not very straightforward. To keep it simple, stick with the logic in the PHP script: check whether the row exists, then either UPDATE it (UPDATE table SET var1 = 'assignment1', var2 = 'assignment2' WHERE id = 'checkedID') or INSERT a new row if nothing matched.
Is there a better (more efficient) way to store the information without a decrease in performance on the display device?
It's hard to answer because you haven't described the display device's connectivity. There are more efficient ways to handle the process, but none that have the locking mechanisms required for such frequent updating.
Regarding the report generation, if I want to sample intervals, let's say starting from yesterday at 15:50:00 and ending today at 12:45:00, it cannot be done with my current data structure, so what do I need to consider in order to make a data structure which would allow me to create such queries?
You could use a TIMESTAMP column type. This would record the DATE and TIME of the insert or update operation. Then it's just a simple WHERE clause using date functions in the database query.
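For example, assuming a per-reading table with an illustrative recorded_at timestamp column, the report query for that interval becomes:

    -- All readings between yesterday 15:50:00 and today 12:45:00.
    SELECT device_id, reading, recorded_at
    FROM   device_readings
    WHERE  recorded_at >= (current_date - 1) + time '15:50:00'
      AND  recorded_at <  current_date + time '12:45:00'
    ORDER BY recorded_at;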

Retrieving timestamp of MySQL row creation from meta data?

I have been maintaining a MySQL table of rows of data inserted from PHP code for about a year and a half. Really stupid, I know, but I didn't include a timestamp for the insertion of these rows.
Is there any possible way that I can retrieve the creation timestamp of these rows through some metadata or some other means (in MySQL, phpMyAdmin, or some other way)?
Unfortunately, there's nothing you can do in this case. If MySQL had a secret timestamp field, the general size of tables would increase by 4 bytes per row.
The only way you can get that timestamp is if it is saved somewhere on one of your servers. You have a web server, for which you may have kept an archive of logs, or some other place where there is a timestamp of the PHP script's activity when it made requests to the database.
Say you have web server logs and there is an entry for each (or most) of the PHP script's requests; then, potentially, you can parse that log, get the timestamps, and map them to the rows in your database. As you can see it is quite laborious, but not utterly impossible.
As for MySQL (or any other database), they do not normally keep a big archive of past information. The main reason is that it is up to the developer or designer of the application to decide what information should be kept; the database keeps only the data needed for all of its parts to run healthily.
I just had another idea: if you have an archive of transaction logs (which I really doubt), you could replay them against a backup of the database, and maybe they contain the timestamp of each row being added or changed.
If you are lucky, you have records in other tables that depend on the record you are interested in. These records may have a timestamp when they were created.
So you have at least a ballpark when the record you care about may have been created.
Other than that, the rate at which the primary key usually grows may provide another estimate of when your record was created.
But yes, these are just estimates. Other approaches are mentioned in the other existing answers.
Here is a way to do so:
https://www.marcus-povey.co.uk/2013/03/11/automatic-create-and-modified-timestamps-in-mysql/
I hope this can help
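Note that the linked approach is about adding automatic timestamps going forward; it cannot recover the past. A minimal sketch of the idea, with an illustrative table name:

    -- On MySQL 5.6+ both columns can auto-initialise; older versions allow only one
    -- such TIMESTAMP per table. Existing rows simply get the time of the ALTER,
    -- not their real creation time.
    ALTER TABLE mytable
        ADD COLUMN created  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        ADD COLUMN modified TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                                               ON UPDATE CURRENT_TIMESTAMP;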
