How to speed up Flat File Database in PHP? - php

I noticed that ffdb in PHP are awfully slow. So how would I go about speeding up the ffdb without SQL?

Flat file will become more slow the more data it's contain. This is because in order to get a data, the engine need to get all data first into memory, then select the requested data.
SQL database have random seek feature, because the data placement is in a known location, plus it have index table. That's why any SQL database is not affected with the number of data stored in it.
You can try sqlite if you need single file database. Firefox and Thunderbird use sqlite to store bookmark and browsing history.

Tokyo Cabinet is amazingly fast for key/value lookups. There is also the search addition (tyrant) that allows for more complex queries
http://1978th.net/tokyocabinet/
PHP implementation
http://code.google.com/p/phptyrant/

Related

PHP Array efficiency vs mySQL query

I have a MySQL table with about 9.5K rows, these won't change much but I may slowly add to them.
I have a process where if someone scans a barcode I have to check if that barcode matches a value in this table. What would be the fastest way to accomplish this? I must mention there is no pattern to these values
Here Are Some Thoughts
Ajax call to PHP file to query MySQL table ( my thoughts would this would be slowest )
Load this MySQL table into an array on log in. Then when scanning Ajax call to PHP file to check the array
Load this table into an array on log in. When viewing the scanning page somehow load that array into a JavaScript array and check with JavaScript. (this seems to me to be the fastest because it eliminates Ajax call and MySQL Query. Would it be efficient to split into smaller arrays so I don't lag the server & browser?)
Honestly, I'd never load the entire table for anything. All I'd do is make an AJAX request back to a PHP gateway that then queries the database, and returns the result (or nothing). It can be very fast (as it only depends on the latency) and you can cache that result heavily (via memcached, or something like it).
There's really no reason to ever load the entire array for "validation"...
Much faster to used a well indexed MySQL table, then to look through an array for something.
But in the end it all depends on what you really want to do with the data.
As you mentions your table contain around 9.5K of data. There is no logic to load data on login or scanning page.
Better to index your table and do a ajax call whenever required.
Best of Luck!!
While 9.5 K rows are not that much, the related amount of data would need some time to transfer.
Therefore - and in general - I'd propose to run validation of values on the server side. AJAX is the right technology to do this quite easily.
Loading all 9.5 K rows only to find one specific row, is definitely a waste of resources. Run a SELECT-query for the single value.
Exposing PHP-functionality at the client-side / AJAX
Have a look at the xajax project, which allows to expose whole PHP classes or single methods as AJAX method at the client side. Moreover, xajax helps during the exchange of parameters between client and server.
Indexing to be searched attributes
Please ensure, that the column, which holds the barcode value, is indexed. In case the verification process tends to be slow, look out for MySQL table scans.
Avoiding table scans
To avoid table scans and keep your queries run fast, do use fixed sized fields. E.g. VARCHAR() besides other types makes queries slower, since rows no longer have a fixed size. No fixed-sized tables effectively prevent the database to easily predict the location of the next row of the result set. Therefore, you e.g. CHAR(20) instead of VARCHAR().
Finally: Security!
Don't forget, that any data transferred to the client side may expose sensitive data. While your 9.5 K rows may not get rendered by client's browser, the rows do exist in the generated HTML-page. Using Show source any user would be able to figure out all valid numbers.
Exposing valid barcode values may or may not be a security problem in your project context.
PS: While not related to your question, I'd propose to use PHPexcel for reading or writing spreadsheet data. Beside other solutions, e.g. a PEAR-based framework, PHPExcel depends on nothing.

Is there a way of keeping database data in PHP while server is running?

I'm making a website that (essentially) lets the user submit a word, matches it against a MySQL database, and returns the closest match found. My current implementation is that whenever the user submits a word, the PHP script is called, it reads the database information, scans each word one-by-one until a match is found, and returns it.
I feel like this is very inefficient. I'm about to make a program that stores the list of words in a tree structure for much more effective searching. If there are tens of thousands of words in the database, I can see the current implementation slowing down quite a bit.
My question is this: instead of having to write another, separate program, and use PHP to just connect to it with every query, can I instead save an entire data tree in memory with just PHP? That way, any session, any query would just read from memory instead of re-reading the database and rebuilding the tree over and over.
I'd look into running an instance of memcached on your server. http://www.memcached.org.
You should be able to store the compiled tree of data in memory there and retrieve it for use in PHP. You'll have to load it into PHP to perform your search, though, as well as architect a way for the tree in memcached to be updated when the database changes (assuming the word list can be updated, since there's not a good reason to store it in a database otherwise).
Might I suggest looking at the memory table type in mysql: http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html
You can then still use mysql's searching features on fast "in memory" data.
PHP really isn't a good language for large memory structures. It's just not very memory efficient and it has a persistence problem, as you are asking about. Typically with PHP, people would store the data in some external persistent data store that is optimized for quick retrieval.
Usually people use a two fold approach:
1) Store data in the database as optimized as possible for standard queries
2) Cache results of expensive queries in memcached
If you are dealing with a lot of data that cannot be indexed easily by relational databases, then you'd probably need to roll your own daemon (e.g., written in C) that kept a persistent copy of the data structure in memory for fast querying capabilities.

Mysql Query vs XML Data in php

We are building a website which would serve about 30k unique visitors a day.
Currently we use a simple mysql Connect > A Simple Query > mysql Close.
I'm afraid that with a dual core server running 2GB of RAM we would be able
to open about 1k mysql connection tops. is 1k a good estimate?
Is it better to make a Cron-Job output XML files and let our php files grab the data from them?
Typically XML will never be faster than MySQL for searching data (i.e. performing queries).
I don't know what kind of data you have, but XML will only be faster if you have a bunch of simple files and don't need to search, just load the files and format them.
If you need to search, then use MySQL.
MySQL does all sorts of optimizations. For example it stores KEY columns in a separate file, allowing for a much faster search.
I would suggest using Zend cache for caching MySQL query results for the the data that doesn't change frequently.

Multiple MySQL queries and how to make my script go faster

Using PHP (1900 secs time limit and more than 1GB memory limit) and MySQL (using PEAR::MDB2) on this one...
I am trying to create a search engine that will load data from site feeds in a mysql database. Some sites have rather big feeds with lots of data in them (for example more than 80.000 records in just one file). Some data checking for each of the records is done prior to inserting the record in the database (data checking that might also insert or update a mysql table).
My problem is as many of you might have already understood...time! For each record in the feed there are more than 20 checks and for a feed with eg: 10.000 records there might be >50.000 inserts to the database.
I tried to do this with 2 ways:
Read the feed and store the data in an array and then loop through the array and do the data checking and inserts. (This proves to be the fastest of all)
Read the feed and do the data checking line by line and insert.
The database uses indexes on each field that is constantly queried. The PHP code is tweaked with no extra variables and the SQL queries are simple select, update and insert statements.
Setting time limits higher and memory is nor a problem. The problem is that I want this operation to be faster.
So my question is:
How can i make the process of importing the feed's data faster? Are there any other tips that I might not be aware of?
Using LOAD DATA INFILE is often many times faster than using INSERT to do a bulk load.
Even if you have to do your checks in PHP code, dump it to a CSV file and then use LOAD DATA INFILE, this can be a big win.
If your import is a one time thing, and you use a fulltext index, a simple tweak to speed the import up is to remove the index, import all your data and add the fulltext index once the import is done. This is much faster, according to the docs :
For large data sets, it is much faster
to load your data into a table that
has no FULLTEXT index and then create
the index after that, than to load
data into a table that has an existing
FULLTEXT index.
You might take a look at php's PDO extension, and it's support for preapeared statements. You could also consider to use stored procedures in mysql.
2) You could take a look at other database systems, as CouchDB and others, and sacrifice consistency for performance.
I managed to double the inserted data with INSERT DELAYED command in 1800 sec. The 'LOAD DATA INFILE' suggestion was not the case since the data should be strongly validated and it would messup my code.
Thanks for all your answers and suggestions :)

CSV vs MySQL performance

Lets assume the same environments for PHP5 working with MySQL5 and CSV files. MySQL is on the same host as hosted scripts.
Will MySQL always be faster than retriving/searching/changing/adding/deleting records to CSV?
Or is there some amount of data below which PHP+CSV performance is better than using database server?
CSV won't let you create indexes for fast searching.
If you always need all data from a single table (like for application settings), CSV is faster, otherwise not.
I don't even consider SQL queries, transactions, data manipulation or concurrent access here, as CSV is certainly not for these things.
No, MySQL will probably be slower for inserting (appending to a CSV is very fast) and table-scan (non-index based) searches.
Updating or deleting from a CSV is nontrivial - I leave that as an exercise for the reader.
If you use a CSV, you need to be really careful to handle multiple threads / processes correctly, otherwise you'll get bad data or corrupt your file.
However, there are other advantages too. Care to work out how you do ALTER TABLE on a CSV?
Using a CSV is a very bad idea if you ever need UPDATEs, DELETEs, ALTER TABLE or to access the file from more than one process at once.
As a person coming from the data industry, I've dealt with exactly this situation.
Generally speaking, MySQL will be faster.
However, you don't state the type of application that you are developing. Are you developing a data warehouse application that is mainly used for searching and retrieval of records? How many fields are typically present in your records? How many records are typically present in your data files? Do these files have any relational properties to each other, i.e. do you have a file of customers and a file of customer orders? How much time do you have to develop a system?
The answer will depend on the answer to the questions listed previously. However, you can generally use the following as a guidelines:
If you are building a data warehouse application with records exceeding one million, you may want to consider ditching both and moving to a Column Oriented Database.
CSV will probably be faster for smaller data sets. However, rolling your own insert routines in CSV could be painful and you lose the advantages of database indexing.
My general recommendation would be to just use MySql, as I said previously, in most cases it will be faster.
From a pure performance standpoint, it completely depends on the operation you're doing, as #MarkR says. Appending to a flat file is very fast. As is reading in the entire file (for a non-indexed search or other purposes).
The only way to know for sure what will work better for your use cases on your platform is to do actual profiling. I can guarantee you that doing a full table scan on a million row database will be slower than grep on a million line CSV file. But that's probably not a realistic example of your usage. The "breakpoints" will vary wildly depending on your particular mix of retrieve, indexed search, non-indexed search, update, append.
To me, this isn't a performance issue. Your data sounds record-oriented, and MySQL is vastly superior (in general terms) for dealing with that kind of data. If your use cases are even a little bit complicated by the time your data gets large, dealing with a 100k line CSV file is going to be horrific compared to a 100k record db table, even if the performance is marginally better (which is by no means guaranteed).
Depends on the use. For example for configuration or language files CSV might do better.
Anyway, if you're using PHP5, you have 3rd option -- SQLite, which comes embedded in PHP. It gives you ease of use like regular files, but robustness of RDBMS.
Databases are for storing and retrieving data. If you need anything more than plain line/entry addition or bulk listing, why not go for the database way? Otherwise you'd basically have to code the functionality (incl. deletion, sorting etc) yourself.
CSV is an incredibly brittle format and requires your app to do all the formatting and calcuations. If you need to update a spesific record in a csv you will have to first read the entire csv file, find the entry in memory would need to change, then write the whole file out again. This gets very slow very quickly. CSV is only useful for write once, readd once type apps.
If you want to import swiftly like a thief in the night, use SQL format.
If you are working in production server, CSV is slow but it is the safest.
Just make sure the CSV file doesn't have a Primary Key which will override your existing data.

Categories