I have a MySQL table with about 9.5K rows, these won't change much but I may slowly add to them.
I have a process where if someone scans a barcode I have to check if that barcode matches a value in this table. What would be the fastest way to accomplish this? I must mention there is no pattern to these values
Here Are Some Thoughts
Ajax call to PHP file to query MySQL table ( my thoughts would this would be slowest )
Load this MySQL table into an array on log in. Then when scanning Ajax call to PHP file to check the array
Load this table into an array on log in. When viewing the scanning page somehow load that array into a JavaScript array and check with JavaScript. (this seems to me to be the fastest because it eliminates Ajax call and MySQL Query. Would it be efficient to split into smaller arrays so I don't lag the server & browser?)
Honestly, I'd never load the entire table for anything. All I'd do is make an AJAX request back to a PHP gateway that then queries the database, and returns the result (or nothing). It can be very fast (as it only depends on the latency) and you can cache that result heavily (via memcached, or something like it).
There's really no reason to ever load the entire array for "validation"...
Much faster to used a well indexed MySQL table, then to look through an array for something.
But in the end it all depends on what you really want to do with the data.
As you mentions your table contain around 9.5K of data. There is no logic to load data on login or scanning page.
Better to index your table and do a ajax call whenever required.
Best of Luck!!
While 9.5 K rows are not that much, the related amount of data would need some time to transfer.
Therefore - and in general - I'd propose to run validation of values on the server side. AJAX is the right technology to do this quite easily.
Loading all 9.5 K rows only to find one specific row, is definitely a waste of resources. Run a SELECT-query for the single value.
Exposing PHP-functionality at the client-side / AJAX
Have a look at the xajax project, which allows to expose whole PHP classes or single methods as AJAX method at the client side. Moreover, xajax helps during the exchange of parameters between client and server.
Indexing to be searched attributes
Please ensure, that the column, which holds the barcode value, is indexed. In case the verification process tends to be slow, look out for MySQL table scans.
Avoiding table scans
To avoid table scans and keep your queries run fast, do use fixed sized fields. E.g. VARCHAR() besides other types makes queries slower, since rows no longer have a fixed size. No fixed-sized tables effectively prevent the database to easily predict the location of the next row of the result set. Therefore, you e.g. CHAR(20) instead of VARCHAR().
Finally: Security!
Don't forget, that any data transferred to the client side may expose sensitive data. While your 9.5 K rows may not get rendered by client's browser, the rows do exist in the generated HTML-page. Using Show source any user would be able to figure out all valid numbers.
Exposing valid barcode values may or may not be a security problem in your project context.
PS: While not related to your question, I'd propose to use PHPexcel for reading or writing spreadsheet data. Beside other solutions, e.g. a PEAR-based framework, PHPExcel depends on nothing.
Related
My stack is php and mysql.
I am trying to design a page to display details of a mutual fund.
Data for a single fund is distributed over 15-20 different tables.
Currently, my front-end is a brute-force php page that queries/joins these tables using 8 different queries for a single scheme. It's messy and poor performing.
I am considering alternatives. Good thing is that the data changes only once a day, so I can do some preprocessing.
An option that I am considering is to create run these queries for every fund (about 2000 funds) and create a complex json object for each of them, store it in mysql indexed for the fund code, retrieve the json at run time and show the data. I am thinking of using the simple json_object() mysql function to create the json, and json_decode in php to get the values for display. Is this a good approach?
I was tempted to store them in a separate MongoDB store - would that be an overkill for this?
Any other suggestion?
Thanks much!
To meet your objective of quick pageviews, your overnight-run approach is very good. You could generate JSON objects with your distilled data, or even prerendered HTML pages, and store them.
You can certainly store JSON objects in MySQL columns. If you don't need the database server to search the objects, simply use TEXT (or LONGTEXT) data types to store them.
To my way of thinking, adding a new type of server (mongodb) to your operations to store a few thousand JSON objects does not seem worth the the trouble. If you find it necessary to search the contents of your JSON objects, another type of server might be useful, however.
Other things to consider:
Optimize your SQL queries. Read up: https://use-the-index-luke.com and other sources of good info. Consider your queries one-by-one starting with the slowest one. Use the EXPLAIN or even the EXPLAIN ANALYZE command to get your MySQL server to tell you how it plans each query. And judiciously add indexes. Using the query-optimization tag here on StackOverflow, you can get help. Many queries can be optimized by adding indexes to MySQL without changing anything in your php code or your data. So this can be an ongoing project rather than a big new software release.
Consider measuring your query times. You can do this with MySQL's slow query log. The point of this is to identify your "dirty dozen" slowest queries in a particular time period. Then, see step one.
Make your pages fill up progressively, to keep your users busy reading while you get the data they need. Put the toplevel stuff (fund name, etc) in server-side HTML so search engines can see it. Use some sort of front-end tech (React, maybe, or Datatables that fetch data via AJAX) to render your pages client-side, and provide REST endpoints on your server to get the data, in JSON format, for each data block in the page.
In your overnight run create a sitemap file along with your JSON data rows. That lets you control exactly how you want search engines to present your data.
I am making a website that interacts with an offline project through json files sent from the offline project to the site.
The site will need to load these files and manipulate the data.
Is it feasible with modern computing power to simply load these files into the database as a single serialized field, which can then be loaded and decoded for every use?
Or would it save significant overhead to properly store the JSON as tables and fields and refer to those for every use?
Without knowing more about the project, a table with multiple fields is probably the better solution.
There will be more options for the data in the long run, for example, indexing fields, searching through fields and many other MySQL commands that would not be possible if it was all stored in a single variable.
Consider future versions of the project too, example adding another field to a table is easy, but adding another field to a block of JSON would be more difficult.
Project growth, what if you experience 100x or 1000x growth will the table handle the extra load.
500kb is a relatively small data block, there shouldn't be any issue with computing power regardless of which method is used, although more information would be handy here, example 500kb per user, per upload, how many stores a day how often is it accessed.
Debugging will also be easier.
The New MySQL shell has a bulk JSON loader that is not only very quick but lets you have a lot of control on how the data is handled. See https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-json.html
Load it as JSON.
Think about what queries you need to perform.
Copy selected fields into MySQL columns so that those queries can use WHERE, GROUP BY, ORDER BY of MySQL instead of having to deal with the processing in the client.
A database table contains a bunch of similarly structured rows. Each row has a constant set of columns. (NULLs can be used to indicate missing columns for a given row.) JSON complicates things by providing a complex column. My advice above is a compromise between the open-ended flexibility of JSON and the need to use the database server to process lots of data. Further discussion here.
I have a program that creates logs and these logs are used to calculate balances, trends, etc for each individual client. Currently, I store everything in separate MYSQL tables. I link all the logs to a specific client by joining the two tables. When I access a client, it pulls all the logs from the log_table and generates a report. The report varies depending on what filters are in place, mostly date and category specific.
My concern is the performance of my program as we accumulate more logs and clients. My intuition tells me to store the log information in the user_table in the form of a serialized array so only one query is used for the entire session. I can then take that log array and filter it using PHP where as before, it was filtered in a MYSQL query (using multiple methods, such as BETWEEN for dates and other comparisons).
My question is, do you think performance would be improved if I used serialized arrays to store the logs as opposed to using a MYSQL table to store each individual log? We are estimating about 500-1000 logs per client, with around 50000 clients (and growing).
It sounds like you don't understand what makes databases powerful. It's not about "storing data", it's about "storing data in a way that can be indexed, optimized, and filtered". You don't store serialized arrays, because the database can't do anything with that. All it sees is a single string without any structure that it can meaningfully work with. Using it that way voids the entire reason to even use a database.
Instead, figure out the schema for your array data, and then insert your data properly, with one field per dedicated table column so that you can actually use the database as a database, allowing it to optimize its storage, retrieval, and database algebra (selecting, joining and filtering).
Is serialized arrays in a db faster than native PHP? No, of course not. You've forced the database to act as a flat file with the extra dbms overhead.
Is using the database properly faster than native PHP? Usually, yes, by a lot.
Plus, and this part is important, it means that your database can live "anywhere", including on a faster machine next to your webserver, so that your database can return results in 0.1s, rather than PHP jacking 100% cpu to filter your data and preventing users of your website from getting page results because you blocked all the threads. In fact, for that very reason it makes absolutely no sense to keep this task in PHP, even if you're bad at implementing your schema and queries, forget to cache results and do subsequent searches inside of those cached results, forget to index the tables on columns for extremely fast retrieval, etc, etc.
PHP is not for doing all the heavy lifting. It should ask other things for the data it needs, and act as the glue between "a request comes in", "response base data is obtained" and "response is sent back to the client". It should start up, make the calls, generate the result, and die as fast as it can again.
It really depends on how you need to use the data. You might want to look into storing with mongo if you don't need to search that data. If you do, leave it in individual rows and create your indexes in a way that makes them look up fast.
If you have 10 billion rows, and need to look up 100 of them to do a calculation, it should still be fast if you have your indexes done right.
Now if you have 10 billion rows and you want to do a sum on 10,000 of them, it would probably be more efficient to save that total somewhere. Whenever a new row is added, removed or updated that would affect that total, you can change that total as well. Consider a bank, where all items in the ledger are stored in a table, but the balance is stored on the user account and is not calculated based on all the transactions every time the user wants to check his balance.
I have table in database named ads, this table contains data about each ad.
I want to get that data from table to display ad.
Now, I have two choices:
Either get all data from table and store it in array, and then , I will treat with this array to display each ad in its position by using loops.
Or access to table directly and get each ad data to display it, note this way will consume more queries to database.
Which one is the best way, and not make the script more slow ?
In most Cases #1 is better.
Because, if you can select the data (smallest, needed set) in one query,
then you have less roundtrips to the database server.
Accessing Array or Objectproperties (from Memory) are usually faster than DB Queries.
You could also consider to prepare your Data and don't mix fetching with view output.
The second Option "select on demand" could make sense if you need to "lazy load",
maybe because you can or want to recognize client properties, like viewport.
I'd like to highlight the following part:
get all data from table and store it in array
You do not need to store all rows into an array. You could also take an iterator that represents the resultset and then use that one.
Depending on the database object you use this is often the less memory-intensive variant. Also you would run only one query here which is preferable.
The iterator is actually common with modern database result objects.
Additionally this is helpful to decouple the view code from the actual database interaction and you can also defer to do the SQL query.
You should minimize the amount of queries but you should also try to minimize the amount of data you actually get from the database.
So: Get only those ads that you are actually displaying. You could for example use columnPK IN (1, 2, 3, 4) to get those ads.
A notable exception: If your application is centered around "ads" and you need them pretty much everywhere, and/or they don't consume much memory, and/or there aren't too many adds, it might be better performance-wise to store all (or a subset) of your ads in an array.
Above all: Measure, measure, measure!
It is very, very hard to predict which algorithm will be most efficient. Often you implement something "because it will be more efficient" only to find out later that your optimization is actually slowing down your application.
You should always try to run a PHP script with the least amount of database queries possible. Whenever you query the database, a request must be sent to the database (usually) over the network, and your script will idle until the request came back.
You should, however, make sure not to request any more data from the database than necessary. So try to filter as much in the WHERE clause as possible instead of requesting the whole table and then picking individual rows on the PHP layer.
We could help with writing that SQL query when you tell us how your table looks and how you want to select which ads to display.
I have a table with just 3,000 records.
I render these 3000 records in the home page without pagination, my client is not interested in pagination...
So to show page completely it takes around 1 min, 15 sec. What can be done to make the page load more quickly?
My table structure:
customer table
customer id
customer name
guider id
and few columns
guider table
guider id
guider name
and few columns
Where's the slow down? The query or the serving?
If the former, see the comments above. If the latter:
Enable gzip on the server. Otherwise capture the [HTML?] output to a file, compress it (zip), then serve it as a download. Same for any other format if you think something else can render it better than a browser (CSV and Open Office).
If you're outputting the data into a HTML table then you may have an issue where the browser is waiting for the end of the table before rendering it. You can either break this into multiple table chunks like every 500 records/rows or try CSS "table-layout: fixed;".
Check the Todos
sql Connection (dont open the
connection in loop) for query it
should be one time connection
check your queries and analyse it if you are using some complex logic
which can be replaced
use standard class for sql connection and query ; use ezsql
sql query best practice
While you could implement a cache to do this, you don't necessarily need to do so, an introducing unnecessary cache structures can often cause problems of its own. Depending on where the bottleneck is, it may not even help you much, or at all.
You need to look in two places for your analysis:
1) The query you're using to get your data. Take a look at its plan, or if you're not comfortable doing that, run it in your favorite query tool and see how long it takes to come back. If it doesn't take too long, you've got a pretty good idea that your bottleneck isn't the query. If the query itself takes a long time, that's where you should focus your efforts.
2) How your page is rendering. What is the size of your page, in bytes? It may be too big. Can you cut the size down by formatting? Can you more effectively use CSS to eliminate duplicate styling on the page? Are you using a fixed or dynamic table layout? Dynamic is generally going to be quite a bit slower, especially for large tables. Try to avoid nesting tables. Do everything you can to make the page as small as possible, and keep testing!
while displaying records i want to
display guidername so , i did once
function that return the guider name
Sounds like you need to use a JOIN. Here's a simple example:
SELECT * FROM customer JOIN guider ON guider.id=customer.guider_id
This will change your page from using N + 1 (3001) queries to just one.
Make sure both guider.id and customer.guider_id are indexed and of appropriate data types (such as integers).
This is a little list, what you should think about for improving the performance, the importance is relative to each point, so the first ist not to be the most important to you - which depends on the details of your project.
Check your database structure. If there are just these two tables, their might be little you can do. But keep in mind that there is stuff like indices and with an increasing number of records a second denormalizes table structure will improve the speed of retrieving results.
Use rather one Query for selecting your data, than iterating through ids and doing selects repeatedly
Run a separate Query for the guiders, I assume there are only a few of them. Save all guiders in a data structure, e.g. a dictionary, first and use the foreign key to apply the correct one to the current record - this might save a lot of data which has to be transmitted from the database to your web server.
Get your result set by using something like mysqli_result::fetch_all() which returns a 2-dimensional array with all results. This should be faster than iteration through each row with fetch_row()
Sanitize your HTML Output, use (external) CSS. This will save a lot of output space if you format your stuff with style=" ... a lot of formatting code ..." attributes in each line. If you use one large table, split them up in multiple tables (some browsers wait for the complete table to load before rendering it).
In a lot of languages very important: Use a string builder for concatenating your results into the output string!
Caching: Think about generating the output once a day or once an hour. Write it to a cachefile which is opened instead of querying the database and building the same stuff on every request. Maybe you want to offer this generated file as download, rather than displaying it as plain HTML Site on the web.
Last but not least, check the connections to webserver and database, the server load as well as the number of requests. If your servers are running on heavy load everything ales here might help reducing the load or you just have to upgrade hardware.
LOL
everyone is talking of big boys toys, like database structure, caching and stuff.
While the problem most likely lays in mere HTML and browsers.
Just to split whole HTML table in chunks will help first chunk to show up immediately while others will eventually come.
Only ones were right who said to profile whole thing first.
Trying to answer without profiling results is shooting in the dark.