I am building a simple list of the last 10 updated pages from the database. For each record I need to display the name and a shortened/truncated version of the description, which is stored as TEXT. For some pages the description can be over 10,000 characters.
Which of the following is better for speed and performance? Or is there a better way to go about this? I use both Zend and Smarty.
MySQL
SELECT id, name, LEFT(description, 100) FROM pages ORDER BY page_modified DESC LIMIT 10;
PHP
function ShortenText($text) {
    // Change to the number of characters you want to display
    $chars = 100;

    // Append a space so strrpos() always finds a word boundary,
    // cut to the limit, then trim back to the last whole word.
    $text = $text." ";
    $text = substr($text, 0, $chars);
    $text = substr($text, 0, strrpos($text, ' '));

    return $text."...";
}
Because your question was specifically about "faster", not "better", I can say for sure that performing the calculation in the DB is actually faster. "Better" is a much different question, and depending on the use case, @Graydot's suggestion might be the better choice.
The notion of having the application server marshal data when it doesn't need to is inconsistent with the idea of specialization. Databases are specialized in retrieving data and performing massive calculations on data; that's what they do best. Application servers are meant to orchestrate the flow between persistence, business logic and user interface.
Would you use SUM() in a SQL statement, or would you retrieve all the values into your app server and then loop and add them up? Performing the sum in the DB is ABSOLUTELY faster. Keep in mind the application server is actually a client to the database: if you pull back all that data to the application server for crunching, you are sending bytes across the network (or at best across RAM segments) that don't need to be moved, and it all flows through database drivers, so there is a lot of little code touching and moving the data along.
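As a rough sketch of the difference (the orders table, its amount column and the PDO credentials are all invented for illustration):

// Placeholder connection - adjust DSN/credentials; the table orders(amount) is hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// 1) Let the database aggregate: only one value comes back over the wire.
$total = $pdo->query('SELECT SUM(amount) FROM orders')->fetchColumn();

// 2) Pull every row back and add them up in PHP: every amount crosses the network first.
$total = 0;
foreach ($pdo->query('SELECT amount FROM orders') as $row) {
    $total += $row['amount'];
}

Same result either way; the difference is how many bytes have to travel from the database to the app server before you get it.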
BUT there is also the question of "better", which is problem specific... If you have requirements around needing the row-level data, or client-side filtering and re-summing (or letting the user specify how many left characters they want to see in the result set), then it might make sense to do it in the app server so you don't have to keep going back to the database.
You asked specifically about "faster", and the answer is "database" - but "overall faster" might mean something else, and "overall better" something else entirely. As usual, truth is fuzzy and the answer to just about everything is "it depends".
hth
Jon
LEFT in the database.
Less data sent back to the client (far less in this case, a max of 1k vs 100k text)
It's trivial compared to the actual table access, ORDER BY etc
It also doesn't break any rules such as "format in the client": it's simply common sense
Edit: it looks like we have a religious war brewing.
If the question asked for complex string manipulation or formatting or non-aggregate calculations then I'd say use php. This is none of these cases.
One thing you can't optimise is the network compared to db+client code.
I agree with gbn, but if you're looking to integrate the ... suffix, you can try:
SELECT id,
name,
CASE WHEN LENGTH(description)>25 THEN
CONCAT(LEFT(description, 25),'...')
ELSE
description
END AS short_description
FROM pages
ORDER BY page_modified DESC
LIMIT 10;
Where 25 is the number of characters the preview text should have. (Note this won't split on whole words, but neither does your PHP function.)
My POV (which may be wrong!) is that PHP is used to parse the stuff from the server, send it to the DB, and then present it to the client. I prefer to use stored procedures in the database - because it is easy to know what queries are going to be executed and to ensure that the business logic is adhered to.
I just think that having these definite lines is a good idea.
Forgot to mention - The database knows more about the structure and nature of the data than a PHP script.
General Rule-of-Thumb:
Keep substring functions out of the WHERE clause: wrapping a column in a function means the database must evaluate it for every row, so the predicate can no longer use an index (see the sketch below).
Use substring functions on columns in the SELECT list, because the link between the database server and the database client is a significant bottleneck and you want to send as little data across it as possible.
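As a hedged illustration of the first point (reusing the pages table from the question; the connection details are placeholders):

// Placeholder connection - adjust to your environment.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Wrapping the column in a function: an index on name cannot be used, every row is examined.
$slow = $pdo->query("SELECT id FROM pages WHERE LEFT(name, 3) = 'abc'")->fetchAll(PDO::FETCH_ASSOC);

// Comparing the bare column: an index on name can satisfy this prefix match.
$fast = $pdo->query("SELECT id FROM pages WHERE name LIKE 'abc%'")->fetchAll(PDO::FETCH_ASSOC);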
Related
Basically, I have tons of files with some data. Each differs, some lack some variables (null), etc. - classic stuff.
The part where it gets somewhat interesting is that, since each file can have up to 1000 variables and has at least ~800 values that are not null, I thought: "Hey, I need 1000 columns." Another thing to mention: they are integers, bools, text, everything; they differ by size and type. Each variable is under 100 bytes in every file, although they vary.
I found this question Work around SQL Server maximum columns limit 1024 and 8kb record size
I'm unfamiliar with the capacities of SQL servers and with table design, but the thing is: the people who answered that question say the design should be reconsidered, and I can't do that. I can, however, convert what I already have, as long as I still keep those 1000 variables.
I'm willing to use any SQL server, but I don't know which one suits my requirements best. If doing something else entirely is better, please say so.
What I need to do with this data is look at it, compare it, and search within it. I don't need the ability to modify it. I thought of just keeping the files as plain text and reading from them, but that requires "seconds" of PHP runtime just to view data out of a "few" of these files, and that is too much - not even considering the fact that I need to check about 1000 or more of these files to do any search.
So the question is: what is the fastest way of storing 1000+ entities with 1000 variables each, and of searching/comparing any variable I wish within them? And if it's SQL, which SQL server works best for this sort of thing?
Sounds like you need a different kind of database for what you're doing. Consider a document database, such as MongoDB, or one of the other not-only-SQL database flavors that allows for manipulation of data in different ways than a traditional table structure.
I just saw the note mentioning that you're only reading as well. I've had good luck with Solr on a similar dataset.
You want to use an EAV (entity-attribute-value) model. This is pretty common.
You are asking for the best; I can give an answer (how I solved it), but I can't say if it is the 'best' way in your environment. I had the problem of collecting inventory data from many thousand PCs (no, not the NSA - kidding).
My solution was:
One table per PC (File for you?)
Table File:
one row per file, PK FILE_ID
Table File_data:
one row per variable in the file; PK (FILE_ID, ATTR_ID), plus ATTR_NAME, ATTR_VALUE, (ATTR_TYPE)
The File_data table was - somewhat - big (>1e6 rows), but the DB handled that fast.
HTH
EDIT:
My answer above was pretty short, so I want to add some information about my (still working) solution:
The 'per info source' table has more fields than just the PK FILE_ID, e.g. ISOURCE and ITYPE, which describe where the data came from (I had many sources) and what basic information type it is/was. This helps to give queries some structure: I did not need to include data from 'switches' or 'monitors' when searching for USB devices (edit: today, probably yes).
The attributes table had more fields, too. I'll mention the two fields ISOURCE and ITYPE here - yes, the same names as above, with a slightly different meaning but the same idea behind them.
What you would have to put into these fields definitely depends on your data.
I am sure that if you take a closer look at what information you have to collect, you will find some 'key values' for that.
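To make the above concrete, here is a rough sketch of a search against such a layout (the File_data table follows this answer; the attribute name and connection details are made up):

// Placeholder connection and a hypothetical attribute name.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Find every file that has an attribute "cpu_type" containing "i7".
$sql = 'SELECT d.FILE_ID
          FROM File_data d
         WHERE d.ATTR_NAME = :name
           AND d.ATTR_VALUE LIKE :value';
$stmt = $pdo->prepare($sql);
$stmt->execute(array(':name' => 'cpu_type', ':value' => '%i7%'));
$fileIds = $stmt->fetchAll(PDO::FETCH_COLUMN);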
For storage, XML is probably the best way to go. There is really good support for XML in SQL.
For queries, if they are direct SQL queries, 1000+ rows isn't a lot and XML will be plenty fast. If you're moving towards a million+ rows, you're probably going to want to take the data that is most selective out of the XML and index that separately.
Link: http://technet.microsoft.com/en-us/library/hh403385.aspx
I have a PHP/MySQL based web application that has internationalization support by way of a MySQL table called language_strings with the string_id, lang_id and lang_text fields.
I call the following function when I need to display a string in the selected language:
public function get_lang_string($string_id, $lang_id)
{
$db = new Database();
$sql = sprintf('SELECT lang_string FROM language_strings WHERE lang_id IN (1, %s) AND string_id=%s ORDER BY lang_id DESC LIMIT 1', $db->escape($lang_id, 'int'), $db->escape($string_id, 'int'));
$row = $db->query_first($sql);
return $row['lang_string'];
}
This works perfectly but I am concerned that there could be a lot of database queries going on. e.g. the main menu has 5 link texts, all of which call this function.
Would it be faster to load the entire language_strings table results for the selected lang_id into a PHP array and then call that from the function? Potentially that would be a huge array with much of it redundant but clearly it would be one database query per page load instead of lots.
Can anyone suggest another more efficient way of doing this?
There isn't a one-size-fits-all answer; you really have to look at it on a case-by-case basis. Having said that, the majority of the time it will be quicker to get all the data in one query, pop it into an array or object and refer to it from there.
The caveat is whether you can pull all your data that you need in one query as quickly as running the five individual ones. That is where the performance of the query itself comes into play.
Sometimes a query that contains a subquery or two will actually be less time efficient than running a few queries individually.
My suggestion is to test it out. Get a query together that gets all the data you need, see how long it takes to execute. Time each of the other five queries and see how long they take combined. If it is almost identical, stick the output into an array and that will be more efficient due to not having to make frequent connections to the database itself.
If however, your combined query takes longer to return data (it might cause a full table scan instead of using indexes for example) then stick to individual ones.
Lastly, if you are going to use the same data over and over - an array or object will win hands down every single time as accessing it will be much faster than getting it from a database.
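If you want to time it, a quick-and-dirty sketch (the queries, ids and connection are placeholders for whatever you are actually comparing):

// Placeholder connection; lang_id 2 and string ids 1-5 are just examples.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Time the single combined query...
$start = microtime(true);
$all = $pdo->query('SELECT string_id, lang_string FROM language_strings WHERE lang_id = 2')
           ->fetchAll(PDO::FETCH_ASSOC);
$combined = microtime(true) - $start;

// ...versus five individual lookups.
$start = microtime(true);
$stmt = $pdo->prepare('SELECT lang_string FROM language_strings WHERE lang_id = 2 AND string_id = ?');
foreach (array(1, 2, 3, 4, 5) as $id) {
    $stmt->execute(array($id));
    $stmt->fetchColumn();
}
$individual = microtime(true) - $start;

printf("combined: %.4fs, individual: %.4fs\n", $combined, $individual);

Run each version a few hundred times and average the results, otherwise a single run will be dominated by noise.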
OK - I did some benchmarking and was surprised to find that putting things into an array rather than using individual queries was, on average, 10-15% SLOWER.
I think the reason for this was because, even if I filtered out the "uncommon" elements, inevitably there was always going to be unused elements as a matter of course.
With the individual queries I am only ever getting out what I need and as the queries are so simple I think I am best sticking with that method.
This works for me; of course, in other situations where the individual queries are more complex, I think the method of storing common data in an array would turn out to be more efficient.
Agree with what everybody says here.. it's all about the numbers.
Some additional tips:
Try to create a single memory array which holds the minimum you require. This means removing most of the obvious redundancies.
There are standard approaches for these issues in performance critical environments, like using memcached with mysql. It's a bit overkill, but this basically lets you allocate some external memory and cache your queries there. Since you choose how much memory you want to allocate, you can plan it according to how much memory your system has.
Just play with the numbers. Try using separate queries (which is the simplest approach) and stress your PHP script (like calling it hundreds of times from the command line). Measure how much time this takes and see how big the performance loss actually is. Speaking from my personal experience, I usually cache everything in memory, and then one day when the data gets too big, I run out of memory. Then I split everything into separate queries to save memory, and see that the performance impact wasn't that bad in the first place :)
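A rough sketch of the memcached idea (requires the pecl memcached extension; the server address, key name and 10-minute TTL are arbitrary choices):

$lang_id = 2; // example language id

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key = 'lang_strings_' . $lang_id;
$strings = $mc->get($key);

if ($strings === false) {
    // Cache miss: load from MySQL once, keep it for 10 minutes.
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // placeholder connection
    $stmt = $pdo->prepare('SELECT string_id, lang_string FROM language_strings WHERE lang_id = ?');
    $stmt->execute(array($lang_id));
    $strings = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // string_id => lang_string
    $mc->set($key, $strings, 600);
}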
I'm with Fluffeh on this: look into the other options at your disposal (joins, subqueries, making sure your indexes reflect how the data relates - but don't over-index, and test). Most likely you'll end up with an array at some point, so here's a little performance tip. Contrary to what you might expect, something like
$all = $stmt->fetchAll(PDO::FETCH_ASSOC);
is less memory efficient than:
$all = array(); // or $all = []; in PHP 5.4
while ($row = $stmt->fetch(PDO::FETCH_ASSOC))
{
    $all[] = $row['lang_string'];
}
What's more: you can check for redundant data while fetching the data.
My answer is to do something in between. Retrieve all strings for a lang_id that are shorter than a certain length (say, 100 characters). Shorter text strings are more likely to be used in multiple places than longer ones. Cache the entries in a static associative array in get_lang_string(). If an item isn't found, then retrieve it through a query.
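A rough sketch of that idea, reusing the function from the question (the 100-character cutoff and the assumption that your Database class has a query() method returning an iterable result set are mine):

public function get_lang_string($string_id, $lang_id)
{
    // Per-request cache of short strings; survives between calls within one request.
    static $cache = null;

    $db = new Database();

    if ($cache === null) {
        // Pre-load every string shorter than 100 characters for this language in one query.
        // Assumes Database::query() returns rows as associative arrays.
        $cache = array();
        $sql = sprintf('SELECT string_id, lang_string FROM language_strings WHERE lang_id IN (1, %s) AND CHAR_LENGTH(lang_string) < 100 ORDER BY lang_id ASC', $db->escape($lang_id, 'int'));
        foreach ($db->query($sql) as $row) {
            // Rows for the selected language come last and overwrite the fallback (lang_id 1).
            $cache[$row['string_id']] = $row['lang_string'];
        }
    }

    if (isset($cache[$string_id])) {
        return $cache[$string_id];
    }

    // Anything not cached (e.g. long strings) falls back to the original per-string query.
    $sql = sprintf('SELECT lang_string FROM language_strings WHERE lang_id IN (1, %s) AND string_id=%s ORDER BY lang_id DESC LIMIT 1', $db->escape($lang_id, 'int'), $db->escape($string_id, 'int'));
    $row = $db->query_first($sql);
    return $row['lang_string'];
}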
I am currently at the point in my site/application where I have had to put the brakes on and think very carefully about speed. I think the speed tests mentioned here should treat the volume of traffic on your server as an important variable that will affect the results. If you are putting data into JavaScript data structures and processing it on the client machine, the processing time should be more consistent. If you are requesting lots of data through MySQL via PHP (for example), this puts demand on one machine/server rather than spreading it. As your traffic grows you have to share server resources with many users, and I think this is where getting JavaScript to do more will lighten the load on the server. You can also store data on the local machine via localStorage.setItem() / localStorage.getItem() (most browsers have about 5MB of space per domain). If you have data in the database that does not change very often, you can store it on the client and then just check at 'start-up' whether it is still in date/valid.
This is my first comment posted after having and using the account for a year, so I might need to fine-tune my rambling - just voicing what I'm thinking through at present.
I've got a database (MySQL) table with three fields : id, score, and percent.
Long story short, I need to do a calculation on each record that looks like this:
(Score * 10) / (1 - percent) = Value
And then I need to use that value both in my code and as the ORDER BY field. Writing the SQL isn't my issue - I'm just worried about the efficiency of this statement. Is doing that calculation in my SQL statement the most efficient use of resources, or would I be better off grabbing the data and then doing math via PHP?
If SQL is the best way to do it, are there any tips I can keep in mind for keeping my SQL pulls as speedy as possible?
Update 1: Just to clear some things up, because it seems like many of the answers are assuming differently: both the Score and the Percent will be changing constantly. Actually, just about every time a user interacts with the app, those fields will change (those fields are actually linked to a user, btw).
As far as # of records, right now it's very small, but I would like to be scaling for a target set of about 2 million records (users). At any given time I will only need 20ish records, but I need them to be the top 20 records sorted by this calculated value.
It sounds like this calculated value is of inherent meaning in your business domain; if this is the case, I would calculate it once (e.g. at the time the record is created), and use it just like any normal field. This is by far the most efficient way to achieve what you want - the extra calculation on insert or update has minimal performance impact, and from then on you don't have to worry about who does the calculation where.
Drawback is that you do have to update your "insert" and "update" logic to perform this calculation. I don't usually like triggers - they can be the source of impenetrable bugs - but this is a case where I'd consider them (http://dev.mysql.com/doc/refman/5.0/en/triggers.html).
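A hedged sketch of such a trigger, assuming the table is called scores and you have added a stored value column (both names are assumptions, and an analogous BEFORE UPDATE trigger would be needed as well):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // placeholder connection

// Compute the stored value on every insert so reads never have to.
$pdo->exec('CREATE TRIGGER scores_before_insert
            BEFORE INSERT ON scores
            FOR EACH ROW
            SET NEW.value = (NEW.score * 10) / (1 - NEW.percent)');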
If for some reason you can't do that, I'd suggest doing it on the database server. This should be pretty snappy, unless you are dealing with very large numbers of records; in that case the "order by" will be a real performance problem. It will be a far bigger performance problem if you execute the same logic on the PHP side, of course - but your database tends to be the bottleneck from a performance point of view, so the impact is larger.
If you're dealing with large numbers of records, you may just have to bite the bullet and go with my first suggestion.
If it weren't for the need to sort by the calculation, you could also do this on the PHP side; however, sorting an array in PHP is not something I'd want to do for large result sets, and it seems wasteful not to do sorting in the database (which is good at that kinda thing).
So, after all that, my actual advice boils down to:
do the simplest thing that could work
test whether it's fast enough within the constraints of your project
if not, iteratively refactor to a faster solution, re-test
once you reach "good enough", move on.
Based on edit 1:
You've answered your own question, I think - returning (eventually) 2 million rows to PHP, only to find the top 20 records (after calculating their "value" one by one) will be incredibly slow. So calculating in PHP is really not an option.
So, you're going to be calculating it on the server. My recommendation would be to create a view (http://dev.mysql.com/doc/refman/5.0/en/create-view.html) which has the SQL to perform the calculation; benchmark the performance of the view with 200, 200K and 2M records, and see if it's quick enough.
If it isn't quick enough at 2M users/records, you can always create a regular table, with an index on your "value" column, and relatively little needs to change in your client code; you could populate the new table through triggers, and the client code might never know what happened.
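For illustration, a sketch of such a view (the underlying table name scores is an assumption; the calculation and columns come from the question):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // placeholder connection

// Create the view once; note that a percent of exactly 1 yields NULL (division by zero).
$pdo->exec('CREATE OR REPLACE VIEW scored AS
              SELECT id, score, percent, (score * 10) / (1 - percent) AS value
                FROM scores');

// Top 20 by the calculated value, computed entirely on the database side.
$top20 = $pdo->query('SELECT id, value FROM scored ORDER BY value DESC LIMIT 20')
             ->fetchAll(PDO::FETCH_ASSOC);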
Doing the math in the database will be more efficient, because sending the data back and forth between the database and the client will be slower than that simple expression, no matter how fast the client is or how slow the database is.
Test it out and let us know the performance results. I think it is going to depend on the volume of data in your result set. For the SQL bit, just make sure your WHERE clause is backed by a covering index.
Where you do the math shouldn't be too important. It's the same fundamental operation either way. Now, if MySQL is running on a different server than your PHP code, then you may care which CPU does the calculation. You may wish that the SQL server does more of the "hard work", or you may wish to leave the SQL server doing "only SQL", and move the math logic to PHP.
Another consideration might be bandwidth usage (if MySQL isn't running on the same machine as PHP)--you may wish to have MySQL return whichever form is shorter, to use less network bandwidth.
If they're both on the same physical hardware, though, it probably makes no noticeable difference, from a sheer CPU usage standpoint.
One tip I would offer is to do the ORDER BY on the raw value (percent) rather than on the calculated value--this way MySQL can use an index on the percent column--it can't use indexes on calculated values.
If you have a growing number of records, your script (and its memory) will reach its limits faster than MySQL will. Are you planning to fetch all the records anyway?
MySQL would be quicker in general.
I don't see how you would use a value calculated in PHP in an ORDER BY afterwards. If you are planning to sort in PHP, it will be even slower, but it all depends on the number of records you're dealing with.
I have a database which holds URL's in a table (along with other many details about the URL). I have another table which stores strings that I'm going to use to perform searches on each and every link. My database will be big, I'm expecting at least 5 million entries in the links table.
The application which communicates with the user is written in PHP. I need some suggestions on how I can search all the links against all the patterns (n x m searches) without causing a high load on the server and without losing speed. I want it to operate at high speed and with low resource usage. Any hints or suggestions, even in pseudo-code, are welcome.
Right now I don't know whether to use SQL commands to perform these searches (with some help from PHP) or to do it completely in PHP.
First I'd suggest that you rethink the layout. It seems a little unnecessary to run this query for every user; try instead to create a result table, into which you insert the results from a query that runs once and then again every time the patterns change.
Otherwise, make sure you have indexes (full text) set on the fields you need. For the query itself you could join the tables:
SELECT
yourFieldsHere
FROM
theUrlTable AS tu
JOIN
thePatternTable AS tp ON tu.link LIKE CONCAT('%', tp.pattern, '%');
I would say that you pretty definitely want to do that in the SQL code, not the PHP code. Also, searching on the URL strings is going to be a long operation, so perhaps some form of hashing would be good. I have seen someone use a variant of a Zobrist hash for this before (Google will bring back a load of results).
Hope this helps,
Dan.
Do as much searching as you practically can within the database. If you're ending up with an n x m result set, and start with at least 5 million hits, that's a LOT of data to be repeatedly slurping across the wire (or socket, however you're connecting to the db) just to end up throwing away most (a lot?) of it each time. Even if the DB's native search capabilities ('like' matches, regexp, full-text, etc...) aren't up to the task, culling unwanted rows BEFORE they get sent to the client (your code) will still be useful.
You should optimize your tables in the DB. Use an MD5 hash: add a new column holding the MD5 of the URL and index it, and exact-match lookups will be much faster.
But it doesn't help if you use LIKE '%text%'.
You can use Sphinx or Lucene.
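A sketch of the hash-column idea for exact URL lookups (the links table, its url column and the connection are assumptions):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // placeholder connection

// One-time schema change: store an MD5 of each URL and index it.
$pdo->exec('ALTER TABLE links ADD COLUMN url_md5 CHAR(32) NOT NULL, ADD INDEX idx_url_md5 (url_md5)');
$pdo->exec('UPDATE links SET url_md5 = MD5(url)');

// Exact-match lookups now hit a short, fixed-width index...
$stmt = $pdo->prepare('SELECT id FROM links WHERE url_md5 = MD5(?)');
$stmt->execute(array('http://example.com/some/page'));

// ...but, as noted above, it does nothing for substring searches such as LIKE '%text%'.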
How can I increase the performance of my MySQL database? I have my website hosted on a shared server and they have suspended my account because of "too many queries".
The staff asked me to "index" or "cache" or trim my database.
I don't know what "index" and "cache" mean, or how to do them in PHP.
thanks
What an index is:
Think of a database table as a library - you have a big collection of books (records), each with associated data (author name, publisher, publication date, ISBN, content). Also assume that this is a very naive library, where all the books are shelved in order by ISBN (primary key). Just as the books can only have one physical ordering, a database table can only have one primary key index.
Now imagine someone comes to the librarian (database program) and says, "I would like to know how many Nora Roberts books are in the library". To answer this question, the librarian has to walk the aisles and look at every book in the library, which is very slow. If the librarian gets many requests like this, it is worth his time to set up a card catalog by author name (index on name) - then he can answer such questions much more quickly by referring to the catalog instead of walking the shelves. Essentially, the index sets up an 'alternative ordering' of the books - it treats them as if they were sorted alphabetically by author.
Notice that 1) it takes time to set up the catalog, 2) the catalog takes up extra space in the library, and 3) it complicates the process of adding a book to the library - instead of just sticking a book on the shelf in order, the librarian also has to fill out an index card and add it to the catalog. In just the same way, adding an index on a database field can speed up your queries, but the index itself takes storage space and slows down inserts. For this reason, you should only create indexes in response to need - there is no point in indexing a field you rarely search on.
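To carry the analogy into code, a minimal sketch (the books table and author column are invented for the example):

$pdo = new PDO('mysql:host=localhost;dbname=library', 'user', 'pass'); // placeholder connection

// The "card catalog": an index on the column you search by most often.
$pdo->exec('CREATE INDEX idx_books_author ON books (author)');

// This query can now use the index instead of scanning every row.
$stmt = $pdo->prepare('SELECT COUNT(*) FROM books WHERE author = ?');
$stmt->execute(array('Nora Roberts'));
$count = $stmt->fetchColumn();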
What caching is:
If the librarian has many people coming in and asking the same questions over and over, it may be worth his time to write the answer down at the front desk. Instead of checking the stacks or the catalog, he can simply say, "here is the answer I gave to the last person who asked that question".
In your script, this may apply in different ways. You can store the results of a database query or a calculation or part of a rendered web page; you can store it to a secondary database table or a file or a session variable or to a memory service like memcached. You can store a pre-parsed database query, ready to run. Some libraries like Smarty will automatically store part or all of a page for you. By storing the result and reusing it you can avoid doing the same work many times.
In every case, you have to worry about how long the answer will remain valid. What if the library got a new book in? Is it OK to use an answer that may be five minutes out of date? What about a day out of date?
Caching is very application-specific; you will have to think about what your data means, how often it changes, how expensive the calculation is, how often the result is needed. If the data changes slowly, it may be best to recalculate and store the result every time a change is made; if it changes often but is not crucial, it may be sufficient to update only if the cached value is more than a certain age.
Set up a copy of your application locally, enable the MySQL query log, and set up Xdebug or some other profiler. Then start collecting data and testing your application. There are lots of guides and books available about how to optimize things. It is important that you spend time testing and collecting data first, so that you optimize the right things.
Using the data you have collected, try to reduce the number of queries per page-view. Ideally, you should be able to get everything you need in fewer than 5-10 queries.
Look at the logs and see if you are asking for the same thing twice. It is a bad idea to request a record in one portion of your code, and then request it again from the database a few lines later unless you are sure the value is likely to have changed.
Look for queries embedded in loops, and try to refactor them so you make a single query and simply loop over the results.
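For example (a hypothetical users table; sketch only):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // placeholder connection
$userIds = array(3, 7, 42); // example ids collected earlier in the page

// Bad: one query per loop iteration.
$users = array();
foreach ($userIds as $id) {
    $stmt = $pdo->prepare('SELECT id, name FROM users WHERE id = ?');
    $stmt->execute(array($id));
    $users[] = $stmt->fetch(PDO::FETCH_ASSOC);
}

// Better: one query, then loop over the results.
$in = implode(',', array_fill(0, count($userIds), '?'));
$stmt = $pdo->prepare("SELECT id, name FROM users WHERE id IN ($in)");
$stmt->execute($userIds);
$users = $stmt->fetchAll(PDO::FETCH_ASSOC);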
The select * you mention using is an indication you may be doing something wrong. You probably should be listing fields you explicitly need. Check this site or google for lots of good arguments about why select * is evil.
Start looking at your queries and then using explain on them. For queries that are frequently used make sure they are using a good index and not doing a full table scan. Tweak indexes on your development database and test.
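A quick sketch of checking a query with EXPLAIN (substitute one of your own frequently-run queries; the one below is just the example from earlier in this page):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // placeholder connection

// Prefix the query with EXPLAIN and inspect the plan: a type of ALL means a full
// table scan, and the key column shows which index (if any) is being used.
$plan = $pdo->query('EXPLAIN SELECT id, name FROM pages ORDER BY page_modified DESC LIMIT 10')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);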
There are a couple things you can look into:
Query Design - look into more advanced and faster solutions
Hardware - throw better and faster hardware at the problem
Database Design - use indexes and practice good database design
All of these are easier said than done, but it is a start.
Firstly, sack your host, get off shared hosting into an environment you have full control over and stand a chance of being able to tune decently.
Replicate that environment in your lab, ideally with the same hardware as production; this includes things like RAID controller.
Did I mention that you need a RAID controller? Yes, you do. You can't achieve decent write performance without one - which needs a battery-backed cache. If you don't have one, each write needs to physically hit the disc, which is ruinous for performance.
Anyway, back to read performance, once you've got the machine with the same spec RAID controller (and same discs, obviously) as production in your lab, you can try to tune stuff up.
More RAM is usually the cheapest way of achieving better performance - make sure that you've got MySQL configured to use it - which means tuning storage-engine specific parameters.
I am assuming here that you have at least 100G of data; if not, just buy enough ram that your entire DB fits in ram then read performance is essentially solved.
Software changes that others have mentioned such as optimising queries and adding indexes are helpful too, but only once you've got a development hardware environment that enables you to usefully do performance work - i.e. measure performance of your application meaningfully - which means real hardware (not VMs), which is consistent with the hardware environment used in production.
Oh yes - one more thing - don't even THINK about deploying a database server on a 32-bit OS, it's a ruinous waste of good ram.
Indexing is done on database tables in order to speed up queries. If you don't know what it means, you probably have none. At a minimum you should have indexes on every foreign key and on most fields that are used frequently in the WHERE clauses of your queries. Primary keys get indexes automatically, assuming you set them up to begin with - which I would find unlikely in someone who doesn't know what an index is. Are your tables normalized?
BTW, since you are doing a division in your math (why, I haven't a clue), you should Google integer math. You may not be getting correct results.
You should never SELECT *. Instead, select only the data you need for that particular call. And what is your intention here?
order by votes*1000+((1440 - ($server_date - date))/60)*2+visites*600 desc
You may have poorly-written queries, and/or poorly written pages that run too many queries. Could you give us specific examples of queries you're using that are ran on a regular basis?
Sure.
This is the query I use to fetch the last 3 posts:
select * from posts where visible = 1 and date > ($server_date - 86400) and dont_show_in_frontpage = 0 order by votes*1000+((1440 - ($server_date - date))/60)*2+visites*600 desc limit 3
what do you think?