I'd like to know if this:
$column_family->get('row_key', $columns=array('name1', 'name2'));
is faster than the more flexible get I use now:
$column_family->get('row_key');
Method 1 is harder to implement, of course, but will it give less load/bandwidth/delay?
Cassandra is not MySQL, so it will come as no surprise that some things work differently there. :)
In this case, Cassandra's sparse-row storage model means that for small numbers of columns the full-row version will be faster because Cassandra doesn't need to deserialize and check its row-level column entries.
Of course for larger numbers of columns the extra work of deserializing more than you need will dominate again.
Bottom line: worrying about this is almost certainly premature optimization. When it's not, test.
The first one is faster, especially if you work with large tables that contain plenty of columns.
Even if you have just two columns called name1 and name2, specifying their names should avoid extracting the column names from the table structure on the MySQL side, so it should be faster than using the * selector.
However, test your results using microtime() in PHP against large tables and you'll see what I'm talking about. Of course, if you have 20+ columns in a table and you want to extract them all, it's easier to put * than to list all those column names, but in terms of speed, listing the columns is a bit quicker.
The best way to check this conclusion is to test it yourself.
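If you want to measure it for the calls in the question, a rough microtime() sketch could look like this (the row key, column names, and iteration count are placeholders; it assumes the phpcassa $column_family from the question):
// Rough timing sketch - assumes $column_family is a phpcassa ColumnFamily
// as in the question; names and the iteration count are placeholders.
$iterations = 1000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $column_family->get('row_key', $columns = array('name1', 'name2'));
}
$namedColumns = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $column_family->get('row_key');
}
$fullRow = microtime(true) - $start;

printf("named columns: %.4fs, full row: %.4fs\n", $namedColumns, $fullRow);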
I'm attempting to write a search functionality for a website, and I've decided upon an approach of using MySQL temporary tables to handle the data input, via the query below:
CREATE TEMPORARY TABLE `patternmatch`
(`pattern` VARCHAR(".strlen($queryLengthHere)."))
INSERT INTO `patternmatch` VALUES ".$someValues
Where $someValues is a set of data with the layout ('some', 'search', 'query') - or basically, whatever the user searched for. I then search my main table images based on the data within the patternmatch table, like so:
SELECT images.* FROM images JOIN patternmatch ON (images.name LIKE patternmatch.pattern)
I then apply a heuristic or scoring system based on how well each result matched the input and display the results by that heuristic etc.
What I'm wondering is how much overhead does creating a temporary table require? I understand that they only exist in session, and are dropped as soon as the session is ended, but if I have hundreds of thousands of searches per second, what sort of performance issues might I encounter? Is there any better way of implementing a search functionality?
What you stated is correct: the temporary table will only be visible to the current user/connection. Still, there is some overhead, and there are some other problems, such as:
For each of those thousands of searches you are going to create and fill that table (and drop it later) - not per user, but per search - because each search will most likely re-execute the script, and "per session" does not mean PHP session; it means database session (open connection).
You will need the CREATE TEMPORARY TABLES privilege, which you might not have.
Also, that table really should be of the MEMORY type, which eats more RAM than it might seem: even with VARCHAR columns, MEMORY tables use fixed-length row storage.
If your heuristics later need to refer to that table twice (like SELECT xyz FROM patternmatch AS pm1, patternmatch AS pm2 ...), that will not work: MySQL does not let you refer to a TEMPORARY table more than once in the same query.
Besides, it would be easier for you - and also for the database - to add the LIKE '%xyz%' conditions directly to the WHERE clause of your images table. That does the same thing without the overhead of creating a temporary table and joining it.
In any case - no matter which way you go - that WHERE will be horribly slow. Even if you add an index on images.name you most likely will need LIKE '%xyz%' instead of LIKE 'xyz%', so that index will not get used.
I'm asking whether a session-specific temporary table to handle the search input by the user (created on a search, dropped on the end of a session) is an appropriate way of handling a search functionality.
No. :)
Alternative options
MySQL has a built-in full-text search (since 5.6 also available for InnoDB) that can even give you that scoring: I highly recommend giving it a read and a try. You can be sure that the database knows better than you how to do that search efficiently.
If you are going to use MyISAM instead of InnoDB, be aware of the often overlooked limitation that FULLTEXT searches only return anything if the number of results is less than 50% of the total table rows.
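As a rough sketch of what that could look like against the images table from the question (the index name and the id column are assumptions):
-- Sketch only: assumes images has an id column and a name column.
ALTER TABLE images ADD FULLTEXT INDEX ft_images_name (name);

SELECT id, name,
       MATCH(name) AGAINST('some search query') AS score
FROM images
WHERE MATCH(name) AGAINST('some search query')
ORDER BY score DESC;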
Other things that you might want to look at are, for example, Solr (a nice introduction to the topic is the beginning of http://en.wikipedia.org/wiki/Apache_Solr). We are using it in our company and it does a great job, but it requires quite some learning.
Summary
The solution to your current problem itself (the search) is to use the FULLTEXT capabilities.
If I have hundreds of thousands of searches per second, what sort of performance issues might I encounter? Is there any better way of implementing a search functionality?
To give you a number: 10,000 calls per second is already not "trivial" - with hundreds of thousands of searches per second, the performance issues you will encounter will be everywhere in your set-up. You are going to need a couple of servers, load balancing and tons of other tech, and one of those pieces will, for example, be Solr ;)
Creating temporary tables on disk is relatively expensive. In your scenario it sounds like it'll be slower than it's worth.
It's usually only worthwhile to create temporary tables in memory. But you need to know you have enough memory available at all times. If you plan to support so many searches per second this is not a good solution.
MySQL has full-text searching built-in. It's good for small systems. This would likely perform far better than your temp table and JOIN. But if you want to support thousands of searches per second I would not recommend it. It could consume too much of your overall database performance. Plus you're then forced to use MyISAM for storage which might have its own issues in your scenario.
For so many searches you'll want to offload the work to another system. Plenty of searching systems with scoring already exist. Take a look at ElasticSearch, Solr/Lucene, Redis, etc.
From the code you give, I really don't think tmp tables are needed, nor is FULLTEXT searching. But ... about tmp table performance:
The creation/cleanup of the tmp table is not written to transaction logs, so it will be relatively quick for the OS to do the I/O involved. If the temporary tables will be small and short-lived, and you have plenty of buffers available for the OS, the disk realistically won't even be touched. If you think it will be anyway, get an SSD drive, and get more RAM.
But if you are realistic that you are looking at hundreds of thousands of searches per second, then you have a big engineering project on your hands. Why not just do:
select images.* from images where name in ('some', 'search', 'query')
?
I'm struggling with a philosophical question on database programming in PHP. In particular, I'm trying to decide when it's best to read in an entire table into an object, vs. querying MySQL directly whenever I need data.
Is there ever a situation where you'd want to just read in the entire database into an object? Where do you draw the line?
For example, if I had a table full of names and phone numbers, and I need to get the phone number for one individual, that's a simple one-time mysql query. Reading in an entire table into an associative array just to get one phone number sounds ridiculous... But:
(1) what if I need to get the names and phone numbers of 50 individuals? 100? 1000?
(2) When is it more efficient (if ever) to read in the entire table into an object? Is performing 1000 mysql queries on 1000 names always going to be more efficient than reading in the entire table?
(2a) Obviously it would depend on the total number of records in the table. Would it be better to do 1000 queries for 1000 phone numbers, or to read a table of 2000 total records from MySQL into an associative array? What if it were 5000 total records, and I needed 1000? What if it were 10k? Etc., etc.
(3) What if I need to do something a little more complex, like return all phone numbers in a certain area code? Obviously in that case I could use a regexp SQL query, but I'm sure I could come up with a more complex case where a simple query doesn't give me exactly what I want.
I guess what I'm getting at is, as a developer, you have several knobs you can turn to optimize your application. Obviously you want to think about the data you're using and optimize the database model to match the types of data requests you'll be doing. But sometimes you get into a mutually exclusive case where you're forced to optimize your data model for one scenario at the expense of another, competing scenario.
Any thoughts?
Databases are designed to be efficient at locating and returning exactly the data that you need to work with for a particular operation.
Transferring data over a network connection is orders of magnitude slower than processing it on the machine where it resides. Use databases for what they're good at... holding lots of information and allowing application code to query and work with exactly the subset of that data it needs to at a given point in time.
If you find that you need to frequently access the same data over and over, caching it at the application layer or in a dedicated caching solution like memcached does make sense, but I cannot imagine a scenario where it makes sense just to read in a whole table because my application logic needs to process a subset of the rows and/or columns in the table.
(3) but I'm sure I could come up with a more complex case where a simple query doesn't give me exactly what I want.
This is usually an indication that your database hasn't been properly normalized and/or has design flaws.
(2) When is it more efficient (if ever) to read in the entire table into an object? Is performing 1000 mysql queries on 1000 names always
Neither is a good choice. SQL is intended for set-based operations. You really need to use the system correctly for it to work well, but to do this you have to have properly designed your database. The best thing would be to write one query that returns exactly the records you want, no more and no less.
what if I need to get the names and phone numbers of 50 individuals
Maybe use something like select * where ID in (1,2,3,...,50). If you have a larger number of users, maybe create a temporary table with the list of users you want and join on that. With a properly designed database there is usually a good way to retrieve a set of data with a single query.
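For instance, a minimal sketch with PDO (the people table and its columns are made up for illustration, and $pdo is assumed to be an open connection):
// Fetch 50 phone numbers in one round trip instead of 50 separate queries.
$ids = range(1, 50); // whichever IDs you need
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, name, phone FROM people WHERE id IN ($placeholders)");
$stmt->execute($ids);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);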
I'm building a very large website; currently it uses around 13 tables, and by the time it's done it should be about 20.
I came up with the idea of changing the preferences table to use ID, Key, Value instead of many columns; however, I have recently thought I could also store other data inside that table.
Would it be efficient / smart to store almost everything in one table?
Edit: Here is some more information. I am building a social network that may end up with thousands of users. MySQL Cluster will be used when the site is launched; for now I am testing on a development VPS, but everything will be moved to a dedicated server before launch. I know barely anything about NDB, so this should be fun :)
This model is called EAV (entity-attribute-value).
It is usable for some scenarios; however, it is less efficient due to larger records, a larger number of joins, and the impossibility of creating composite indexes on multiple attributes.
Basically, it's used when entities have lots of attributes which are extremely sparse (rarely filled) and/or cannot be predicted at design time, like user tags, custom fields etc.
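To make the trade-off concrete, here is a rough sketch, with made-up names, of what querying an EAV preferences table looks like once you filter on more than one attribute:
-- Illustrative shape: user_preferences(user_id, pref_key, pref_value)

-- One attribute is simple:
SELECT pref_value
FROM user_preferences
WHERE user_id = 42 AND pref_key = 'theme';

-- Filtering on two attributes already needs a self-join, and no composite
-- index can cover both values the way two plain columns could:
SELECT p1.user_id
FROM user_preferences p1
JOIN user_preferences p2 ON p2.user_id = p1.user_id
WHERE p1.pref_key = 'theme'    AND p1.pref_value = 'dark'
  AND p2.pref_key = 'language' AND p2.pref_value = 'en';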
Granted, I don't know too much about large database designs, but from what I've seen, even extremely large applications store their data in a fairly small number of tables (20GB per table).
For me, I would rather have more info in one table, as it means the data is not littered everywhere and I don't have to perform operations on multiple tables. Though one table can also get messy (usually, for me, each object gets its own table, where an object is something you have in your application logic, like a User class or a BlogPost class).
I guess what I'm trying to say is: do whatever makes sense. Don't put information about the same thing in two different tables, and don't put information about two different things in one table. Stick with one table describing one kind of object (this is hard to explain, but if you do object-oriented programming, you should understand).
Nope. Preferences should be stored as they are (in the users table).
For example, private messages can't be stored in the users table...
And this way you don't have to think about joining different tables...
I would first say that 20 tables is not a lot.
In general (it's hard to say from the limited info you give), the key-value model is not as efficient speed-wise, though it can be more efficient space-wise.
I would definitely not do this. Basically, the reason is that if you have a large set of data stored in a single table, you will see performance issues pretty fast when constantly querying that same table. Then think about the joins and the complexity of the queries you're going to need (depending on your site)... not a task I would personally like to undertake.
Using multiple tables splits the data into smaller sets, the resources required for each query are lower, and as an extra bonus it's easier to program!
There are some applications for doing this, but they are rare - more or less when you have a table with a ton of columns and most of them aren't going to have a value.
I hope this helps :-)
I think 20 tables in a project is not a lot. I do see your point and interest in using EAV but I don't think it's necessary. I would stick to tables in 3NF with proper FK relationships etc and you should be OK :)
The simple answer is that 20 tables won't make it a big DB, and MySQL won't need any special optimization for that. So focus on clean DB structures and normalization instead.
I recently started working for a fairly small business, which runs a small website. I overheard a co-worker mention that either our site or our MySQL databases get hit ~87 times a second.
I was also tasked today, with reorganizing some tables in these databases. I have been taught in school that good database design dictates that to represent a many-to-many relationship between two tables I should use a third table as a middle man of sorts. (This third table would contain the id of the two related rows in the two tables.)
Currently we use two separate databases, totalling a little less than 40 tables, with no table having more than 1k rows. Right now, some PHP scripts use a third table to relate certain rows; it has a third column that stores a string of comma-separated IDs whenever a row in one table relates to more than one row in some other table(s). So if they want to use an ID from that third column, they have to fetch the string, split it, and pick out the proper ID.
When I mentioned that we should switch to using the third table properly, as good design dictates, they said it would cause too much overhead for such small tables, because they would have to use several join statements to get the data they wanted.
Finally, my question is would creating stored procedures for these joins mitigate the impact these joins would have on the system?
Thanks a bunch, sorry for the lengthy explanation!
By the sound of things you should really try to redesign your database schema.
two separate databases, totalling to a little less than 40 tables, with no table having more than 1k rows
Sounds like it's not properly normalized - or it has been far too aggressively normalized and would benefit from some polymorphism.
comma separated ids
Oh dear - surrogate keys - not intrinsically bad but often a sign of bad design.
a third table to relate certain rows that has a third column that is used to store a string of comma separated ids
So it's a very long way from normalised - this is really bad.
they said that it would cause too much overhead for such small tables
Time to start polishing up your resume I think. Sounds like 'they' know very little about DBMS systems.
But if you must persevere with this - it's a lot easier to emulate a badly designed database on top of a well designed one (hint: use views) than vice versa. Redesign the database offline and compare the performance of tuned queries - it will run at least as fast. Add views to allow the old code to run unmodified, and compare the amount of code you need to perform key operations.
I don't understand how storing a comma-separated list of IDs in a single column, and having to parse that list in order to get all associated rows, is less complex than a simple table join.
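For comparison, with a proper junction table the lookup is a single join (all names here are made up, since the actual tables weren't given):
-- Junction table: item_tag(item_id, tag_id), one row per relation.
SELECT t.*
FROM tags t
JOIN item_tag it ON it.tag_id = t.id
WHERE it.item_id = 123;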
Moving your queries into a stored procedure normally won't provide any sort of benefit. But if you absolutely have to use the comma-separated list of values that represents foreign key associations, then a stored procedure may improve performance. Perhaps in your stored procedure you could declare a temporary table (see "Create table variable in MySQL", for example), and then populate that temporary table with one row for every value contained in your comma-separated string.
I'm not sure what kind of performance gain you would get by doing that, though, considering that, as you mentioned, there aren't a lot of rows in any of the tables. The whole exercise seems a bit silly. Ditching the comma-separated list of IDs would be the best way to go.
It will be both quicker and simpler to do it in the database than in PHP; that's what database engines are good at. Make indexes on the keys (InnoDB will do this by default) and the joins will be fast; to be honest, with tables that tiny, the joins will almost always be fast.
Stored procedures don't really come into the picture, for two reasons; mainly, they're not going to make any difference to the impact (not that there's any impact anyway - you will be improving app performance by doing this in the DB rather than at the PHP level).
Secondly, avoid MySQL stored procedures like the plague; they're goddamn awful to write and debug if you've ever worked with the stored procedure languages of any other DB at all.
I have a classifieds website, and I am thinking about redesigning the database a bit.
Currently I have 7 tables in the db. One table for each "MAIN CATEGORY".
For example, I have a "VEHICLES" table which holds all information about the following categories of classifieds:
cars
mc
mopeds/scooters
trucks
boats
etc etc
However, users on the website usually search in specific categories. For example, the user chooses the "cars" category to search in, and enters a keyword.
My code today will search the entire VEHICLES table for all records with the field "category" equal to "cars", and then get their details:
"SELECT * FROM vehicles WHERE category='cars' AND a lot of other conditions" // just for example, not tested
I am now thinking about making a table for each of these "sub-categories".
I.e., one for cars, one for mc, one for trucks, etc., so that searches don't have to go through information that isn't needed.
Will this increase search speed? Because I have calculated that I will need at least 30 or so tables for this.
Thanks
With a properly indexed table and a "reasonable" number of rows, you will not gain much speed from this approach. Anything you gain in speed of execution you will lose in time-to-market because your programming will become more complicated.
Do not perform this optimization unless and until you encounter a performance problem in testing with a representative set of data.
It will increase the speed of a search within the same category. It will potentially slow down queries where you need aggregate information from the different categories. You need to decide which is the best option for your site.
How many records do you have in total in the vehicles table? It's quite likely that adding proper indexes will greatly increase the speed of your searches.
Check out the 'EXPLAIN' query option in MySQL. Understanding this will help you optimize your database a lot with indices.
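For example, something along these lines (the index name is just a placeholder):
-- See whether the category filter can use an index:
EXPLAIN SELECT * FROM vehicles WHERE category = 'cars';

-- If the key column in the EXPLAIN output is NULL, add an index and compare:
ALTER TABLE vehicles ADD INDEX idx_category (category);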
Performance optimization is as much art as science, and to really understand what's the best option requires that you do some benchmarking; anyone offering a definitive answer given the available information is just wrong. That said, a few thoughts on your situation:
You don't say what type your category column is now, but if it's a string type, it's probably using more space than other options, thus making the table larger. Proper indexing can help tremendously with speed, but a larger table with larger indexes will always work to do just the opposite.
As already mentioned by someone else, your queries within a category will be faster in the simple case of a category search. How much faster depends on how much data you have in your current table, and the increases may be negated if you have to join in other tables to satisfy the need for all the other conditions to which you alluded. OTOH, it may actually speed things up in certain join cases (e.g., if you were doing self-joins with your all-encompassing table).
If you're working with a lot of data, splitting into multiple tables can greatly ease backups.
Splitting into multiple tables may also make it easier to shard your data across multiple servers for performance reasons. Similarly, it may make replication setups easier to keep running.
If you're tracking data that's category-specific, separate tables enables you to better normalize your database and likely reap some nice performance as a result of using much smaller tables.
Splitting obviously means modifying your code. If your code is of the old, creaky type, you may very well achieve a performance gain from the clean-up. Of course, there's also the risk that you'll break something....
Check your indexes. Bad indexes are a very common cause of poor performance but are relatively easy to fix with a bit of quality time spent on self-education. MySQL's EXPLAIN can tell you whether your queries are using the indexes, and the index stats (look in the docs) can tell you how efficiently your indexes are working.
Finally, speaking of code, check yours. Try experimenting with a few approaches, regardless of how the database is set up. For example, it may be quicker to do a couple of separate queries and join the results in code than to do the join in the database. Likewise, it's often quicker to do things like sorts in code, particularly in cases where a join or something means the database would have to create a temporary file/table. Again, check the EXPLAIN output, and if you can't eliminate a problem area in your queries, see if it helps to simplify the queries and do more work in the code. This can be particularly beneficial in the common case where the web server has more resources to spare than the database server.
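As a rough illustration of the "join the results in code" idea (the tables and columns are made up, using the classic mysql_* functions):
// Sketch: two simple queries, merged in PHP instead of a SQL JOIN.
$categories = array();
$res = mysql_query("SELECT id, name FROM categories");
while ($row = mysql_fetch_assoc($res)) {
    $categories[$row['id']] = $row['name'];
}

$res = mysql_query("SELECT id, title, category_id FROM ads WHERE title LIKE '%boat%'");
while ($ad = mysql_fetch_assoc($res)) {
    $ad['category'] = isset($categories[$ad['category_id']]) ? $categories[$ad['category_id']] : null;
    // ... render $ad ...
}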
There are many more factors to consider. Ultimately, though, the best way to make these decisions is not to spend time pondering theories but to put both methods to the test. Create some test databases and benchmark the sort of queries you'd run most often, with and without simulated load. You'll get your answer.
If you are using PHP, try something like:
$tempvalue = array();
$query = mysql_query($sql);
while ($row = mysql_fetch_assoc($query)) {
    $tempvalue[] = $row;
}
and then loop over the data with a foreach statement:
foreach ($tempvalue as $key => $value) {
    // write out the table row here .....
}
Maybe MySQL isn't slow and the problem is in the code.
Testing won't kill anyone =)