I've recently started learning Redis and am building an app that uses it as its sole datastore. I'd like to check with other Redis users whether some of my conclusions are correct, and ask a few questions. I'm using phpredis, if that's relevant, but the questions should apply to any language since this is more of a pattern question.
As an example, consider a CRUD interface to save websites (name and domain) with the following requirements:
Check for existing names/domains when saving/validating a new site (duplicate check)
Listing all websites with sorting and pagination
I have initially chosen the following "schema" to save this information:
A key "prefix:website_ids" in which I use INCR to generate new website id's
A set "prefix:wslist" in which I add the website id generated above
A hash for each website "prefix:ws:ID" with the fields name and website
The saving/validation issue
With the above information alone I was unable (as far as I know) to check for duplicate names or domains when adding a new website. To solve this issue I've done the following:
Two sets with keys "prefix:wsnames" and "prefix:wsdomains" to which I also SADD the website name and domain.
This way, when adding a new website I can check whether the submitted name or domain already exists in either of these sets with SISMEMBER and fail the validation if needed.
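In code, the check and save currently look roughly like this (a simplified sketch: no error handling, and I know the check-then-add is not atomic as written):

if ($redis->sIsMember('prefix:wsnames', $name) ||
    $redis->sIsMember('prefix:wsdomains', $domain)) {
    return false; // validation fails: name or domain already taken
}
$id = $redis->incr('prefix:website_ids');   // generate a new website ID
$redis->sAdd('prefix:wslist', $id);         // register the ID
$redis->sAdd('prefix:wsnames', $name);      // reserve the name
$redis->sAdd('prefix:wsdomains', $domain);  // reserve the domain
$redis->hMSet("prefix:ws:$id", array('name' => $name, 'domain' => $domain));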
Now, if I'm saving data with 50 fields instead of just 2 and want to prevent duplicates, I'd have to create a similar set for each field I want to validate.
QUESTION 1: Is the above a common pattern to solve this problem or is there any other/better way people use to solve this type of issue?
The listing/sorting issue
To list websites and sort by name or domain (ascending or descending) as well as limiting results for pagination I use something like:
SORT prefix:wslist BY prefix:ws:*->name ALPHA ASC LIMIT 0 10
This gives me 10 website IDs ordered by name. Now to get the actual data I came up with the following options (examples in PHP):
Option 1:
$wslist = $redis->sort('prefix:wslist', array(
    'by' => 'prefix:ws:*->name', 'alpha' => true, 'sort' => 'asc', 'limit' => array(0, 10)));
$websites = array();
foreach ($wslist as $ws) {
    $websites[$ws] = $redis->hGetAll('prefix:ws:'.$ws);
}
The above gives me a usable array with website IDs as keys and an array of fields as values. Unfortunately it has the problem that I'm doing multiple requests to Redis inside a loop, and common sense (at least coming from RDBMSs) tells me that's not optimal.
The better way, it would seem, is to use Redis pipelining/MULTI and send all requests in a single go:
Option 2:
$wslist = $redis->sort('prefix:wslist', array(
    'by' => 'prefix:ws:*->name', 'alpha' => true, 'sort' => 'asc', 'limit' => array(0, 10)));
$redis->multi();
foreach ($wslist as $ws) {
    $redis->hGetAll('prefix:ws:'.$ws);
}
$websites = $redis->exec();
The problem with this approach is that now I don't get each website's respective ID unless I then loop the $websites array again to associate each one. Another option is to maybe also save a field "id" with the respective website id inside the hash itself along with name and domain.
QUESTIONS 2/3: What's the best way to get these results in a usable array without having to loop multiple times? Is it correct or good practice to also save the id number as a field inside the hash just so I can also get it with the results?
Disclaimer: I understand that the coding and schema-building paradigms when using a key-value datastore like Redis are different from RDBMSs and document stores, and so notions of "best way to do X" are likely to differ depending on the data and application at hand.
I also understand that Redis might not even be the most suitable datastore to use in mostly CRUD type apps but I'd still like to get any insights from more experienced developers since CRUD interfaces are very common on most apps.
Answer 1
Your proposal looks pretty common. I'm not sure why you need an auto-incrementing ID though. I imagine the domain name has to be unique, or the website name has to be unique, or at the very least the combination of the two has to be unique. If this is the case it sounds like you already have a perfectly good key, so why invent an integer key when you don't need it?
Having a SET for domains and a SET for website names is a perfect solution for quickly checking to see if a specific domain or website name already exists. Though, if one of those (domain or website name) is your key you might not even need these SETs since you could just look if the key prefix:ws:domain-or-ws-name-here exists.
Also, using a HASH for each website so you can store your 50 fields of details for the website inside is perfect. That is what hashes are for.
Answer 2
First, let me point out that if your websites and domain names are stored in SORTED SETs instead of SETs, they will already be alphabetized (assuming they are given the same score). If you are trying to support other sort options this might not help much, but wanted to point it out.
Your Option 1 and Option 2 are actually both relatively reasonable. Redis is lightning fast, so Option 1 isn't as unreasonable as it seems at first. Option 2 is clearly better from Redis's perspective, since all the commands will be buffered and executed at once. It will, however, require additional processing in PHP afterwards, as you noted, if you want the array to be indexed by the ID.
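That extra step is essentially a one-liner, though: the pipelined replies come back in exactly the order the commands were queued, so you can zip them back onto the IDs. A minimal sketch, picking up right after your SORT call (so $wslist holds the IDs):

$redis->multi(Redis::PIPELINE); // a plain pipeline is enough here, no transaction semantics needed
foreach ($wslist as $ws) {
    $redis->hGetAll('prefix:ws:'.$ws);
}
$websites = array_combine($wslist, $redis->exec()); // ID => array of fields, in queue order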
There is a 3rd option: Lua scripting. You can have Redis execute a Lua script that returns both the IDs and the hash values in one shot. But, not being super familiar with PHP anymore, or with how Redis's multi-bulk replies map to PHP's arrays, I'm not 100% sure what the Lua script would look like. You'll need to look for examples or do some trial and error. It should be a pretty simple script, though.
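For what it's worth, I'd expect it to look roughly like the sketch below (untested; it hard-codes the key layout from your question, so it isn't Redis Cluster safe, and the HGETALL replies come back as flat field/value lists that you would still reshape in PHP):

$lua = <<<'LUA'
local ids = redis.call('SORT', KEYS[1], 'BY', 'prefix:ws:*->name', 'ALPHA', 'ASC', 'LIMIT', 0, 10)
local out = {}
for _, id in ipairs(ids) do
    out[#out + 1] = id
    out[#out + 1] = redis.call('HGETALL', 'prefix:ws:' .. id)
end
return out
LUA;

$reply = $redis->eval($lua, array('prefix:wslist'), 1);
// $reply alternates: id, flat array of field/value pairs, id, ...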
Conclusion
I think redis sounds like a decent solution for your problem. Just keep in mind the dataset needs to always be small enough to keep in memory. If that's not really a concern (unless your fields are huge, you should be able to fit thousands of websites into only a few MB) or if you don't mind having to upgrade your RAM to grow your DB, then Redis is perfectly suitable.
Be familiar with the various persistence options and configurations for redis and what they mean for availability and reliability. Also, make sure you have a backup solution in place. I would recommend having both a secondary redis instance that slaves off of your main instance, and a recurring process that backs up your redis database file at least daily.
Related
We've been developing for WordPress for several years, and whilst our workflow has been upgraded at several points, there's one thing we've never solved: merging a local WordPress database with a live database.
So I'm talking about having a local version of the site where files and data are changed, whilst the data on the live site is also changing at the same time.
All I can find is the perfect world scenario of pulling the site down, nobody (even customers) touching the live site, then pushing the local site back up. I.e copying one thing over the other.
How can this be done without running a tonne of MySQL commands? (It feels like they could fall over if they're not properly checked!) Can this be done via Gulp (I've seen it mentioned) or a plugin?
Just to be clear, I'm not talking about pushing/pulling data back and forth via something like WP Migrate DB Pro, BackupBuddy or anything similar - this is a merge, not replacing one database with another.
I would love to know how other developers get around this!
File changes are fairly simple to get around, it's when there's data changes that it causes the nightmare.
WP Stagecoach does do a merge but you can't work locally, it creates a staging site from the live site that you're supposed to work on. The merge works great but it's a killer blow not to be able to work locally.
I've also been told by the developers that datahawk.io will do what I want but there's no release date on that.
It sounds like VersionPress might do what you need:
VersionPress staging
A couple of caveats: I haven't used it, so can't vouch for its effectiveness; and it's currently in early access.
Important: take a backup of the Live database before merging Local data into it.
Following these steps might help in migrating a large percentage of the data and merging it into Live:
Go to the WP back-end of the Local site: Tools -> Export.
Select the "All content" radio button (if not selected by default).
This will produce an XML file containing all the local data, comprising all default post types and custom post types.
Open this XML file in Notepad++ or any editor and find-and-replace the Local URL with the Live URL.
Now visit the Live site and import the XML under Tools -> Import.
Upload the files (images) manually.
This will bring a large percentage of the data from Local to Live.
For the rest of the data you will have to write custom scripts.
Risk factors are:
When uploading the images from Local to Live, images with the same name will be overridden.
WordPress saves image data in post_meta as serialized data; that has to be taken care of when uploading the database.
The serialized data in post_meta for post_type="attachment" covers the 3 or 4 generated sizes of each image.
Usernames or email IDs of users being imported may already exist on Live (WP checks for unique usernames and emails), in which case those users might not be imported.
If I were you I'd do the following (slow but affords you the greatest chance of success)
First off, set up a third database somewhere. Cloud services would probably be ideal, since you could get a powerful server with an SSD for a couple of hours. You'll need that horsepower.
Second, we're going to mysqldump the first DB and pipe the output into our cloud DB.
mysqldump -u user -ppassword dbname | mysql -u root -ppass -h somecloud.db.internet
Now we have a full copy of DB #1. If your cloud supports snapshotting data, be sure to take one now.
The last step is to write a PHP script that, slowly but surely, selects the data from the second DB and writes it to the third. We want to do this one record at a time. Why? Well, we need to maintain the relationships between records. Let's take comments and posts. When we pull post #1 from DB #2 it won't be able to keep ID #1, because DB #1 already has a post #1. So post #1 becomes post #132. That means all the comments for post #1 now need to be written as belonging to post #132. You'll also have to pull the records for the users who made those posts, because their user IDs will also change.
There's no easy fix for this, but the WP structure isn't terribly complex. Building a simple loop to pull the data and translate it shouldn't be more than a couple of hours of work.
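To make the shape of that loop concrete, here's a rough sketch for posts and comments (PDO; standard wp_ table and column names; the connection details are placeholders, and users, postmeta, term relationships etc. follow the same pattern):

// Placeholder connections: the source DB being merged in, and the cloud copy we dumped DB #1 into.
$src = new PDO('mysql:host=localhost;dbname=wp_local', 'user', 'pass');
$dst = new PDO('mysql:host=somecloud.db.internet;dbname=wp_merged', 'root', 'pass');

$postIdMap = array(); // old post ID in the source => new post ID in the merged DB

foreach ($src->query('SELECT * FROM wp_posts')->fetchAll(PDO::FETCH_ASSOC) as $post) {
    $oldId = $post['ID'];
    unset($post['ID']); // let auto-increment hand out the new ID (post #1 becomes #132, etc.)
    $cols = array_keys($post);
    $sql  = 'INSERT INTO wp_posts (' . implode(',', $cols) . ') VALUES ('
          . implode(',', array_fill(0, count($cols), '?')) . ')';
    $dst->prepare($sql)->execute(array_values($post));
    $postIdMap[$oldId] = (int) $dst->lastInsertId(); // remember the translation
}

foreach ($src->query('SELECT * FROM wp_comments')->fetchAll(PDO::FETCH_ASSOC) as $comment) {
    unset($comment['comment_ID']);
    // rewrite the foreign key through the map so comments follow their renumbered posts
    $comment['comment_post_ID'] = $postIdMap[$comment['comment_post_ID']];
    $cols = array_keys($comment);
    $sql  = 'INSERT INTO wp_comments (' . implode(',', $cols) . ') VALUES ('
          . implode(',', array_fill(0, count($cols), '?')) . ')';
    $dst->prepare($sql)->execute(array_values($comment));
}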
If I understand you correctly: to merge local and live databases, I have until now been using other software such as Navicat Premium, which has a Data Sync feature.
This can be achieved live using Spring XD: create a JDBC stream to pull data from one DB and insert it into the other. (This acts as streaming, so you don't have to disturb either environment.)
The first thing you need to do is assess whether it would be easier to do some copy-paste data entry instead of a migration script. Sometimes the best answer is to suck it up and do it manually using the CMS interface. This avoids any potential conflicts with merging primary keys, but you may need to watch out for references like the creator of a post or similar data.
If it's just outright too much to manually migrate, you're stuck with writing a script or finding one that is already written for you. Assuming there's nothing out there, here's what you do...
ALWAYS MAKE A BACKUP BEFORE RUNNING MIGRATIONS!
1) Make a list of what you need to transfer. Do you need users, posts, etc.? Find the database tables and add them to the list.
2) Make a note of all possible foreign keys in the database tables being merged into the new database. For example, wp_posts has post_author referencing wp_users. These will need specific attention during the migration. Use this documentation to help find them.
3) Once you know what tables you need and what they reference, you need to write the script. Start by figuring out what content is new for the other database. The safest way to do this is manually, with some kind of side-by-side list. However, you can come up with your own rules for automatically matching table rows, for example checking for $post1->post_content === $post2->post_content in cases where the text needs to be the same. The only catch is that the primary/foreign keys are off limits for these rules.
4) How do you merge new content? The general idea is that all primary keys will need to be changed for any new content. You want to take everything except the ID of the post and insert that into the new database. The auto-increment will create the new ID, so you won't need the previous one (unless you want it for script output/debugging).
5) The tricky part is handling the foreign keys. This process is going to vary wildly depending on what you plan on migrating. What you need to know is which foreign key goes to which (possibly new) primary key. If you're only migrating posts, you may need to hard-code a user id to user id mapping for the post_author column, then use this to replace the values.
But what if I don't know the user ids for the mapping because some users also need to be migrated?
This is where it gets tricky. You will first need to define the merge rules to see if a user already exists. For new users, you need to record the IDs of the newly inserted users. Then, after all users are migrated, the post_author value will need to be replaced whenever it references a newly merged user.
6) Write and test the script! Test it on dummy databases first. And again, make backups before using it on your databases!
I've done something similar with an ETL (Extract, Transform, Load) process when I was moving data from one CMS to another.
Rather than writing a script, I used the Pentaho Data Integration (Kettle) tool.
The idea of ETL is pretty straightforward:
Extract the data (for instance from one database)
Transform it to suit your needs
Load it to the final destination (your second database).
The tool is easy to use and it allows you to experiment with various steps and outputs to investigate the data. Once you design the right ETL process, you are ready to merge those databases of yours.
How can this be done without running a tonne of mysql commands?
No way. If both the local and the live site are running at the same time, how can you prevent ending up with the same IDs holding different content?
If you do want to attempt this, you could look at MySQL replication; I think it will help you merge the different MySQL databases.
Please help me argue my point.
I am working on a website project with a team of developers; we are developing the system in three parts: the API, the back-end, and the front-end. Both the front-end and the back-end get and store data by sending requests to the API.
I am specifically responsible for the front end. I am using Codeigniter as my framework.
A little background: The app is a sports betting site.
This is the problem: the developers of the API use the name of, for example, a tournament, fixture, or sport to do the lookup. I pass the name of a tournament like so:
www.example.com/sport/add_bet/{tournament_name}
The problem I have with this is that the tournament name as entered into the system by humans might have characters such as spaces, forward slashes, etc in the name.
As you can imagine using a forward slash in the url will completely break the system, since we use them to call different controllers, actions and to pass variables.
I am trying to get them to change to using a simple primary key id field, to perform the lookup of the data. For some reason these developers don't want to do this.
The project manager that manages this project (not a programmer and no experience of programming) had a chat to them about this issue, but still they don't want to change, and they told her that it is a matter of personal preference on which way to go.
As far as I know ID's have always been the way to do it.
Could you guys/girls please help me argue my point by giving some reasons as to why I am correct or incorrect in your view. I would like to provide your answers as motivation to get them to change over to doing it the right way.
Your help/answers/suggestions would be much appreciated.
The most important thing is that the ID will be unique, as it should be the primary key, so searching by IDs will return unique results.
But multiple records may have the same title if you didn't validate them at the time of saving.
Also, if you ever want joins or something like that, the ID will help.
And you should never trust the user or expect them to behave as you intended.
There are two sides:
1) You allow selecting a single Title from a dropdown and send only the ID to the server. Lookup by ID is way faster (assuming you are using the ID as the primary key). But if you have lots of Titles then you have to list all of them, and the user will be forced to scroll until they find that Title.
2) You have a simple input field to allow searching by part of a Title. That way you don't have to list all Titles. As a programmer, you have to escape all user input that goes to the server (via GET or POST), so that a user can type even DELETE FROM user WHERE 1 into your input field and your system will still work fine. Also, searching by part of a Title allows showing multiple results, which is impossible when using IDs.
I prefer the second approach.
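On the escaping point in 2): with a prepared statement, the user input is bound as data and never parsed as SQL. A rough sketch (PDO; the connection details, table, and column names are placeholders):

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder connection
$q    = isset($_GET['q']) ? $_GET['q'] : '';                        // raw user input
$stmt = $pdo->prepare('SELECT id, title FROM tournaments WHERE title LIKE :q LIMIT 20');
$stmt->execute(array(':q' => '%' . $q . '%'));                      // bound as data, never as SQL
$matches = $stmt->fetchAll(PDO::FETCH_ASSOC);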
To make the lookup fast, you need to place an index on the column by which you are looking up records. The primary key column always has an index. In order to use some other column you need to add a unique index, to avoid duplicates and make the search faster, which in turn makes the table larger. If you expect the table to grow (which is not too unlikely if you follow many sports and many leagues/tournaments over a number of years), it might become a problem at some point, depending on the resources in your production environment. It's not the strongest argument you can present, but it is not a bad argument either.
I've been browsing the net trying to find a solution that will allow us to generate unique IDs in a regionally distributed environment.
I looked at the following options (among others):
SNOWFLAKE (by Twitter)
It seems like a great solution, but I just don't like the added complexity of having to manage another piece of software just to create IDs;
It lacks documentation at this stage, so I don't think it will be a good investment;
The nodes need to be able to communicate with one another using ZooKeeper (what about latency / communication failure?)
UUID
Just look at it: 550e8400-e29b-41d4-a716-446655440000;
It's a 128-bit ID;
There have been some known collisions (depending on the version, I guess); see this post.
AUTOINCREMENT IN RELATIONAL DATABASE LIKE MYSQL
This seems safe, but unfortunately, we are not using relational databases (scalability preferences);
We could deploy a MySQL server for this like what Flickr does, but again, this introduces another point of failure / bottleneck. Also added complexity.
AUTOINCREMENT IN A NON-RELATIONAL DATABASE LIKE COUCHBASE
This could work since we are using Couchbase as our database server, but:
This will not work when we have more than one cluster in different regions (latency issues, network failures): at some point, IDs will collide, depending on the amount of traffic;
MY PROPOSED SOLUTION (this is what I need help with)
Let's say that we have clusters consisting of 10 Couchbase nodes and 10 application nodes in 5 different regions (Africa, Europe, Asia, America and Oceania). This is to ensure that content is served from a location closest to the user (to boost speed) and to ensure redundancy in case of disasters etc.
Now, the task is to generate IDs that won't collide when the replication (and balancing) occurs, and I think this can be achieved in 3 steps:
Step 1
All regions will be assigned integer IDs (unique identifiers):
1 - Africa;
2 - America;
3 - Asia;
4 - Europe;
5 - Oceania.
Step 2
Assign an ID to every application node that is added to the cluster, keeping in mind that there may be up to 99 999 servers in one cluster (which I doubt; it's just a safety precaution). It will look something like this (fake IPs):
00001 - 192.187.22.14
00002 - 164.254.58.22
00003 - 142.77.22.45
and so forth.
Please note that all of these are in the same cluster, which means you can have a node 00001 in each region.
Step 3
For every record inserted into the database, an incremented ID will be used to identify it, and this is how it will work:
Couchbase offers an increment feature that we can use to create IDs internally within the cluster. To ensure redundancy, 3 replicas will be created within the cluster. Since these are in the same place, I think it is safe to assume that unless the whole cluster is down, one of the nodes responsible for this will be available; otherwise the number of replicas can be increased.
Bringing it all together
Say a user is signing up from Europe:
The application node serving the request will grab the region code (4 in this case), get its own ID (say 00005) and then get an incremented ID (1) from Couchbase (from the same cluster).
We end up with 3 components: 4, 00005, 1. Now, to create an ID from this, we can just join these components into 4.00005.1. To make it even better (I'm not too sure about this), we can concatenate (not add up) the components to end up with 4000051.
In code, this will look something like this:
$id = '4'.'00005'.'1';
NB: Not $id = 4+00005+1;.
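Spelled out slightly more (this is just the assembly step; the counter value below is a literal standing in for whatever the Couchbase increment call returns):

function buildId($regionCode, $nodeNumber, $counter) {
    // concatenate, don't add: zero-padding keeps node 5 reading as "00005"
    return $regionCode . str_pad($nodeNumber, 5, '0', STR_PAD_LEFT) . $counter;
}

$counter = 1;                  // stand-in for the cluster-wide Couchbase increment result
echo buildId(4, 5, $counter);  // "4000051"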
Pros
IDs look better than UUIDs;
They seem unique enough. Even if a node in another region generated the same incremented ID and has the same node ID as the one above, we always have the region code to set them apart;
They can still be stored as integers (probably Big Unsigned integers);
It's all part of the architecture, no added complexities.
Cons
No sorting (or is there)?
This is where I need your input (most)
I know that every solution has flaws, and possibly more than what we see on the surface. Can you spot any issues with this whole approach?
Thank you in advance for your help :-)
EDIT
As @DaveRandom suggested, we can add a 4th step:
Step 4
We can just generate a random number and append it to the ID to prevent predictability. Effectively, you end up with something like this:
4000051357 instead of just 4000051.
I think this looks pretty solid. Each region maintains consistency, and if you use XDCR there are no collisions. INCR is atomic within a cluster, so you will have no issues there. You don't actually need the machine-code part of it, though. If all the app servers within a region are connected to the same cluster, there is no point in including the 00001 part. If it is useful to you for other reasons (some sort of analytics) then by all means keep it, but it isn't necessary.
So it can simply be '4' . '1' (using your example).
Can you give me an example of what kind of "sorting" you need?
First: one downside of adding entropy (and I am not sure why you would need it) is that you cannot iterate over the ID collection as easily.
For example: if your IDs run from 1-100, which you will know from a simple GET on the counter key, you could assign tasks by group (this task takes 1-10, the next 11-20, and so on) and workers can execute in parallel. If you add entropy, you will need to use a Map/Reduce view to pull the collections down, so you lose the benefit of the key-value pattern.
Second: Since you are concerned with readability, it can be valuable to add a document/object type identifier as well, and this can be used in Map/Reduce Views (or you can use a json key to identify that).
Ex: 'u:' . '4' . '1'
If you are referring to IDs externally, you might want to obscure them in other ways. If you need an example, let me know and I can append my answer with something you could do.
#scalabl3
You are concerned about IDs for two reasons:
Potential for collisions in a complex network infrastructure
Appearance
Starting with the second issue, Appearance. While a UUID certainly isn't a great beauty when it comes to an identifier, there are diminishing returns as you introduce a truly unique number across a complex data center (or data centers) as you mention. I'm not convinced that there is a dramatic change in perception of an application when a long number versus a UUID is used for example in a URL to a web application. Ideally, neither would be shown, and the ID would only ever be sent via Ajax requests, etc. While a nice clean memorable URL is preferable, it's never stopped me from shopping at Amazon (where they have absolutely hideous URLs). :)
Even with your proposal, the identifiers, while shorter than a UUID in number of characters, are no more memorable than a UUID. So the appearance would likely remain debatable.
Talking about the first point..., yes, there are a few cases where UUIDs have been known to generate conflicts. While that shouldn't happen in a properly configured and consistently obtained architecture, I can see how it might happen (but I'm personally a lot less concerned about it).
So, if you're talking about alternatives, I've become a fan of the simplicity of the MongoDB ObjectId and its techniques for avoiding duplication when generating an ID. The full documentation is here. The quick relevant pieces are similar to your potential design in several ways:
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
The timestamp can often be useful for sorting. The machine identifier is similar to your application server having a unique ID. The process id is just additional entropy, and finally, to prevent conflicts, there is a counter that is auto-incremented whenever the timestamp is the same as the last time an ObjectId was generated (so that ObjectIds can be created rapidly). ObjectIds can be generated on the client or on the database. Further, ObjectIds take up fewer bytes than a UUID (but only 4 fewer). Of course, you could leave out the timestamp and drop another 4 bytes.
For clarification, I'm not suggesting you use MongoDB, but be inspired by the technique they use for ID generation.
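If it helps, generating something ObjectId-shaped is only a few lines. A rough PHP sketch of the four parts listed above (deriving the machine identifier from a hash of the hostname is my own shortcut here, not something prescribed):

function objectIdLike(&$counter) {
    $time    = pack('N', time());                           // 4-byte timestamp (seconds since epoch)
    $machine = substr(md5(gethostname(), true), 0, 3);      // 3-byte machine identifier
    $pid     = pack('n', getmypid() & 0xFFFF);              // 2-byte process id
    $count   = substr(pack('N', $counter++ & 0xFFFFFF), 1); // 3-byte counter
    return bin2hex($time . $machine . $pid . $count);       // 24 hex characters
}

$counter = mt_rand(0, 0xFFFFFF); // counter starts from a random value
echo objectIdLike($counter);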
So, I think your solution is decent (and maybe you want to be inspired by MongoDB's implementation of a unique ID) and doable. As to whether you need to do it, I think that's a question only you can answer.
I wonder why many web sites choose to use random IDs instead of incrementing from 1 on their database tables. I've searched without finding any good reasons; are there any?
Also, which is the best method to use? It seems quite inefficient to check whether an ID already exists before inserting the data (it takes a second query).
Thanks for your help!
Under the hood, it is likely that they are using incremental ids in the database to identify rows, but the value that gets exposed to end users via the URL parameters is often made into a random string to make the sequence of available objects harder to guess.
It is really a matter of security through obscurity. It hinders automated scripts from proceeding through incremental values and attempting attacks via the URL, and it hinders automated scraping of site content.
If YouTube, for example, used incremental IDs instead of values like v=HSsdaX4s, you could download every video simply by starting at v=1 and incrementing that value millions of times.
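For example, a site can keep the auto-increment primary key internally and expose a short random token in URLs instead; generating one is a couple of lines (PHP 7's random_bytes; where you store and index the token is up to you):

function publicId($bytes = 6) {
    // 6 random bytes -> 8 URL-safe characters (base64 with +/ swapped for -_)
    return rtrim(strtr(base64_encode(random_bytes($bytes)), '+/', '-_'), '=');
}

echo publicId(); // e.g. an "HSsdaX4s"-style token; store it alongside the numeric id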
Sequential ids do not scale well (they become a synchronization bottle-neck in distributed systems).
Also, you don't need to check if a newly generated random id already exists, you can just assume that it does not (because there are so many of them).
Are you sure that the IDs are random, or are they encoded? Either way, it is for security.
How can I increase the performance of a MySQL database? I have my website hosted on a shared server and they have suspended my account because of "too many queries".
The staff asked me to "index" or "cache" or trim my database.
I don't know what "index" and "cache" mean or how to do them in PHP.
Thanks
What an index is:
Think of a database table as a library - you have a big collection of books (records), each with associated data (author name, publisher, publication date, ISBN, content). Also assume that this is a very naive library, where all the books are shelved in order by ISBN (primary key). Just as the books can only have one physical ordering, a database table can only have one primary key index.
Now imagine someone comes to the librarian (database program) and says, "I would like to know how many Nora Roberts books are in the library". To answer this question, the librarian has to walk the aisles and look at every book in the library, which is very slow. If the librarian gets many requests like this, it is worth his time to set up a card catalog by author name (index on name) - then he can answer such questions much more quickly by referring to the catalog instead of walking the shelves. Essentially, the index sets up an 'alternative ordering' of the books - it treats them as if they were sorted alphabetically by author.
Notice that 1) it takes time to set up the catalog, 2) the catalog takes up extra space in the library, and 3) it complicates the process of adding a book to the library - instead of just sticking a book on the shelf in order, the librarian also has to fill out an index card and add it to the catalog. In just the same way, adding an index on a database field can speed up your queries, but the index itself takes storage space and slows down inserts. For this reason, you should only create indexes in response to need - there is no point in indexing a field you rarely search on.
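In MySQL terms, setting up that card catalog is a single statement run once against the table (the names below follow the library example, not any particular schema):

$pdo = new PDO('mysql:host=localhost;dbname=library', 'user', 'pass'); // placeholder connection
$pdo->exec('CREATE INDEX idx_books_author ON books (author_name)');    // build the "card catalog" once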
What caching is:
If the librarian has many people coming in and asking the same questions over and over, it may be worth his time to write the answer down at the front desk. Instead of checking the stacks or the catalog, he can simply say, "here is the answer I gave to the last person who asked that question".
In your script, this may apply in different ways. You can store the results of a database query or a calculation or part of a rendered web page; you can store it to a secondary database table or a file or a session variable or to a memory service like memcached. You can store a pre-parsed database query, ready to run. Some libraries like Smarty will automatically store part or all of a page for you. By storing the result and reusing it you can avoid doing the same work many times.
In every case, you have to worry about how long the answer will remain valid. What if the library got a new book in? Is it OK to use an answer that may be five minutes out of date? What about a day out of date?
Caching is very application-specific; you will have to think about what your data means, how often it changes, how expensive the calculation is, how often the result is needed. If the data changes slowly, it may be best to recalculate and store the result every time a change is made; if it changes often but is not crucial, it may be sufficient to update only if the cached value is more than a certain age.
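As a concrete sketch, caching one expensive query result for five minutes could look like this (Memcached here, but APCu, a file, or a session variable follows the same check-then-store pattern; the connection details and the query are placeholders):

$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder connection

$key    = 'frontpage:last3posts';
$result = $cache->get($key);

if ($result === false) { // cache miss: do the expensive work once
    $result = $pdo->query('SELECT id, title FROM posts ORDER BY date DESC LIMIT 3')
                  ->fetchAll(PDO::FETCH_ASSOC);
    $cache->set($key, $result, 300); // reuse this answer for the next 300 seconds
}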
Set up a copy of your application locally, enable the MySQL query log, and set up Xdebug or some other profiler. Then start collecting data and testing your application. There are lots of guides and books available about how to optimize things. It is important that you spend time testing and collecting data first so you optimize the right things.
Using the data you have collected, try to reduce the number of queries per page view. Ideally, you should be able to get everything you need in fewer than 5-10 queries.
Look at the logs and see if you are asking for the same thing twice. It is a bad idea to request a record in one portion of your code, and then request it again from the database a few lines later unless you are sure the value is likely to have changed.
Look for queries embedded in loops, and try to refactor them so you make a single query and simply loop over the results (see the sketch at the end of this answer).
The select * you mention using is an indication you may be doing something wrong. You probably should be listing fields you explicitly need. Check this site or google for lots of good arguments about why select * is evil.
Start looking at your queries and then use EXPLAIN on them. For queries that are frequently used, make sure they are using a good index and not doing a full table scan. Tweak indexes on your development database and test.
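On the queries-inside-a-loop point above, the usual refactor is to collect the IDs first and fetch everything with one IN (...) query instead of one query per row. A rough sketch (PDO; names are placeholders):

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder connection
$ids = array(3, 7, 42); // instead of querying inside foreach ($ids as $id) { ... }

$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, title FROM posts WHERE id IN ($placeholders)");
$stmt->execute($ids);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // one round trip to MySQL; loop over the results instead
}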
There are a couple things you can look into:
Query Design - look into more advanced and faster solutions
Hardware - throw better and faster hardware at the problem
Database Design - use indexes and practice good database design
All of these are easier said than done, but it is a start.
Firstly, sack your host, get off shared hosting into an environment you have full control over and stand a chance of being able to tune decently.
Replicate that environment in your lab, ideally with the same hardware as production; this includes things like RAID controller.
Did I mention that you need a RAID controller? Yes, you do. You can't achieve decent write performance without one, and it needs a battery-backed cache. If you don't have one, each write needs to physically hit the disc, which is ruinous for performance.
Anyway, back to read performance, once you've got the machine with the same spec RAID controller (and same discs, obviously) as production in your lab, you can try to tune stuff up.
More RAM is usually the cheapest way of achieving better performance - make sure that you've got MySQL configured to use it - which means tuning storage-engine specific parameters.
I am assuming here that you have at least 100G of data; if not, just buy enough ram that your entire DB fits in ram then read performance is essentially solved.
Software changes that others have mentioned such as optimising queries and adding indexes are helpful too, but only once you've got a development hardware environment that enables you to usefully do performance work - i.e. measure performance of your application meaningfully - which means real hardware (not VMs), which is consistent with the hardware environment used in production.
Oh yes - one more thing - don't even THINK about deploying a database server on a 32-bit OS, it's a ruinous waste of good ram.
Indexing is done on database tables in order to speed up queries. If you don't know what it means, you have none. At a minimum you should have indexes on every foreign key and on most fields that are used frequently in the WHERE clauses of your queries. Primary keys get indexes automatically, assuming you set them up to begin with, which I would find unlikely in someone who doesn't know what an index is. Are your tables normalized?
BTW, since you are doing division in your math (why, I haven't a clue), you should Google integer math. You may not be getting correct results.
You should never SELECT *. Instead, select only the data you need for that particular call. And what is your intention here?
order by votes*1000+((1440 - ($server_date - date))/60)*2+visites*600 desc
You may have poorly-written queries, and/or poorly written pages that run too many queries. Could you give us specific examples of queries you're using that are ran on a regular basis?
sure
this query to fetch the last 3 posts
select * from posts where visible = 1 and date > ($server_date - 86400) and dont_show_in_frontpage = 0 order by votes*1000+((1440 - ($server_date - date))/60)*2+visites*600 desc limit 3
what do you think?