Insert automatically into a new table? - php

I will create 5 tables, namely data1, data2, data3, data4 and data5. Each table may only store 1000 records.
Whenever a new entry comes in and I want to insert the data, I must do a check:
<?php
$data1 = mysql_query("SELECT * FROM data1");
if (mysql_num_rows($data1) > 1000) {
    $data2 = mysql_query("SELECT * FROM data2");
    if (mysql_num_rows($data2) > 1000) {
        // and so on...
    }
}
I think this is not the right way. I mean, if I am user 4500, it would take some time to do all the checks. Is there a better way to solve this problem?

I haven't decided on the numbers; it could be 5000 or 10000 records. The reason is flexibility and portability. Well, one of my SQL gurus suggested I do it this way.
Unless your guru was talking about something like Partitioning, I'd seriously doubt his advice. If your database can't handle more than 1000, 5000 or 10000 rows, look for another database. Unless you have a really specific example of how a record limit will help you, it probably won't. With the amount of overhead it adds, it probably only complicates things for no gain.
A properly set up database table can easily handle millions of records. Splitting it into separate tables will most likely increase neither flexibility nor portability. If you accumulate enough records to run into performance problems, congratulate yourself on a job well done and worry about it then.

Read up on how to count rows in MySQL.
Depending on what database engine you are using, COUNT(*) operations on InnoDB tables can be quite expensive, and those counts should be maintained by triggers and tracked in an adjacent information table.
The structure you describe is often designed around a mapping table first. One queries the mapping table to find the destination table associated with a primary key.
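As a rough illustration of that mapping-table idea (the table and column names here are made up, not from the question), the lookup might look like this:
CREATE TABLE data_map (
  record_id  INT UNSIGNED NOT NULL PRIMARY KEY,
  table_name VARCHAR(32)  NOT NULL   -- e.g. 'data1', 'data2', ...
);
-- which physical table holds record 4500?
SELECT table_name FROM data_map WHERE record_id = 4500;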

You can keep a "tracking" table to keep track of the current table between requests.
Also be on alert for race conditions (use transactions, or ensure only one process is running at a time).
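One possible shape for such a tracking table, sketched under the assumption that a single tracker row is read and updated inside a transaction (names are illustrative):
CREATE TABLE insert_tracker (
  current_table VARCHAR(32)  NOT NULL,  -- e.g. 'data3'
  row_count     INT UNSIGNED NOT NULL
);
START TRANSACTION;
SELECT current_table, row_count FROM insert_tracker FOR UPDATE;  -- lock the tracker row
-- insert into the table named by current_table, then:
UPDATE insert_tracker SET row_count = row_count + 1;
COMMIT;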
Also, don't do $data1 = mysql_query("SELECT * FROM data1"); with nested ifs; do something like:
$i = 1;
do {
    $rowCount = mysql_result(mysql_query("SELECT COUNT(*) FROM data$i"), 0);
    $i++;
} while ($rowCount >= 1000);

I'd be surprised if MySQL doesn't have some fancy-pants way to manage this automatically (or at least, better than what I'm about to propose), but here's one way to do it.
1. Insert record into 'data'
2. Check the length of 'data'
3. If >= 1000,
- CREATE TABLE `dataX` LIKE `data`;
(X will be the number of tables you have + 1)
- INSERT INTO `dataX` SELECT * FROM `data`;
- TRUNCATE TABLE `data`;
This means you will always be inserting into the 'data' table, and 'data1', 'data2', 'data3', etc are your archived versions of that table.
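A rough PHP sketch of those steps, using the legacy mysql_* API from the question (the column names col1/col2 and the $numberOfArchiveTables variable are placeholders, not part of the original):
// 1. insert the new record into the live table
mysql_query("INSERT INTO data (col1, col2) VALUES ('a', 'b')");
// 2. check how many rows the live table now holds
$count = mysql_result(mysql_query("SELECT COUNT(*) FROM data"), 0);
// 3. archive and empty the live table once it reaches the limit
if ($count >= 1000) {
    $x = $numberOfArchiveTables + 1;   // however you keep track of this
    mysql_query("CREATE TABLE data$x LIKE data");
    mysql_query("INSERT INTO data$x SELECT * FROM data");
    mysql_query("TRUNCATE TABLE data");
}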

You can create a MERGE table like this:
CREATE TABLE all_data ([col_definitions]) ENGINE=MERGE UNION=(data1,data2,data3,data4,data5);
Then you would be able to count the total rows with a query like SELECT COUNT(*) FROM all_data.

If you're using MySQL 5.1 or above, you can let the database handle this (nearly) automatically using partitioning:
Read this article or the official documentation
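For reference, a minimal partitioning sketch along the lines of the original five-table idea (column names are illustrative; see the linked documentation for the details and restrictions):
CREATE TABLE data (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(255)
) ENGINE=InnoDB
PARTITION BY RANGE (id) (
  PARTITION p1 VALUES LESS THAN (1001),
  PARTITION p2 VALUES LESS THAN (2001),
  PARTITION p3 VALUES LESS THAN (3001),
  PARTITION p4 VALUES LESS THAN (4001),
  PARTITION p5 VALUES LESS THAN MAXVALUE
);
-- you still just INSERT INTO data ...; MySQL routes each row to the right partition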

What's faster, db calls or resorting an array?

In a site I maintain I have a need to query the same table (articles) twice, once for each category of article. AFAICT there are basically two ways of doing this (maybe someone can suggest a better, third way?):
Perform the db query twice, meaning the db server has to sort through the entire table twice. After each query, I iterate over the cursor to generate html for a list entry on the page.
Perform the query just once and pull out all the records, then sort them into two separate arrays. After this, I have to iterate over each array separately in order to generate the HTML.
So it's this:
$newsQuery = $mysqli->query("SELECT * FROM articles WHERE type='news'");
while ($newRow = $newsQuery->fetch_assoc()) {
    // generate article summary in html
}
// repeat for informational articles
vs this:
$query = $mysqli->query("SELECT * FROM articles");
$news = Array();
$info = Array();
while ($row = $query->fetch_assoc()) {
    if ($row['type'] == "news") {
        $news[] = $row;
    } else {
        $info[] = $row;
    }
}
// iterate over each array separately to generate article summaries
The recordset is not very large, currently <200, and will probably grow to 1000-2000. Is there a significant difference in the times between the two approaches, and if so, which one is faster?
(I know this whole thing seems awfully inefficient, but it's a poorly coded site I inherited and have to take care of without a budget for refactoring the whole thing...)
I'm writing in PHP, no framework :( , on a MySql db.
Edit
I just realized I left out one major detail. On a given page in the site, we will display (and thus retrieve from the db) no more than 30 records at once - but here's the catch: 15 info articles, and 15 news articles. On each page we pull the next 15 of each kind.
You know you can sort in the DB, right?
SELECT * FROM articles ORDER BY type
EDIT
Due to the change made to the question, I'm updating my answer to address the newly revealed requirement: 15 rows for 'news' and 15 rows for not-'news'.
The gist of the question is the same: "which is faster... one query or two separate queries". The gist of the answer remains the same: each database roundtrip incurs overhead (extra time, especially over a network connection to a separate database server), so with all else being equal, reducing the number of database roundtrips can improve performance.
The new requirement really doesn't impact that. What the newly revealed requirement really impacts is the actual query to return the specified resultset.
For example:
( SELECT n.*
FROM articles n
WHERE n.type='news'
LIMIT 15
)
UNION ALL
( SELECT o.*
FROM articles o
WHERE NOT (o.type<=>'news')
LIMIT 15
)
Running that statement as a single query is going to require fewer database resources, and be faster than running two separate statements, and retrieving two disparate resultsets.
We weren't provided any indication of what the other values for type can be, so the statement offered here simply addresses two general categories of rows: rows that have type='news', and all other rows that have some other value for type.
That query assumes that type allows for NULL values, and we want to return rows that have a NULL for type. If that's not the case, we can adjust the predicate to be just
WHERE o.type <> 'news'
Or, if there are specific values for type we're interested in, we can specify that in the predicate instead
WHERE o.type IN ('alert','info','weather')
If "paging" is a requirement... "next 15", the typical pattern we see applied, LIMIT 30,15 can be inefficient. But this question isn't asking about improving efficiency of "paging" queries, it's asking whether running a single statement or running two separate statements is faster.
And the answer to that question is still the same.
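Purely as an illustration of the "next 15 of each kind" requirement, here is a sketch of a paged version of the query above (page 2 shown; the ORDER BY on an id column is an assumption, since the rows need a deterministic order for LIMIT/OFFSET to make sense):
( SELECT n.*
  FROM articles n
  WHERE n.type='news'
  ORDER BY n.id
  LIMIT 15 OFFSET 15
)
UNION ALL
( SELECT o.*
  FROM articles o
  WHERE NOT (o.type<=>'news')
  ORDER BY o.id
  LIMIT 15 OFFSET 15
)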
ORIGINAL ANSWER below
There's overhead for every database roundtrip. In terms of database performance, for small sets (like you describe) you're better off with a single database query.
The downside is that you're fetching all of those rows and materializing an array. (But it looks like that's the approach you're using in either case.)
Given the choice between the two options you've shown, go with the single query. That's going to be faster.
As far as a different approach, it really depends on what you are doing with those arrays.
You could actually have the database return the rows in a specified sequence, using an ORDER BY clause.
To get all of the 'news' rows first, followed by everything that isn't 'news', you could
ORDER BY type<=>'news' DESC
That's MySQL shorthand for the more ANSI-standard-compliant:
ORDER BY CASE WHEN t.type = 'news' THEN 1 ELSE 0 END DESC
Rather than fetch every single row and store it in an array, you could just fetch from the cursor as you output each row, e.g.
while ($row = $query->fetch_assoc()) {
    echo "<br>Title: " . htmlspecialchars($row['title']);
    echo "<br>byline: " . htmlspecialchars($row['byline']);
    echo "<hr>";
}
The best way of dealing with a situation like this is to test it for yourself. It doesn't matter how many records you have at the moment; you can simulate whatever amount you'd like. Also, 1000-2000 is really a small set of data.
I don't quite understand why you'd have to iterate over all the records twice. You should never retrieve all the records in a query either way, only the small subset you need to work with. On a typical site where you manage articles it's usually about 10 records per page, max. No user will ever go through 2000 articles in a way that requires you to pull all the records at once. Utilize paging and smart querying.
// iterate over each array separate to generate article summaries
Not really sure what you mean by this, but something tells me this data should be stored in the database as well. I really hope you're not generating article excerpts on the fly for every page hit.
It all sounds to me more like a bad architecture design than anything else...
PS: I believe sorting/ordering/filtering of database data should be done on the database server, not in the application itself. You may save some traffic by doing a single query, but it won't help much if you transfer too much data at once that you won't be using anyway.

how to make mysql queries in php run quicker

I am running queries on a table that has thousands of rows:
$sql="select * from call_history where extension_number = '0536*002' and flow = 'in' and DATE(initiated) = '".date("Y-m-d")."' ";
and it's taking forever to return results.
The SQL itself is
select *
from call_history
where extension_number = '0536*002'
and flow = 'in'
and DATE(initiated) = 'dateFromYourPHPcode'
Is there any way to make it run faster? Should I put the DATE(initiated) = '".date("Y-m-d")."' condition before the extension_number condition in the WHERE clause?
Or should I select all rows where DATE(initiated) = '".date("Y-m-d")."', put that in a while loop, and then run all my other queries (where extension_number = ...) within the while loop?
Here are some suggestions:
1) Replace SELECT * by the only fields you want.
2) Add indexing on the table fields you want as output.
3) Avoid running queries in loops. This causes multiple requests to SQL server.
4) Fetch all the data at once.
5) Apply LIMIT tag as and when required. Don't select all the records.
6) Fire two different queries: one for counting total number of records and other for fetching number of records per page (e.g. 10, 20, 50, etc...)
7) If applicable, create Database Views and get data from them instead of tables.
Thanks
The order of clauses under WHERE is irrelevant to optimization.
Pro-tip, also suggested by somebody else: Never use SELECT * in a query in a program
unless you have a good reason to do so. "I don't feel like writing out the names of the columns I need" isn't a good reason. Always enumerate the columns you need. MySQL and other database systems can often optimize things in surprising ways when the list of data columns you need is available.
Your query contains this selection criterion.
AND DATE(initiated) = 'dateFromYourPHPcode'
Notice that this search criterion takes the form
FUNCTION(column) = value
This form of search defeats the use of any index on that column. Your initiated column has a TIMESTAMP data type. Try this instead:
AND initiated >= 'dateFromYourPHPcode'
AND initiated < 'dateFromYourPHPcode' + INTERVAL 1 DAY
This will find all the initiated items in the particular day. And, because it doesn't use a function on the column value it can use an index range scan to do that, which performs well. It may, or may not, also help without an index. It's worth a try.
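Put together in PHP, that rewrite might look something like this (a sketch using the mysql_* API from the question; the column list is only an example of naming the columns you actually need instead of SELECT *):
$today = date("Y-m-d");
$sql = "SELECT initiated, extension_number, flow
        FROM call_history
        WHERE extension_number = '0536*002'
          AND flow = 'in'
          AND initiated >= '$today'
          AND initiated < '$today' + INTERVAL 1 DAY";
$result = mysql_query($sql);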
I suspect your ideal index for this particular search would be created by
ALTER TABLE call_history
ADD INDEX flowExtInit (flow, extension_number, initiated)
You should ask the administrator of the database to add this index if your query needs good performance.
You should add an index to your table. That way MySQL will fetch faster. I have not tested it, but the command should be something like this:
ALTER TABLE `call_history` ADD INDEX `callhistory` (`extension_number`,`flow`,`initiated`);

PHP/MySQL: Massive SQL query or several smaller queries?

I have a database design here that looks like this in simplified form:
Table building:
id
attribute1
attribute2
Data in there is like:
(1, 1, 1)
(2, 1, 2)
(3, 5, 4)
And the tables, attribute1_values and attribute2_values, structured as:
id
value
Which contains information like:
(1, "Textual description of option 1")
(2, "Textual description of option 2")
...
(6, "Textual description of option 6")
I am unsure whether this is the best setup or not, but it was done this way per the requirements of my project manager. It definitely has some truth in it, as you can now modify the text easily without messing up the ids.
However, now I have come to a page where I need to list the attributes, so how do I go about that? I see two major options:
1) Make one big query which gathers all values from building and at the same time picks the correct textual representation from the attribute{x}_values table.
2) Make a small query that gathers all values from the building table. Then after that get the textual representation of each attribute one at a time.
Which is the best option to pick? Is option 1 even faster than option 2 at all? If so, is it worth the extra trouble in terms of maintenance?
Another suggestion would be to create a view on the server with only the data you need and query from that. That would keep the work on the server end, and you can pull just what you need each time.
If you have a small number of rows in the attribute tables, then I suggest fetching them first - all of them - and storing them in an array, using the id as the array key.
Then you can proceed with the building data; now you just have to use the respective array to look up each attribute value.
I would recommend something in between. Parse the result from the first table in PHP, and figure out which attributes you need to select from each attribute[x]_values table.
You can then select attributes in bulk using one query per table, rather than one query per attribute, or one query per building.
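A rough sketch of that in-between approach (mysqli; $connection is a placeholder for your connection handle, and the id-collecting pass is an assumption about how you'd gather the values to put in the IN() list):
$buildings = array();
$ids1 = array();
$ids2 = array();
$result = mysqli_query($connection, "SELECT id, attribute1, attribute2 FROM building");
while ($row = mysqli_fetch_assoc($result)) {
    $buildings[] = $row;
    $ids1[] = (int) $row['attribute1'];
    $ids2[] = (int) $row['attribute2'];
}
// one query per attribute table, not one per building
$values1 = array();
$in1 = implode(',', array_unique($ids1));
$result1 = mysqli_query($connection, "SELECT id, value FROM attribute1_values WHERE id IN ($in1)");
while ($row = mysqli_fetch_assoc($result1)) {
    $values1[$row['id']] = $row['value'];
}
// repeat for attribute2_values, then use $values1[$building['attribute1']] when rendering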
Here is a PHP solution:
$query = "SELECT * FROM building";
$result = mysqli_query(connection,$query);
$query = "SELECT * FROM attribute1_values";
$result2 = mysqli_query(connection,$query);
$query = "SELECT * FROM attribute2_values";
$result3 = mysqli_query(connection,$query);
$n = mysqli_num_rows($result);
for($i = 1; $n <= $i; $i++) {
$row = mysqli_fetch_array($result);
mysqli_data_seek($result2,$row['attribute1']-1);
$row2 = mysqli_fetch_array($result2);
$row2['value'] //Use this as the value for attribute one of this object.
mysqli_data_seek($result3,$row['attribute2']-1);
$row3 = mysqli_fetch_array($result3);
$row3['value'] //Use this as the value for attribute one of this object.
}
Keep in mind that this solution requires that the ids in attribute1_values and attribute2_values start at 1 and increase by 1 on every single row.
Oracle / Postgres / MySQL DBA here:
Running a query many times has quite a bit of overhead. There are multiple round trips to the db, and if it's on a remote server, this can add up. The DB will likely have to parse the same query multiple times in MySQL, which will be terribly inefficient if there are tons of rows. Now, one thing that your PHP method (multiple queries) has as an advantage is that it'll use less memory, as it'll release the results as they're no longer needed (if you run the query as a nested loop, that is; if you query all the results up front, you'll have a lot of memory overhead, depending on the table sizes).
The optimal result would be to run it as 1 query, and fetch the results 1 at a time, displaying each one as needed and discarding it, which can wreak havoc with MVC frameworks unless you're either comfortable running model code in your view, or you run small view fragments.
Your question is very generic, and I think that to get an answer you should give more hints about what this page will look like and how big the dataset is.
Will you get all the buildings with their attributes, or just one at a time?
Your data structure looks very simple, and anything more powerful than a Raspberry Pi can handle it very well.
If you need one record at a time you don't need any special technique; just JOIN the tables (see the sketch below).
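A sketch of that JOIN, using the table and column names from the question (the plain JOINs assume every attribute id in building has a matching row in its values table):
SELECT b.id,
       a1.value AS attribute1_text,
       a2.value AS attribute2_text
FROM building b
JOIN attribute1_values a1 ON a1.id = b.attribute1
JOIN attribute2_values a2 ON a2.id = b.attribute2
WHERE b.id = 1;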
If you need to list all buildings and you want to save db time you have to measure your data.
If you have more attributes than buildings you have to choose one way; if you have 8 attributes and 2000 buildings you can think of caching the attributes in an array, with one SELECT for each table, and then just printing them using the array. I don't think you will see any speed drop or improvement with such simple tables on a modern computer.
$att1[1] = 'description1';
$att1[2] = 'description2';
// ...
Never do one-at-a-time queries; try to combine them into a single one.
MySQL will cache your query and it will run much faster. PHP loops are faster than making many requests to the database.
The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
http://dev.mysql.com/doc/refman/5.1/en/query-cache.html

Compare table rows, big data amount

I have quite an interesting task, but I don't know what to call it in one word in order to search for related topics. Even this topic title might not reflect what I need, so if somebody has a better title - suggestions welcome.
I'll try to explain my problem.
I have about 100,000 rows in a MySQL db table, and I need to "compare" entries from the table.
"Compare" doesn't mean just equality. There is an algorithm for calculating a comparison level. I have a weight coefficient for each table column. That means if entry #1's column1 equals entry #2's column1, I give, say, 5 points to that pair. And so on for each column.
The most straightforward way to do this is to apply the calculation rules to each pair of entries. Why am I afraid of this? 100,000 entries means about 5 billion "compare" operations. For sure, I can calculate this on demand and store the results somewhere in a cache. But I believe that the most obvious way is not the most effective.
So, my first question is: is there any better way to achieve my goal other than brute force?
My second question is about which tool is better for the calculations:
1. The application language is PHP; hence, load the whole table into memory and iterate over the data.
2. Create a stored procedure in MySQL.
3. Use MongoDB's aggregation framework or MapReduce.
I like the first way least of all, and the last one most of all.
I'm looking for any suggestions or advice from people who have experience with this sort of case.
Since I don't know how to ask Google for help, any links will be appreciated.
UPDATE:
The calculation rules are a bit more complicated than I described...
The table has a set of related columns which are to be used together as a group (not one by one).
Let's assume:
the table has fields, say, tag_1, tag_2, ..., tag_n;
row_1 and row_2 are entries in the table.
The rule(pseudo-code):
if(row_1.tag_1==row_2.tag_1)
{
// gives 10 points
}
elseif(row_1.tag_1 is in row_2.tags && row_1.tag_1!=row_2.tag_1)
{
// gives 5 points
}
....
// and so on
Basically, I need to find the intersection of two arrays of tags. If it is not empty, points are given. If the positions of the matching tags in the two rows also match, additional points are given.
I'm wondering how this can be accomplished using the stored procedure language, because it can be done pretty easily in any programming language.
If a stored procedure can do this, then it is my choice.
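For comparison, here is what the tag rule looks like in plain PHP (a sketch; $tags1 and $tags2 are assumed to be arrays holding the tag_1..tag_n values of two rows, keyed the same way):
function compareTags(array $tags1, array $tags2)
{
    $points = 0;
    foreach ($tags1 as $i => $tag) {
        if ($tag === $tags2[$i]) {
            $points += 10;                    // same tag in the same position
        } elseif (in_array($tag, $tags2, true)) {
            $points += 5;                     // tag present, but in a different position
        }
    }
    return $points;
}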
If you have a static table, then it doesn't make a difference which you choose, so long as you store the results somewhere (presumably back in the database).
If your data is changing, then you need to compare each new row to all rows, which is essentially a full-table scan. This is probably best done in a database.
If the data fits into memory (and 500,000 rows should fit into memory), then (2) will probably be faster than (3) on equivalent hardware. "Equivalent hardware" is a very important consideration.
In most cases, I would opt for (2). It sounds like the query is something like:
select t1.id, t2.id,
       ((case when t1.col1 = t2.col1 then 5 else 0 end) +
        (case when t1.col2 = t2.col2 then 7 else 0 end) +
        . . .
       )
from t t1 cross join t t2
If you are much more comfortable with map-reduce, then you might find it easier to code there. I know both languages and prefer SQL for something like this.
Can't you do something like this:
UPDATE table SET points = points+5 WHERE column1 = column2
If you have to check for a specific value, you could try something like this:
UPDATE table SET points = points+5 WHERE column1 = 'somevalue' AND column2 = 'somevalue'

php and MySQL: 2 requests or 1 request?

I'm building a webpage in PHP using MySQL as my database.
Which way is faster?
Two requests to MySQL with the following queries:
SELECT points FROM data;
SELECT sum(points) FROM data;
One request to MySQL. Hold the result in a temporary array and calculate the sum in PHP:
$data = SELECT points FROM data;
EDIT -- the data is about 200-500 rows
It's really going to depend on a lot of different factors. I would recommend trying both methods and seeing which one is faster.
Since Phill and Kibbee have answered this pretty effectively, I'd like to point out that premature optimization is a Bad Thing (TM). Write what's simplest for you and profile, profile, profile.
How much data are we talking about? I'd say MySQL is probably faster at doing those kind of operations in the majority of cases.
Edit: with the kind of data that you're talking about, it probably won't make masses of difference. But databases tend to be optimised for those kind of queries, whereas PHP isn't. I think the second DB query is probably worth it.
If you want to do it in one line, use a running total like this:
SET @total = 0;
SELECT points, @total := @total + points AS RunningTotal FROM data;
I wouldn't worry about it until I had an issue with performance.
If you go with two separate queries, you need to watch out for the possibility of the data changing between getting the rows & getting their sum. Until there's an observable performance problem, I'd stick to doing my own summation to keep the page consistent.
The general rule of thumb for efficiency with MySQL is to try to minimize the number of SQL requests. Every call to the database adds overhead and is "expensive" in terms of the time required.
The optimization done by MySQL is quite good. It can take very complex requests with many joins, nestings and computations, and make them run efficiently.
But it can only optimize individual requests. It cannot check the relationship between two different SQL statements and optimize between them.
In your example 1, the two statements will make two requests to the database and the table will be scanned twice.
Your example 2 where you save the result and compute the sum yourself would be faster than 1. This would only be one database call, and looping through the data in PHP to get the sum is faster than a second call to the database.
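For example 2, the PHP side is only a few lines; here's a sketch assuming mysqli (the question's pseudo-code doesn't say which API is in use):
$result = $mysqli->query("SELECT points FROM data");
$points = array();
while ($row = $result->fetch_assoc()) {
    $points[] = (int) $row['points'];
}
$total = array_sum($points);   // the sum, computed in PHP instead of a second query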
Just for the fun of it.
SELECT SUM(points) FROM `data`
UNION
SELECT points FROM `data`
The first row will be the total, the next rows will be the data.
NOTE: UNION can be slow, but it's an option.
You could also do something a bit fancier; this version labels the rows so you can tell them apart (and sort them):
SELECT 'total' AS name, SUM(points) AS points FROM `data`
UNION
SELECT 'points' AS name, points FROM `data`
Then process it in PHP:
while ($row = mysql_fetch_assoc($query)) {
    if ($row["name"] == "points") {
        echo $row["points"];
    }
    if ($row["name"] == "total") {
        echo "Total is: " . $row["points"];
    }
}
You can use union like this:
(select points, null as total from data) union (select null, sum(points) from data);
The result will look something like this:
points  total
2       null
5       null
...
null    7
You can figure out how to handle it from there.
Do it the MySQL way; let the database manager do its work.
MySQL is optimized for such tasks.
