What is the best way to handle a large amount of data entries on a web page?
Let's assume I have a database table with 5000 records that contains song_name, author_name, song_id, posted_by; I want to build a playlist with all the songs on a single page. Also on that page there is a player that plays songs according to the playlist entries shown on the page.
I have tried pulling all 5000 entries from that table and building a JavaScript object with them, and from that object I have built the playlist, the search in the playlist, and so forth. But that takes a very large amount of resources on the user's end and a lot of page loading time (because there are a lot of entries!), and the page is very slow.
Is it better to load all the data into an object and paginate the playlist by JavaScript in chunks of 100 records, or is it better to get the results paginated from the database and just update the playlist? (Taking into consideration that if the player has the shuffle button activated, it may shuffle to ANY song in the user's database, not only the songs currently visible in the playlist.)
I think pagination is your best option. Just create a limit of 100 (for example) and use AJAX to extract the next 100. If the client turns on shuffle, just send another request to the server and let it call a function that does the following:
Count total rows in database
Use a randomize function to get 100 random numbers
Now create a slightly tricky query to get records from the db based on their rownumber:
function getRandomTracks($limit) {
    $total = $this->db->count_all('table_tracks');

    // Get random row numbers. Speed optimisation: predetermine them in PHP
    $arr = array();
    while (count($arr) < $limit) {
        $x = mt_rand(1, $total); // random row number between 1 and $total
        if (!isset($arr[$x])) { // each random value must be unique;
            // using the value as a key and checking with isset() is faster than in_array()
            $arr[$x] = true;
        }
    }

    // Create the IN string
    $in = implode(',', array_keys($arr));

    // Selection based on the random row numbers
    $query = $this->db->query('SELECT * FROM
        (SELECT @row := @row + 1 AS row, t.*
         FROM `table_tracks` t, (SELECT @row := 0) r) AS tracks
        WHERE `row` IN (' . $in . ')');

    return $query->result();
}
I'm using a similar function, also to deal with large amounts of tracks (over 300,000), so I'm sure this will work!
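For completeness, here is a rough sketch of how the shuffle request could be served. The code above looks like CodeIgniter, so this assumes getRandomTracks() lives in a (hypothetical) Tracks_model; adapt the names to your setup:
// Controller endpoint hit by the shuffle AJAX request (names are illustrative)
public function shuffle()
{
    $this->load->model('tracks_model');
    $tracks = $this->tracks_model->getRandomTracks(100); // the function shown above

    // Hand the random batch back to the player/playlist as JSON
    $this->output
         ->set_content_type('application/json')
         ->set_output(json_encode($tracks));
}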
It is very hard to load the "entire" data set into the client program, even if you are using jQuery or some other library, because the key factor is not what code/SDK you are using but the browser itself!
By the way, Chrome is the fastest and IE (before version 10) is the slowest.
You can refer to the links below:
http://www.infoq.com/news/2010/09/IE-Subsystems-Spends-Time
http://www.nczonline.net/blog/2009/01/05/what-determines-that-a-script-is-long-running/
http://www.zdnet.com/browser-benchmarks-ie-firefox-opera-and-safari-3039420732/
http://msdn.microsoft.com/en-us/library/Bb250448
http://support.microsoft.com/kb/175500/en-us
So what you should do is move your client logic to the server side, just as other people are suggesting.
Paginating with just JavaScript while still loading all of your data is, in essence, the same as not paginating at all.
Use AJAX to load the data in steps of 100 (or more, just try).
Do a loop over your record sets and increase the limit each time:
<?php
$step    = 100;
$u_limit = 0;

$sql  = "SELECT song_id FROM $MySQL_DB.songs";
$data = mysql_query($sql, $dblk);
$number_of_entries = mysql_num_rows($data);

while ($u_limit < $number_of_entries) {
    // LIMIT offset, count: fetch the next $step rows starting at $u_limit
    $sql  = "SELECT * FROM $MySQL_DB.songs ORDER BY id DESC LIMIT $u_limit, $step";
    $data = mysql_query($sql, $dblk);
    while ($row = mysql_fetch_array($data)) {
        // build an array and return it to the AJAX call
    }
    $u_limit += $step;
}
Try this: http://www.datatables.net/
I'm not sure, but maybe it works.
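If you do go the DataTables route with server-side processing turned on, the plugin sends paging parameters (draw, start, length, plus search/order fields) to an endpoint you provide and expects a JSON response. A minimal sketch of such an endpoint, assuming a PDO connection in $pdo and a songs table (both names are assumptions):
<?php
// Minimal DataTables server-side processing endpoint (sketch only)
$draw   = isset($_GET['draw'])   ? (int) $_GET['draw']   : 0;   // echoed back to the plugin
$start  = isset($_GET['start'])  ? (int) $_GET['start']  : 0;   // offset of the first row
$length = isset($_GET['length']) ? (int) $_GET['length'] : 100; // page size

$total = (int) $pdo->query('SELECT COUNT(*) FROM songs')->fetchColumn();

// $start and $length are cast to int above, so interpolating them is safe
$rows = $pdo->query("SELECT song_id, song_name, author_name FROM songs
                     ORDER BY song_name LIMIT $start, $length")
            ->fetchAll(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode([
    'draw'            => $draw,
    'recordsTotal'    => $total,
    'recordsFiltered' => $total, // no search filter applied in this sketch
    'data'            => $rows,
]);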
I'm wondering whether this kind of logic would improve query performance. Say, for example, rather than checking whether a user likes a post for each element in an array and firing a query for each one,
I could instead push the primary IDs into an array and then perform an IN query on them; this would reduce 15 per-item queries down to 2 queries in total, including the initial one.
I'm using PHP PDO and MySQL.
Any advice? Am I on the right track, people? :D
$items is the result set from the database; in this case they are questions that users are asking. I get a response in about 140ms, and I've set a limit on how many items are loaded at once with pagination.
$questionIds = [];
foreach ($items as $item) {
    array_push($questionIds, $item->question_id);
}
$items = loggedInUserLikesQuestions($questionIds, $items, $user_id);
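For reference, a simplified sketch of what a helper like loggedInUserLikesQuestions() could look like with a single IN query (the question_likes table, its columns, and the $pdo connection here are just illustrative, not my real schema):
// Flag which of the given questions the user has liked, with one IN query (sketch)
function loggedInUserLikesQuestions(array $questionIds, array $items, $user_id)
{
    global $pdo; // assumed PDO connection

    if (empty($questionIds)) {
        return $items;
    }

    $placeholders = implode(',', array_fill(0, count($questionIds), '?'));
    $stmt = $pdo->prepare(
        "SELECT question_id FROM question_likes
         WHERE user_id = ? AND question_id IN ($placeholders)"
    );
    $stmt->execute(array_merge([$user_id], $questionIds));
    $liked = $stmt->fetchAll(PDO::FETCH_COLUMN); // ids the user has liked

    foreach ($items as $item) {
        $item->user_likes = in_array($item->question_id, $liked);
    }
    return $items;
}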
Definitely the IN clause is faster in terms of executing the SQL. However, you will only see significant wall-clock benefits once the number of items in your IN clause (on average) gets high.
The reason there is a speed difference, even though each individual update may be lightning-fast, is the setup, execution, tear-down, and response of every single query sent to and received from the server. When you are doing thousands (or millions) of these as fast as you can, I've seen throughput go from 500/sec to 200,000/sec by batching. That may give you some idea.
However, with the IN-clause method, you need to make sure your IN clause does not become too big and hit the maximum query size (see the max_allowed_packet variable).
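If you want to see what your server currently allows, you can check the variable directly (assuming the same $db PDO connection used below):
// Check the server's maximum packet/query size in bytes
$row = $db->query("SHOW VARIABLES LIKE 'max_allowed_packet'")->fetch(PDO::FETCH_ASSOC);
echo $row['Variable_name'] . ' = ' . $row['Value'];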
Here is a simple set of functions that will automatically batch up into IN clauses of 1000 items each:
<?php
$db  = new PDO('...');
$__q = [];

$flushQueue = function() use ($db, &$__q) {
    if (count($__q) > 0) {
        $sanitized_ids = [];
        foreach ($__q as $id) {
            $sanitized_ids[] = (int) $id;
        }
        $db->query("UPDATE question SET linked = 1 WHERE id IN (" . join(',', $sanitized_ids) . ")");
        $__q = [];
    }
};

$queuedUpdate = function($question_id) use (&$__q, $flushQueue) {
    $__q[] = $question_id;
    if (count($__q) >= 1000) { // flush in batches of 1000 ids
        $flushQueue();
    }
};

// Then your code...
foreach ($items as $item) {
    $queuedUpdate($item->question_id);
}
$flushQueue(); // flush whatever is left over
Obviously, you don't have to use anonymous functions if you are in a class. But the above will work anywhere (assuming you are on PHP >= 5.3).
I seriously need help with GDS and the PHP-GDS library.
I am doing a fetch from outside Google App Engine using the fantastic php-gds library. So far the library works fine.
My problem is that my data fetch from GDS returns inconsistent results, and I have no idea what the issue might be.
Please see code below:
<?php
//...
//...
$offset = 0;
do {
    $query     = "SELECT * FROM `KIND` ORDER BY _URI ASC LIMIT 300 OFFSET " . $offset;
    $tableList = $this->obj_store->fetchAll($query);

    $offset      += count($tableList);
    $allTables[]  = $tableList;
    $totalRecords = $offset;
} while (!empty($tableList));

echo $totalRecords; // I expect this to equal the number of entities in the KIND,
                    // but the results are inconsistent. In some cases it is correct,
                    // but in most cases it is far less than the total record count.
                    // A KIND with 750 entities might return only 721 in total;
                    // a KIND with 900 entities might return all of them;
                    // a KIND with 4000 entities might return only 1200.
?>
Please help. Also when I run the exact same query in the cloud console I get the right entity count. (Hope this helps someone)
UPDATE
I ended up using cursors. New code below:
<?php
$query     = "SELECT * FROM `KIND`";
$tableList = [];
$queryInit = $this->obj_store->query($query);
do {
    $page      = $this->obj_store->fetchPage(300); // fetch a page of 300
    $tableList = array_merge($tableList, $page);   // merge with existing records
    $this->obj_store->setCursor($this->obj_store->getCursor()); // continue from the last cursor
} while (!empty($page)); // as long as the page result is not empty
?>
Try using a cursor instead of an offset. See the discussion of cursors (including samples in PHP) here:
https://cloud.google.com/datastore/docs/concepts/queries#datastore-cursor-paging-php
I am creating a project which involves getting some questions from a MySQL database. For instance, if I have 200 questions in my database, I want to randomly choose 20 of them in such a way that no question is repeated. That is, I want to get an array of 20 different questions out of the 200 every time the user requests the list of questions to answer. I will really appreciate your help.
SELECT * FROM questions ORDER BY RAND() LIMIT 20;
P.S. This method is not feasible for very big tables.
Use Google to find a function to create an array with 20 unique numbers, with a minimum and a maximum. Use this array to prepare an SQL query such as:
expression IN (value1, value2, .... value_n);
More on the SQL here.
Possible array filling function here too.
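As a rough sketch (assuming the questions have contiguous integer ids from 1 to 200), the number generation and the IN query could look like this:
<?php
// One way to get 20 unique random numbers between 1 and 200
$ids = array_rand(array_flip(range(1, 200)), 20);

// Build the IN query from them (sketch; adapt to your DB layer)
$sql = 'SELECT * FROM questions WHERE id IN (' . implode(',', $ids) . ')';
?>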
Assuming you have contiguously numbered questions in your database, you just need a list of 20 random numbers. Also assuming you want the user to be able to take more than one test and get another 20 questions without duplicates, you could start with a randomised array of 200 numbers and select blocks of 20 sequentially from that set, i.e.
$startQuestion = 0;
$maxQuestion   = 200;
$numberlist    = range(1, $maxQuestion);
shuffle($numberlist);

function getQuestionSet($noOfQuestions)
{
    global $numberlist, $maxQuestion, $startQuestion;

    if (($startQuestion + $noOfQuestions) > $maxQuestion) {
        // ran out of unused questions: reshuffle and start over
        echo "shuffle...\n";
        shuffle($numberlist);
        $startQuestion = 0;
    }
    $start          = $startQuestion;
    $startQuestion += $noOfQuestions;

    return array_slice($numberlist, $start, $noOfQuestions);
}

// debug...
for ($i = 0; $i < 42; $i++) {
    $questionset = getQuestionSet(20);
    foreach ($questionset as $num) {
        echo $num . " ";
    }
    echo "\n";
}
Then use $questionset to retrieve your questions.
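For example, with PDO that last step could look something like this (assuming the question numbers map to an id column in a questions table, and $pdo is an existing connection):
<?php
// Fetch the selected questions in one query (sketch)
$questionset  = getQuestionSet(20);
$placeholders = implode(',', array_fill(0, count($questionset), '?'));

$stmt = $pdo->prepare("SELECT * FROM questions WHERE id IN ($placeholders)");
$stmt->execute($questionset);
$questions = $stmt->fetchAll(PDO::FETCH_ASSOC);
?>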
If you know how many rows there are in the table, you could use LIMIT to your advantage. With LIMIT you specify a random offset; the syntax is LIMIT offset, count. Example:
<?php
$totalRows = 200; // get this value dynamically or whatever...
$limit = 2;       // num of rows to select
$rand  = mt_rand(0, $totalRows - $limit - 1); // random offset
$query = 'SELECT * FROM `table` LIMIT '.$rand.','.$limit;
// execute query
?>
This should be safe for big tables; however, it will select adjacent rows. You could then mix up the result set via array_rand or shuffle:
<?php
// ... continued
$resultSet = $pdoStm->fetchAll();
$randResultKeys = array_rand($resultSet,$limit); // using array_rand
shuffle($resultSet); // or using shuffle
?>
I have a bunch of photos on a page, and I'm using jQuery UI's Sortable plugin to allow them to be reordered.
When my sortable function fires, it writes a new order sequence:
1030:0,1031:1,1032:2,1040:3,1033:4
Each item of the comma-delimited string consists of the photo ID and the order position, separated by a colon. When the user has completely finished their reordering, I'm posting this order sequence to a PHP page via AJAX to store the changes in the database. Here's where I get into trouble.
I have no problem getting my script to work, but I'm pretty sure it's the incorrect way to achieve what I want and will suffer hugely in performance and resources. I'm hoping somebody could advise me as to what the best approach would be.
This is my PHP script that deals with the sequence:
if ($sorted_order) {
    $exploded_order = explode(',', $sorted_order);
    foreach ($exploded_order as $order_part) {
        $exploded_part = explode(':', $order_part);
        $part_count = 0;
        foreach ($exploded_part as $part) {
            $part_count++;
            if ($part_count == 1) {
                $photo_id = $part;
            } elseif ($part_count == 2) {
                $order = $part;
            }
        }
        $SQL  = "UPDATE article_photos ";
        $SQL .= "SET order_pos = :order_pos ";
        $SQL .= "WHERE photo_id = :photo_id;";
        // ... rest of PDO stuff ...
    }
}
My concerns arise from the nested foreach loops and also from running so many database updates. If a given sequence contained 150 items, would this script cry for help? If it will, how could I improve it?
** This is for an admin page, so it won't be heavily abused **
You can use one UPDATE with some clever code, like so:
Create the array $data['order'] (sort position => photo id) in your loop, then:
$q = "UPDATE article_photos SET order_pos = (CASE photo_id ";
foreach($data['order'] as $sort => $id){
$q .= " WHEN {$id} THEN {$sort}";
}
$q .= " END ) WHERE photo_id IN (".implode(",",$data['order']).")";
A little clearer, perhaps:
UPDATE article_photos SET order_pos = (CASE photo_id
    WHEN 1 THEN 999
    WHEN 2 THEN 1000
    WHEN 3 THEN 1001
END)
WHERE photo_id IN (1,2,3)
I use this approach for exactly what you're doing: updating sort orders.
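If you want the same single-query approach with bound values (since the question uses PDO), a hedged sketch along these lines should work; $order here is assumed to be a photo_id => order_pos array parsed from the posted sequence, and $pdo an existing connection:
// Build one CASE-based UPDATE with placeholders (sketch)
$cases  = '';
$params = [];
foreach ($order as $photo_id => $order_pos) {
    $cases   .= ' WHEN ? THEN ?';
    $params[] = (int) $photo_id;
    $params[] = (int) $order_pos;
}
$in  = implode(',', array_fill(0, count($order), '?'));
$sql = "UPDATE article_photos SET order_pos = (CASE photo_id{$cases} END) WHERE photo_id IN ({$in})";

$stmt = $pdo->prepare($sql);
$stmt->execute(array_merge($params, array_map('intval', array_keys($order))));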
No need for the second foreach: you know it's going to be two parts if your data passes validation (I'm assuming you validated this; if not, you should =) so just do:
if (count($exploded_part) == 2) {
    $id  = $exploded_part[0];
    $seq = $exploded_part[1];
    /* rest of code */
} else {
    /* error - data does not conform despite validation */
}
As for update hammering: do your DB updates in a transaction. Your DB will queue the operations but not commit them to the main database until you commit the transaction, at which point it'll happily do the updates "for real" at lightning speed.
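A minimal sketch of that, assuming a PDO connection in $pdo and the same id:position sequence string as above:
$pdo->beginTransaction();
try {
    $stmt = $pdo->prepare(
        'UPDATE article_photos SET order_pos = :order_pos WHERE photo_id = :photo_id'
    );
    foreach (explode(',', $sorted_order) as $part) {
        list($photo_id, $order_pos) = explode(':', $part);
        $stmt->execute([
            ':order_pos' => (int) $order_pos,
            ':photo_id'  => (int) $photo_id,
        ]);
    }
    $pdo->commit(); // all updates become visible at once
} catch (Exception $e) {
    $pdo->rollBack(); // undo everything if any single update fails
    throw $e;
}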
I suggest making your script even simpler and changing the names of the variables so the code is more readable.
$parts = explode(',', $sorted_order);
foreach ($parts as $part) {
    list($id, $position) = explode(':', $part);
    // Now you can work with $id and $position
}
More info about list: http://php.net/manual/en/function.list.php
Also, about performance and your data structure:
The way you store your data is not perfect, but with that format you will not suffer any performance issues; you need to send less data, so there is less overhead overall.
However, the drawback of your data structure is that you will most probably be unable to establish relationships between tables, make joins, or alter the table structure in a correct way.
I came across a question about search efficiency for large sets today, and I've done my best to boil it down to the most basic case. I feel like this sort of thing probably relates to some classic problem or basic concept I'm missing, so a pointer to that would be great.
Suppose I have a table definition like
CREATE TABLE foo(
id int,
type bool,
reference int,
PRIMARY KEY(id),
FOREIGN KEY(reference) REFERENCES foo(id),
UNIQUE KEY(reference)
) Engine=InnoDB;
Populated with n rows, where n/2 are randomly assigned type=1. Each row references another row of the same type, except for the first, which has reference=null.
Now we want to print all items with type 1. I assume that at some point, it will be faster to recursively call something like
function printFoo1($ref) {
    if ($ref == null)
        return;
    $q   = 'SELECT id, reference FROM foo WHERE id=' . $ref;
    $arr = mysql_fetch_array(mysql_query($q));
    echo $arr[0];
    printFoo1($arr[1]);
}
As opposed to
function printFoo2($ref) {
    $q   = 'SELECT id FROM foo WHERE type=1';
    $res = mysql_query($q);
    while ($id = mysql_fetch_array($res)) {
        echo $id[0];
    }
}
The main point here being that function 1 searches by id, which is indexed, whereas function 2 has to make n/2 comparisons that don't result in a hit; on the other hand, the overhead of many queries is going to be significantly greater than that of the single SELECT.
Is my assumption correct? If so, how large of a data set would we need before function 1 outperforms function 2?
Your example is a bit difficult to parse, but I'll start at the top:
Your first function does not return all of the elements with type = 1; it returns all of the elements that are dependent (based on references) on the element you pass in. From the PHP standpoint, even though the link/handle is already open, there is non-trivial overhead from your function call with each successive request, not to mention the cost of the string concatenation incurred on each execution of that line.
Typically it is better to use the second function's style because it only queries the database once and returns the elements you are requesting without further work. It will come down to a profiler, of course, to determine which returns faster, but from my tests the second is hands down the better choice:
This was executed with n = 5000 elements in the db (n/2 = 2500 of type 1, passing in reference = the highest id with type = 1, obtained from a query of the db).
printFoo1: 3.591840 seconds
printFoo2: 0.010340 seconds
It wouldn't really make sense for it to work any other way. If what you propose were faster, it would imply that JOINs would have to perform less efficiently as well.
Code
$res = mysql_query('SELECT MAX( id ) as `MAX_ID` FROM `foo` WHERE `type` = 1', $link);
$res2 = mysql_fetch_assoc($res);
$id = $res2['MAX_ID'];
// cleanup result and free resources here
echo "printFoo1: ";
$start = microtime(true);
printFoo1($id);
echo microtime(true) - $start;
echo '<br />';
echo "printFoo2: ";
$start = microtime(true);
printFoo2();
echo microtime(true) - $start;
mysql_close($link);
All of this was tested on PHP 5.2.17 running on Linux