I have a website page that, on load, fires 10 different queries against a table with 150,000,000 rows.
Normally the page loads in under 2 seconds - but if I refresh too often, the extra queries pile up and can push the page load time up to 10 seconds.
How can I avoid firing all of those queries on every load, since that would kill my database?
I have no caching yet. The site works in the following way: I have a table where all URIs are stored. If a user enters a URL, I grab the URI out of the called URL and check in the table whether that URI is stored. If the URI is found in the table, I pull the corresponding data from the other tables in a relational database.
Here is example code from one of the PHP files that pulls the information from the other tables:
<?php
set_time_limit(2);
define('MODX_CORE_PATH', '/path/to/modx/core/');
define('MODX_CONFIG_KEY','config');
require_once MODX_CORE_PATH . 'model/modx/modx.class.php';
// Criteria for foreign Database
$host = 'hostname';
$username = 'user';
$password = 'password';
$dbname = 'database';
$port = 3306;
$charset = 'utf8mb4';
$dsn = "mysql:host=$host;dbname=$dbname;port=$port;charset=$charset";
$xpdo = new xPDO($dsn, $username, $password);
// Catch the URI that is called
$pageURI = $_SERVER["REQUEST_URI"];
// Get the language token saved as TV "area" in parent and remove it
if (!isset($modx)) return '';
$top = isset($top) && intval($top) ? $top : 0;
$id= isset($id) && intval($id) ? intval($id) : $modx->resource->get('id');
$topLevel= isset($topLevel) && intval($topLevel) ? intval($topLevel) : 0;
if ($id && $id != $top) {
    $pid = $id;
    $pids = $modx->getParentIds($id);
    if (!$topLevel || count($pids) >= $topLevel) {
        while ($parentIds = $modx->getParentIds($id, 1)) {
            $pid = array_pop($parentIds);
            if ($pid == $top) {
                break;
            }
            $id = $pid;
            $parentIds = $modx->getParentIds($id);
            if ($topLevel && count($parentIds) < $topLevel) {
                break;
            }
        }
    }
}
$parentid = $modx->getObject('modResource', $id);
$area = "/".$parentid->getTVValue('area');
$URL = str_replace($area, '', $pageURI);
$lang= $parentid->getTVValue('lang');
// Issue queries against the foreign database:
$output = '';
$sql = "SELECT epf_application_detail.description FROM epf_application_detail INNER JOIN app_uri ON epf_application_detail.application_id=app_uri.application_id WHERE app_uri.uri = '$URL' AND epf_application_detail.language_code = '$lang'";
foreach ($xpdo->query($sql) as $row) {
$output .= nl2br($row['description']);
}
return $output;
Without knowing what language you're using, here is some pseudo-code to get you started.
Instead of firing a large query every time your page loads, you could create a separate table called something like "cache". You could run your query and then store the data from the query in that cache table. Then when your page loads, you can query the cache table, which will be much smaller, and won't bog things down when you refresh the page a lot.
Pseudo-code (which can be run on an interval using a cronjob or something, to keep your cache fresh):
Run your ten large queries
For each query, add the results to cache like so:
query_id | query_data
----------------------------------------------------
1 | {whatever your query data looks like}
Then, when your page loads, have each query collect the data from cache
It is important to note that with a cache table, you will need to refresh it often (either as often as you get more data, or on a set interval, like every 5 minutes or something).
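A rough sketch of both halves in PHP, assuming a cache table with query_id and query_data columns as above and query_id as its primary key; the connection details and the example query are placeholders:
<?php
// refresh_cache.php - run from cron every few minutes
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'password');

// stand-ins for your ten heavy queries, keyed by query_id
$bigQueries = array(
    1 => "SELECT COUNT(*) AS cnt FROM huge_table",
);

foreach ($bigQueries as $queryId => $sql) {
    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    $stmt = $pdo->prepare("REPLACE INTO cache (query_id, query_data) VALUES (?, ?)");
    $stmt->execute(array($queryId, json_encode($rows)));   // one cache row per heavy query
}

// on page load: hit the tiny cache table instead of the 150,000,000-row table
$stmt = $pdo->prepare("SELECT query_data FROM cache WHERE query_id = ?");
$stmt->execute(array(1));
$rows = json_decode($stmt->fetchColumn(), true);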
You should do some serverside caching and optimizations.
If I were you, I would install Memcached in front of your database.
I would also consider some static caching like Varnish. This caches every page as static HTML, so with Varnish the 2nd (and 3rd, 4th, ...) request doesn't have to be handled by PHP and MySQL at all, which makes the page load a lot faster the second time around (within the cache lifetime, of course).
Last of all, you can help the PHP side handle the data better by installing APC (or another opcode cache).
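To illustrate the Memcached suggestion, here is a rough read-through cache sketch using the PHP Memcached extension; the key name and TTL are arbitrary, and $xpdo, $sql, $URL and $lang are the variables from the script in your question:
<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key = 'app_detail:' . md5($URL . '|' . $lang);
$output = $mc->get($key);

if ($mc->getResultCode() !== Memcached::RES_SUCCESS) {
    // cache miss: run the expensive query once, then keep the result for 5 minutes
    $output = '';
    foreach ($xpdo->query($sql) as $row) {
        $output .= nl2br($row['description']);
    }
    $mc->set($key, $output, 300);
}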
Related
I gathered that the only solution to avoid the maximum execution time issue in CodeIgniter 3 is to increase the execution time limit, for example from 30 to 300 seconds.
I'm using CodeIgniter on a news website. I'm loading only the 20 latest news items on the news section page, and I don't think that's a big enough number to push the server past its execution time. (Note that the news table has more than 1,400 news items and the seen table has more than 150,000 log rows.)
I'd say it's not reasonable for the user to wait more than 50 seconds to get the response and load the page.
Is there any useful way to load the page as fast as possible without hitting the "maximum execution time" error?
My Code in the model:
public function get_section_news($id_section = 0, $length = 0, $id_sub_section = 0, $id_news_lessthan = 0) {
    $arr = array();

    if (intval($id_section) > 0 and intval($length) > 0) {

        $where = array();
        $where['sections.activity'] = 1;
        $where['news.deleted'] = 0;
        $where['news.id_section'] = $id_section;

        $query = $this->db;
        $query
            ->from("news")
            ->join("sections", "news.id_section = sections.id_section", "inner")
            ->order_by("news.id_news", "desc")
            ->limit($length);

        if (intval($id_sub_section) > 0) {
            $where['news.id_section_sub'] = $id_sub_section;
        }
        if ($id_news_lessthan > 0) {
            $where['news.id_news <'] = $id_news_lessthan;
        }

        $get = $query->where($where)->get();
        $num = $get->num_rows();
        if ($num > 0) {
            foreach ($get->result() as $key => $value) {
                $arr['row'][] = $value;
            }
        }
        $arr['is_there_more'] = ($length > $num and $num > 0) ? true : false;
    }
    return $arr;
}
This usually has nothing to do with the framework. You can run the following command in your MySQL client and check whether there are any sleeping queries on your database:
SHOW FULL PROCESSLIST
Most likely you have sleeping queries, since you are not emptying the result set with
$get->free_result();
Another problem may be slow queries. For this I recommend the following:
1) Make sure you are using the same database engine on all tables. For this I recommend InnoDB, as some engines lock the whole table during a transaction, which is undesirable. You should have noticed this already when you ran SHOW FULL PROCESSLIST.
2) Run your queries in a MySQL client and observe how long they take to execute. If they take too long, it may be a result of unindexed tables. You can EXPLAIN your query to identify unindexed tables, then follow a tutorial on indexing your tables, or do it easily with tools like Navicat.
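For example, you can run the EXPLAIN from any MySQL client or from CodeIgniter itself. The section id of 3 below is just an example value, and the composite index is only a guess based on the WHERE/ORDER BY columns in your model, so verify it against the EXPLAIN output first:
// inside a controller or model method
$plan = $this->db->query(
    "EXPLAIN SELECT news.* FROM news
     INNER JOIN sections ON news.id_section = sections.id_section
     WHERE sections.activity = 1 AND news.deleted = 0 AND news.id_section = 3
     ORDER BY news.id_news DESC LIMIT 20"
)->result_array();
print_r($plan);   // look for type = ALL (full table scans) and large 'rows' estimates

// a likely helpful index, given the filter and sort columns used above
$this->db->query("ALTER TABLE news ADD INDEX idx_section_news (id_section, deleted, id_news)");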
I installed fork (the PCNTL extension) on my Ubuntu server (using PHP, Apache and CodeIgniter), checked whether it's working using var_dump(extension_loaded('pcntl')); and got a "true" output (How to check PCNTL module exists).
I have this code:
public function add_keyword() {
$keyword_p = $this->input->post('key_word');
$prod = $this->input->post('prod_name');
$prod = $this->kas_model->search_prod_name($prod);
$prod = $prod[0]->prod_id;
$country = $this->input->post('key_country');
$keyword = explode(", ", $keyword_p);
var_dump($keyword);
$keyword_count = count($keyword);
echo "the keyword count: $keyword_count";
// Create fork
$pid = pcntl_fork();
if(!$pid){
for ($i=0; $i < $keyword_count ; $i++) {
// Inserts the inputs to the "keywords" table
$this->kas_model->insert_keyword($keyword[$i], $prod, $country);
// Gets relevant IDs for the inserted prod and keyword
$last_inserted_key = $this->kas_model->get_last_rec('keywords');
$keyword_id = $last_inserted_key[0]->key_id;
$prod_id = $last_inserted_key[0]->key_prod;
$prod_id_query = $this->kas_model->get_prod_row_by_id($prod_id);
$prod_id_a = $prod_id_query[0]->prod_a_id;
$prod_id_b = $prod_id_query[0]->prod_b_id;
// Run the keyword query (on the API) for today on each one of the keys and insert into the DB, as long as the ID isn't 0.
if ( ($prod_id_a != 0) || ( !empty($prod_id_a) ) ) {
$a_tdr = $this->get_var1_a_by_id_and_kw( $prod_id_a, $keyword[$i], $country);
} else {
$a_tdr['var1'] = 0;
$a_tdr['var2'] = 0;
$a_tdr['var3'] = 0;
}
if ( ($prod_id_b != 0) || ( !empty($prod_id_b) ) ) {
$b_tdr = $this->get_var1_b_by_id_and_kw($prod_id_b, $keyword[$i], $country);
} else {
$b_tdr['var1'] = 0;
$b_tdr['var2'] = 0;
$b_tdr['var3'] = 0;
}
$this->kas_model->insert_new_key_to_db($keyword_id, $a_tdr['var1'], $b_tdr['var1'], $a_tdr['var2'], $b_tdr['var2'], $a_tdr['var3'], $b_tdr['var3']);
}
exit($i);
}
// we are the parent (main); wait for the children to finish (optional)
while(pcntl_waitpid(0, $status) != -1){
$status = pcntl_wexitstatus($status);
// echo "Child $status completed\n";
redirect('main/kas');
}
redirect('main/kas');
}
What does the function do?
This function gets one or more keywords, a country variable and a product ID, runs a query against an external, slow API (calling other functions from within the same controller) to fetch some variables, and adds them to the database.
Problem: when I run this function with a lot of keywords, the page loads, and loads, and loads for a long time until it's done, and only then can I continue browsing my website. So I was told to fork it, so that the request is just sent off to be processed in the background and, whenever I click the submit button, I get redirected to "main/kas" right away.
Currently: I don't get redirected, but the function runs without any errors.
I was told that it's supposed to work, but it doesn't - so I'm guessing I am doing something wrong in the code, or something else isn't working on the server. This is my first time working with fork, so I don't know much about how to work with it (either the syntax or the server-side setup).
Can you please help me debug the problem?
http://www.electrictoolbox.com/mysql-connection-php-fork/
Reason for the error: The parent and child processes all share the same database connection. When the first child process exits it will disconnect from the database, which means the same connection all processes are using will be disconnected, causing any further queries to fail.
The solution: Disconnect from the database before forking the sub processes and then establish a new connection in each process. The fourth parameter should also be passed to the mysql_connect function as "true" to ensure a new link is established; the default is to share an existing connection if the login details are the same.
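A minimal sketch of that approach using plain mysqli (the article's mysql_connect is deprecated); the connection details and the table/column names are placeholders:
<?php
$db = new mysqli('localhost', 'user', 'password', 'mydb');
// ... normal request handling ...
$db->close();                 // disconnect BEFORE forking

$pid = pcntl_fork();
if ($pid === 0) {
    // child: open its own connection and do the slow API + insert work
    $childDb = new mysqli('localhost', 'user', 'password', 'mydb');
    $childDb->query("INSERT INTO keywords (keyword) VALUES ('example')");
    $childDb->close();
    exit(0);
}

// parent: open a fresh connection of its own and redirect immediately
$db = new mysqli('localhost', 'user', 'password', 'mydb');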
The question is: is it efficient to connect to the database in the child process, and are there better alternative ways to do this?
I'm a beginner at OOP. The following is the gist of my code, for which I am trying to find a proper design pattern:
class data {
public $location = array();
public $restaurant = array();
}
$data = new data;
$query = mysqli_query($mysqli, "SELECT * FROM restaurants");
//actually a big long query, simplified for illustrative purposes here
$i = 0;
while ($row = mysqli_fetch_array($query)) {
$data->location[] = $i.')'.$row['location']."<br>";
$data->restaurant[] = $row['restaurant']."<br>";
$i++;
}
I'd like to access the data class from another PHP page (to print out information in HTML, hence the <br> tags). I prefer not to run the query twice. I understand that classes are created and destroyed within a single PHP page load. I'd appreciate design-pattern guidance for managing HTTP application state and minimizing computer resources in such a situation.
If you store the data object in a $_SESSION variable, you will have access to it from other pages and upon refresh. As mentioned in the other posts and comments, you want to separate the HTML from the data processing.
class data {
public $location = array();
public $restaurant = array();
}
// start your session
session_start();
$data = new data;
$query = mysqli_query($mysqli, "SELECT * FROM restaurants");
//actually a big long query, simplified for illustrative purposes here
$i = 0;
while ($row = mysqli_fetch_array($query)) {
$data->location[] = $i.')'.$row['location'];
$data->restaurant[] = $row['restaurant'];
$i++;
}
// HTML (separate from data processing)
foreach ($data->location as $location) {
echo $location . '<br />';
}
// save your session
$_SESSION['data'] = $data;
When you wish to reference the data object from another page
// start your session
session_start();
// get data object
$data = $_SESSION['data'];
// do something with data
foreach($data->location as $location) {
echo $location . '<br />';
}
SELECTing data from the database is rather inexpensive, generally speaking. You don't need to worry about running the query twice; MySQL will take care of the caching.
In your code you mixed up database data with HTML. I suggest separating the two.
// fetch data part
while ($row = mysqli_fetch_array($query)) {
$data->location[] = $row['location'];
$data->restaurant[] = $row['restaurant'];
}
// print HTML part
$i = 0;
foreach ($data->location as $loc) {
    echo $i . ')' . $loc . '<br />';
    $i++;
}
First you say this:
I'm a beginner to OOP.
Then you say this:
I prefer not to run the query twice. I understand that classes are
created and destroyed in a single PHP page load.
You are overthinking this. PHP is a scripting language that runs in response to a user request for that script. Meaning, it will always reload and rerun the code on each load of the PHP page. So there is no way around that.
And when I say you are overthinking this, PHP is basically a part of a L.A.M.P. stack (Linux, Apache, MySQL & PHP) so the burden of query speed rests on the MySQL server which will cache the request anyway.
Meaning while you are thinking of PHP efficiency, the inherent architecture of PHP insists that queries be run on each load. And with that in mind the burden of managing the queries falls on MySQL and on the efficiency of the server & the design of the data structures in the database.
So if you are worried about your code eating up resources, think about improving MySQL efficiency in some way. But each layer of a L.A.M.P. stack has its purpose, and the PHP layer's purpose is just to reload & rerun scripts on each request.
You are probably looking for the Repository pattern.
The general idea is to have a class that can retrieve data objects for you.
Example:
$db = new Db(); // your db instance; you can use PDO for this.
$repo = new RestaurantRepository($db); // create a repo instance
$restaurants = $repo->getRestaurants(); // retrieve and array of restaurants instances
Implementation:
class RestaurantRepository {
    private $db;

    public function __construct($db) {
        $this->db = $db;
    }

    public function getRestaurants() {
        // do query and return an array of instances
        // (assumes $db is a PDO instance, as suggested above)
        $stmt = $this->db->query("SELECT * FROM restaurants");
        return $stmt->fetchAll(PDO::FETCH_OBJ);
    }
}
Code is untested and may have typos but it's a starter.
Saving the query results to a $_SESSION variable as an array means the query does not have to be re-run on another page. Additionally, it manages state correctly, as I can unset($_SESSION['name']) if the query is re-run with different parameters.
I can also save the output of class data to a session variable. It seems to me that this design pattern makes more sense than running a new query for page refreshes.
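A small sketch of that invalidation idea, reusing the data class from the question; the 'city' request parameter and the session key names are only illustrative:
<?php
session_start();

// whatever parameters actually drive the query
$params = array('city' => isset($_GET['city']) ? $_GET['city'] : '');

if (isset($_SESSION['data_params']) && $_SESSION['data_params'] !== $params) {
    // parameters changed, so the cached object is stale
    unset($_SESSION['data'], $_SESSION['data_params']);
}

if (!isset($_SESSION['data'])) {
    $data = new data();
    // ... run the query and fill $data->location / $data->restaurant here ...
    $_SESSION['data'] = $data;
    $_SESSION['data_params'] = $params;
}

$data = $_SESSION['data'];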
I've written my first functional PHP webapp, called Heater. It presents interactive calendar heatmaps using the Google Charts libraries and an AWS Redshift backend.
Now that I have it working, I've started improving the performance. I've installed APC and verified it is working.
My question is how do I enable query caching in front of Redshift?
Here's an example of how I'm loading data for now:
getRsData.php:
<?php
$id=$_GET["id"];
$action=$_GET["action"];
$connect = $rec = "";
$connect = pg_connect('host=myredshift.redshift.amazonaws.com port=5439 dbname=mydbname user=dmourati password=mypasword');
if ($action == "upload")
$rec = pg_query($connect,"SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
...
?>
Some of the queries take > 5 seconds, which negatively impacts the user experience. The data is slow moving, as it updates only once per day. I'd like to front the Redshift query with a local APC cache and then invalidate it via cron (or some such) once a day to allow the newer data to flow in. I'd eventually like to create a cache-warming script, but that is not necessary at this time.
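To make the idea concrete, the sort of read-through cache I have in mind is roughly this, using APC's user cache (apc_fetch/apc_store); the key layout is just illustrative, and $id and $connect come from getRsData.php above:
<?php
$cacheKey = 'upload_count:' . $id . ':' . date('Y-m-d');   // key rolls over once per day

$rows = apc_fetch($cacheKey, $hit);
if (!$hit) {
    $rows = array();
    $rec = pg_query($connect, "SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
    while ($r = pg_fetch_assoc($rec)) {
        $rows[] = $r;
    }
    apc_store($cacheKey, $rows, 86400);   // expire after a day at most
}
// $rows now holds either the cached or the freshly queried result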
Any pointers or tips to documentation are helpful. I've spent some time googling, but most docs out there are about document caching, not query caching, if that makes sense. This is a standalone host running AWS Linux and PHP 5.3 with apc-3.1.15.
Thanks.
EDIT to add input validation
if (!preg_match("/^[0-9]*$/",$id)) {
$idErr = "Only numbers allowed";
}
if (empty($_GET["action"])) {
$actionErr = "Action is required";
} else {
$action = test_input($action);
}
function test_input($data) {
$data = trim($data);
$data = stripslashes($data);
$data = htmlspecialchars($data);
return $data;
}
It doesn't seem APC is needed for this, since you're caching data for a day, which is relatively long.
The code below caches your query results in a file ($cache_path). Before querying Redshift it checks whether a cache file for the given enterprise id exists and was created the same day. If it does, and if the code can successfully read the cache, the rows are returned from the cache; if the file doesn't exist or the rows can't be retrieved from it, the code queries the db and writes the cache.
The results of the query/cache are returned in $rows
<?php
$id=$_GET["id"];
$action=$_GET["action"];
$connect = $rec = "";
$connect = pg_connect('host=myredshift.redshift.amazonaws.com port=5439 dbname=mydbname user=dmourati password=mypasword');
if ($action == "upload") {
$cache_path = "/my_cache_path/upload_count/$id";
if(!file_exists($cache_path)
|| date('Y-m-d',filemtime($cache_path)) < date('Y-m-d')
|| false === $rows = unserialize(file_get_contents($cache_path))) {
$rows = array();
$rec = pg_query($connect,"SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
while($r = pg_fetch_assoc($rec)) {
$rows[] = $r;
}
file_put_contents($cache_path,serialize($rows));
}
}
?>
If you don't want file caching, just use a caching class like phpFastCache:
http://www.phpfastcache.com/
It can automatically find APC or any other caching solution. Usage is really easy:
<?php
// In your config file
include("phpfastcache/phpfastcache.php");
phpFastCache::setup("storage","auto");
// phpFastCache support "apc", "memcache", "memcached", "wincache" ,"files", "sqlite" and "xcache"
// You don't need to change your code when you change your caching system. Or simple keep it auto
$cache = phpFastCache();
// In your Class, Functions, PHP Pages
// try to get from Cache first. product_page = YOUR Identity Keyword
$products = $cache->get("product_page");
if($products == null) {
$products = YOUR DB QUERIES || GET_PRODUCTS_FUNCTION;
// set products in to cache in 600 seconds = 10 minutes
$cache->set("product_page", $products,600);
}
// Output Your Contents $products HERE
?>
This example is also from http://www.phpfastcache.com/.
Hope it helps, and have fun with your really cool Project!
I am recording unique page views using memcached and storing them in the db at 15-minute intervals. Whenever the number of users grows, memcached gives me the following error:
Memcache::get(): Server localhost (tcp 10106) failed with: Failed reading line from stream (0)
I am using the following code to insert/update page views in memcached:
if($memcached->is_valid_cache("visiors")) {
$log_views = $memcached->get_cache("visiors");
if(!is_array($log_views)) $log_views = array();
}
else {
$log_views = array();
}
$log_views[] = array($page_id, $time, $other_Stuff);
$memcached->set_cache("visiors", $log_views, $cache_expire_time);
The following code retrieves the array from memcached, inserts X page views into the db and puts the remaining page views back into memcached:
if($memcached->is_valid_cache("visiors")) {
$log_views = $memcached->get_cache("visiors");
if(is_array($log_views) && count($log_views) > 0) {
$logs = array_slice($log_views, 0, $insert_limit);
$insert_array = array();
foreach($logs as $log) {
$insert_array[] = '('. $log[0]. ',' . $log[1] . ', NOW())';
}
$insert_sql = implode(',',$insert_array);
if(mysql_query('INSERT SQL CODE')) {
$memcached->set_cache("visiors", array_slice($log_views, $insert_limit), $cache_expire_time); //store new values
}
}
}
The inserts/updates cause thread locking, because I can see lots of scripts waiting for their turn. I think I am losing page views during the update process. Any suggestions on how to avoid the memcached read errors and make this code solid?
You are likely running into a connection limit within memcached, your firewall, network, etc. We have a simple walk-through of the most common scenarios: http://code.google.com/p/memcached/wiki/Timeouts
There's no internal locking that would cause sets or gets to block for any amount of time.
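If you want to check from PHP how close the server is to its connection limit, something roughly like this works; the limit itself is raised with memcached's -c startup option (1024 by default), and the port below is the one from your error message:
<?php
$mc = new Memcache();
$mc->connect('localhost', 10106);
$stats = $mc->getStats();
echo 'Current connections: ' . $stats['curr_connections'] . PHP_EOL;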