Memcached high insert/update causing deadlocks - PHP

I am recording unique page views using memcached and storing them in the db at 15-minute intervals. Whenever the number of users grows, memcached gives me the following error:
Memcache::get(): Server localhost (tcp 10106) failed with: Failed reading line from stream (0)
I am using the following code to insert/update page views in memcached:
if ($memcached->is_valid_cache("visiors")) {
    $log_views = $memcached->get_cache("visiors");
    if (!is_array($log_views)) $log_views = array();
} else {
    $log_views = array();
}
$log_views[] = array($page_id, $time, $other_Stuff);
$memcached->set_cache("visiors", $log_views, $cache_expire_time);
The following code retrieves the array from memcached, inserts the first X page views into the db, and stores the remaining page views back in memcached:
if ($memcached->is_valid_cache("visiors")) {
    $log_views = $memcached->get_cache("visiors");
    if (is_array($log_views) && count($log_views) > 0) {
        $logs = array_slice($log_views, 0, $insert_limit);
        $insert_array = array();
        foreach ($logs as $log) {
            $insert_array[] = '(' . $log[0] . ',' . $log[1] . ', NOW())';
        }
        $insert_sql = implode(',', $insert_array);
        if (mysql_query('INSERT SQL CODE')) {
            $memcached->set_cache("visiors", array_slice($log_views, $insert_limit), $cache_expire_time); // store new values
        }
    }
}
The insert/update causes thread locking: I can see lots of scripts waiting for their turn. I also think I am losing page views during the update process. Any suggestions on how to avoid the memcached read errors and make this code more robust?

You are likely running into a connection limit in memcached, your firewall, your network, etc. We have a simple walk-through of the most common scenarios: http://code.google.com/p/memcached/wiki/Timeouts
There's no internal locking that would cause sets or gets to block for any amount of time.
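As for the lost page views: the get-modify-set pattern in the question is not atomic, so hits appended by other requests between the get and the set can be overwritten. One way to at least stop two flush scripts from racing each other is a short-lived lock via Memcache::add, which only succeeds for one caller. This is a minimal sketch, not the author's code: $memcache is assumed to be a plain Memcache connection, the page_views table and its columns are made up for illustration, and collectors that append views can still race with the final set.
<?php
// Sketch: flush buffered page views under a short-lived lock.
// Assumes $memcache is a connected Memcache instance and $insert_limit / $cache_expire_time exist.
$lock_key = 'visiors_flush_lock';

// add() is atomic: it fails if the key already exists, so only one flusher proceeds.
if ($memcache->add($lock_key, 1, 0, 30)) {        // lock auto-expires after 30 s as a safety net
    $log_views = $memcache->get('visiors');
    if (is_array($log_views) && count($log_views) > 0) {
        $batch     = array_slice($log_views, 0, $insert_limit);
        $remaining = array_slice($log_views, $insert_limit);

        $values = array();
        foreach ($batch as $log) {
            $values[] = '(' . $log[0] . ',' . $log[1] . ', NOW())';
        }

        // page_views(page_id, visit_time, created_at) is a hypothetical table for this sketch.
        if (mysql_query('INSERT INTO page_views VALUES ' . implode(',', $values))) {
            $memcache->set('visiors', $remaining, 0, $cache_expire_time);
        }
    }
    $memcache->delete($lock_key);                  // release the lock
}
?>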

Related

Maximum execution time issue in CodeIgniter 3

I gather that the only way to avoid the maximum execution time issue in CodeIgniter 3 is to increase the execution time limit, for example from 30 to 300 seconds.
I'm using CodeIgniter on a news website. I'm loading only the 20 latest news items on the news section page, and I don't think that is a large enough number to push the server past its execution time. (Note that the news table has more than 1,400 rows and the seen table has more than 150,000 log rows.)
It is not reasonable that a user should have to wait more than 50 seconds to get the response and load the page.
Is there any practical way to load the page as fast as possible without hitting the maximum execution time?
My Code in the model:
public function get_section_news($id_section = 0, $length = 0, $id_sub_section = 0, $id_news_lessthan = 0) {
    $arr = array();

    if (intval($id_section) > 0 and intval($length) > 0) {

        $where = array();
        $where['sections.activity'] = 1;
        $where['news.deleted'] = 0;
        $where['news.id_section'] = $id_section;

        $query = $this->db;
        $query
            ->from("news")
            ->join("sections", "news.id_section = sections.id_section", "inner")
            ->order_by("news.id_news", "desc")
            ->limit($length);

        if (intval($id_sub_section) > 0) {
            $where['news.id_section_sub'] = $id_sub_section;
        }
        if ($id_news_lessthan > 0) {
            $where['news.id_news <'] = $id_news_lessthan;
        }

        $get = $query->where($where)->get();
        $num = $get->num_rows();
        if ($num > 0) {
            foreach ($get->result() as $key => $value) {
                $arr['row'][] = $value;
            }
        }
        $arr['is_there_more'] = ($length > $num and $num > 0) ? true : false;
    }
    return $arr;
}
This usually has nothing to do with the framework. You may run the following command in your MySQL client and check whether there are any sleeping queries on your database:
SHOW FULL PROCESSLIST
Most likely you have sleeping queries, since you are not freeing the result set with
$get->free_result();
Another problem may be slow queries. For that I recommend the following:
1) Make sure you are using the same database engine on all tables. For this I recommend InnoDB, as some engines lock the whole table during a transaction, which is undesirable. You should have noticed this already when you ran SHOW FULL PROCESSLIST.
2) Run your queries in a MySQL client and observe how long they take to execute. If they take too long, it may be a result of unindexed tables. You can EXPLAIN your query to identify unindexed tables, follow these tutorials (1, 2, 3) on indexing your tables, or do it easily with tools like Navicat. A short sketch combining both suggestions follows below.
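Here is a minimal sketch of those two suggestions together: freeing the result set in the model and checking/adding an index on the MySQL side. The column list and the index name are assumptions, not taken from the question.
// In the model: free the result set once the rows have been copied out.
$get = $query->where($where)->get();
$num = $get->num_rows();
if ($num > 0) {
    foreach ($get->result() as $value) {
        $arr['row'][] = $value;
    }
}
$get->free_result(); // releases the result set so the connection does not sit idle

// In a MySQL client: check whether the query uses an index, then add one if it does not.
// EXPLAIN SELECT news.* FROM news
//     INNER JOIN sections ON news.id_section = sections.id_section
//     WHERE news.id_section = 1 AND news.deleted = 0
//     ORDER BY news.id_news DESC LIMIT 20;
// CREATE INDEX idx_news_section ON news (id_section, deleted, id_news);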

PHP array inserting / manipulation degrading over iterations

I am in the process of transferring data from one database to another. They are different dbs (MSSQL to MySQL), so I can't do direct queries and am using PHP as an intermediary. Consider the following code. For some reason, each pass through the while loop takes twice as long as the one before.
$continue = true;
$limit = 20000;
while ($continue) {
    $i = 0;
    $imp->endTimer();
    $imp->startTimer("Fetching Apps");
    $qry = "THIS IS A BASIC SELECT QUERY";
    $data = $imp->src->dbQuery($qry, array(), PDO::FETCH_ASSOC);
    $inserts = array();
    $continue = (count($data) == $limit);
    $imp->endTimer();
    $imp->startTimer("Processing Apps " . memory_get_usage());
    if ($data == false) {
        $continue = false;
    } else {
        foreach ($data as $row) {
            // THERE IS SOME EXTREMELY BASIC IF STATEMENTS HERE
            $inserts[] = array(
                "paymentID"       => $paymentID,
                "ticketID"        => $ticketID,
                "applicationLink" => $row['ApplicationID'],
                "paymentLink"     => (int) $paymentLink,
                "ticketLink"      => (int) $ticketLink,
                "dateApplied"     => $row['AddDate'],
                "appliedBy"       => $adderID,
                "appliedAmount"   => $amount,
                "officeID"        => $imp->officeID,
                "customerID"      => -1,
                "taxCollected"    => 0
            );
            $i++;
            $minID = $row['ApplicationID'];
        }
    }
    $imp->endTimer();
    $imp->startTimer("Inserting $i Apps");
    if (count($inserts) > 0) {
        $imp->dest->dbBulkInsert("appliedPayments", $inserts);
    }
    unset($data);
    unset($inserts);
    echo "Inserted $i Apps<BR>";
}
It doesn't matter what I set the limit to; the processing portion takes twice as long each time. I am logging each portion of the loop: selecting the data from the old database and inserting it into the new one take no time at all, but the processing portion doubles every time. Why? Here are the logs; if you do some quick math on the timestamps, each step labeled "Processing Apps" takes twice as long as the one before. (I stopped it a little early on this run, but the final iteration was taking significantly longer.)
Well, I don't know why this works, but if I move everything inside the while loop into a separate function, it dramatically improves performance. I'm guessing it's a garbage collection / memory management issue, and that returning from the function helps the garbage collector know it can release the memory. Now when I log the memory usage, it stays constant between calls instead of growing. Dirty PHP...
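A rough sketch of that refactor, assuming the $imp helper object and its dbQuery/dbBulkInsert methods from the question; processBatch is a hypothetical name and the row-building details are elided:
function processBatch($imp, $limit) {
    $qry  = "THIS IS A BASIC SELECT QUERY";
    $data = $imp->src->dbQuery($qry, array(), PDO::FETCH_ASSOC);
    if ($data == false) {
        return false;                          // nothing left to process
    }

    $inserts = array();
    foreach ($data as $row) {
        // ... build the insert row exactly as in the original loop body ...
        $inserts[] = $row;
    }
    if (count($inserts) > 0) {
        $imp->dest->dbBulkInsert("appliedPayments", $inserts);
    }

    // $data and $inserts fall out of scope here, so the memory they used
    // can actually be reclaimed between batches.
    return count($data) == $limit;
}

$limit = 20000;
while (processBatch($imp, $limit)) {
    // keep going until a partial (or empty) batch signals the end
}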

Enabling PHP APC Query Caching

I've written my first functional PHP webapp called Heater. It presents interactive calendar heatmaps using the Google Charts libraries and an AWS Redshift backend.
Now that I have it working, I've started improving the performance. I've installed APC and verified it is working.
My question is how do I enable query caching in front of Redshift?
Here's an example of how I'm loading data for now:
getRsData.php:
<?php
$id = $_GET["id"];
$action = $_GET["action"];
$connect = $rec = "";
$connect = pg_connect('host=myredshift.redshift.amazonaws.com port=5439 dbname=mydbname user=dmourati password=mypasword');
if ($action == "upload")
    $rec = pg_query($connect, "SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
...
?>
Some of the queries take > 5 seconds, which negatively impacts the user experience. The data is slow moving, in that it updates only once per day. I'd like to front the Redshift query with a local APC cache and then invalidate it via cron (or some such) once a day to let the newer data flow in. I'd eventually like to create a cache-warming script, but that is not necessary at this time.
Any pointers or tips toward documentation are helpful. I've spent some time googling, but most docs out there are about document caching, not query caching, if that makes sense. This is a standalone host running AWS Linux and PHP 5.3 with APC 3.1.15.
Thanks.
EDIT: added input validation:
if (!preg_match("/^[0-9]*$/", $id)) {
    $idErr = "Only numbers allowed";
}
if (empty($_GET["action"])) {
    $actionErr = "Action is required";
} else {
    $action = test_input($action);
}

function test_input($data) {
    $data = trim($data);
    $data = stripslashes($data);
    $data = htmlspecialchars($data);
    return $data;
}
It doesn't seem APC is needed for this, since you're caching data for a day, which is relatively long.
The code below caches your query results in a file ($cache_path). Before querying Redshift, it checks whether a cache file for the given enterprise id exists and was created the same day. If it does, and the rows can be read from it, they are returned from the cache; if the file doesn't exist or the rows can't be retrieved, the code queries the db and writes the cache.
The results of the query/cache are returned in $rows.
<?php
$id = $_GET["id"];
$action = $_GET["action"];
$connect = $rec = "";
$connect = pg_connect('host=myredshift.redshift.amazonaws.com port=5439 dbname=mydbname user=dmourati password=mypasword');
if ($action == "upload") {
    $cache_path = "/my_cache_path/upload_count/$id";
    if (!file_exists($cache_path)
        || date('Y-m-d', filemtime($cache_path)) < date('Y-m-d')
        || false === $rows = unserialize(file_get_contents($cache_path))) {
        $rows = array();
        $rec = pg_query($connect, "SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
        while ($r = pg_fetch_assoc($rec)) {
            $rows[] = $r;
        }
        file_put_contents($cache_path, serialize($rows));
    }
}
?>
If you don't want file caching, just use a caching class like phpFastCache:
http://www.phpfastcache.com/
It can automatically find APC or any other caching solution.
Usage is really easy:
<?php
// In your config file
include("phpfastcache/phpfastcache.php");
phpFastCache::setup("storage", "auto");
// phpFastCache support "apc", "memcache", "memcached", "wincache", "files", "sqlite" and "xcache"
// You don't need to change your code when you change your caching system. Or simply keep it auto
$cache = phpFastCache();

// In your Class, Functions, PHP Pages
// try to get from Cache first. product_page = YOUR Identity Keyword
$products = $cache->get("product_page");
if ($products == null) {
    $products = YOUR DB QUERIES || GET_PRODUCTS_FUNCTION;
    // set products in to cache in 600 seconds = 10 minutes
    $cache->set("product_page", $products, 600);
}
// Output Your Contents $products HERE
?>
This example is also from http://www.phpfastcache.com/.
Hope it helps, and have fun with your really cool project!
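If you do want to use APC directly (since it is already installed), the user cache can be driven with apc_fetch/apc_store; below is a minimal sketch in the spirit of the file-cache answer, with the key name and the one-day TTL chosen here rather than taken from the question. A daily cron could simply call apc_delete on the key to force a refresh.
<?php
// Assumes $connect is an open pg_connect() handle and $id has been validated as numeric.
$cache_key = "upload_count_$id";
$rows = apc_fetch($cache_key, $hit);

if (!$hit) {
    $rows = array();
    $rec = pg_query($connect, "SELECT date, SUM(upload_count) AS upload_count
                               FROM dwh.mytable
                               WHERE enterprise_id = '$id'
                               GROUP BY date");
    while ($r = pg_fetch_assoc($rec)) {
        $rows[] = $r;
    }
    apc_store($cache_key, $rows, 86400);   // cache for one day
}
// $rows now holds the result, from APC or from Redshift.
?>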

Avoid too many SQL queries on page reload

I have a website page that, on load, fires 10 different queries against a table with 150,000,000 rows.
Normally the page loads in under 2 seconds, but if I refresh too often it fires a lot of queries, which can slow the page load to as much as 10 seconds.
How can I avoid firing all of those queries, since it would kill my database?
I have no caching yet. The site works in the following way: I have a table where all URIs are stored. If a user enters a URL, I grab the URI out of the called URL and check in the table whether the URI is stored. If the URI is stored in the table, I pull the corresponding data from the other tables in a relational database.
Example code from one of the PHP files that pulls the information from the other tables:
<?php
set_time_limit(2);

define('MODX_CORE_PATH', '/path/to/modx/core/');
define('MODX_CONFIG_KEY', 'config');
require_once MODX_CORE_PATH . 'model/modx/modx.class.php';

// Criteria for foreign Database
$host = 'hostname';
$username = 'user';
$password = 'password';
$dbname = 'database';
$port = 3306;
$charset = 'utf8mb4';
$dsn = "mysql:host=$host;dbname=$dbname;port=$port;charset=$charset";
$xpdo = new xPDO($dsn, $username, $password);

// Catch the URI that is called
$pageURI = $_SERVER["REQUEST_URI"];

// Get the language token saved as TV "area" in parent and remove it
if (!isset($modx)) return '';
$top = isset($top) && intval($top) ? $top : 0;
$id = isset($id) && intval($id) ? intval($id) : $modx->resource->get('id');
$topLevel = isset($topLevel) && intval($topLevel) ? intval($topLevel) : 0;
if ($id && $id != $top) {
    $pid = $id;
    $pids = $modx->getParentIds($id);
    if (!$topLevel || count($pids) >= $topLevel) {
        while ($parentIds = $modx->getParentIds($id, 1)) {
            $pid = array_pop($parentIds);
            if ($pid == $top) {
                break;
            }
            $id = $pid;
            $parentIds = $modx->getParentIds($id);
            if ($topLevel && count($parentIds) < $topLevel) {
                break;
            }
        }
    }
}
$parentid = $modx->getObject('modResource', $id);
$area = "/" . $parentid->getTVValue('area');
$URL = str_replace($area, '', $pageURI);
$lang = $parentid->getTVValue('lang');

// Issue queries against the foreign database:
$output = '';
$sql = "SELECT epf_application_detail.description FROM epf_application_detail INNER JOIN app_uri ON epf_application_detail.application_id = app_uri.application_id WHERE app_uri.uri = '$URL' AND epf_application_detail.language_code = '$lang'";
foreach ($xpdo->query($sql) as $row) {
    $output .= nl2br($row['description']);
}
return $output;
Without knowing what language you're using, here is some pseudo-code to get you started.
Instead of firing a large query every time your page loads, you could create a separate table called something like "cache". You could run your query and then store the data from the query in that cache table. Then when your page loads, you can query the cache table, which will be much smaller, and won't bog things down when you refresh the page a lot.
Pseudo-code (this can be run on an interval using a cron job or something, to keep your cache fresh):
1. Run your ten large queries.
2. For each query, add the results to the cache table like so:
   query_id | query_data
   ---------|---------------------------------------
   1        | {whatever your query data looks like}
3. Then, when your page loads, have each query read its data from the cache table instead.
It is important to note that with a cache table you will need to refresh it often, either as often as you get more data, or on a set interval, like every 5 minutes.
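A minimal PHP sketch of that idea, assuming a cache table with query_id, query_data and updated_at columns and a PDO connection, neither of which comes from the question:
<?php
// Read one query's results from the cache table; fall back to the real query if the
// cached copy is older than 5 minutes. Table layout: cache(query_id, query_data, updated_at).
function cachedQuery(PDO $pdo, $queryId, $sql, $ttlSeconds = 300) {
    $stmt = $pdo->prepare("SELECT query_data, updated_at FROM cache WHERE query_id = ?");
    $stmt->execute(array($queryId));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($row && (time() - strtotime($row['updated_at'])) < $ttlSeconds) {
        return json_decode($row['query_data'], true);       // fresh enough: serve from cache
    }

    $data = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);   // run the expensive query

    $upsert = $pdo->prepare(
        "REPLACE INTO cache (query_id, query_data, updated_at) VALUES (?, ?, NOW())"
    );
    $upsert->execute(array($queryId, json_encode($data)));

    return $data;
}
?>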
You should do some server-side caching and optimization.
If I were you, I would install Memcached for your database.
I would also consider some static caching like Varnish, which will cache every page as static HTML. With Varnish, the 2nd (and 3rd, 4th, ...) request doesn't have to be handled by PHP and MySQL, which makes the page load a lot faster the second time around (within the cache lifetime, of course).
Last of all, you can help the PHP side handle the data better by installing APC (or another opcode cache).
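For the Memcached suggestion, the requested URI makes a natural cache key. This is only a sketch layered on the question's variables ($URL, $lang, $sql, $xpdo); the key prefix and the 10-minute TTL are arbitrary choices:
<?php
// Cache the per-URI description lookup in memcached so repeated reloads skip the database.
$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211);

$cache_key = 'desc_' . md5($URL . '|' . $lang);
$output = $memcache->get($cache_key);

if ($output === false) {                        // cache miss: run the query from the question
    $output = '';
    foreach ($xpdo->query($sql) as $row) {
        $output .= nl2br($row['description']);
    }
    $memcache->set($cache_key, $output, 0, 600);   // keep it for 10 minutes
}
return $output;
?>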

This code needs to loop over 3.5 million rows, how can I make it more efficient?

I have a CSV file that has 3.5 million codes in it.
I should point out that this is only EVER going to be run once.
The CSV looks like:
age9tlg,
rigfh34,
...
Here is my code:
ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");
$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
            }
        }
    } else {
        if (!empty($line_of_text)) {
            mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
        }
    }
}
fclose($file_handle);
Is this code going to die partway through on me?
Will my memory limit and max execution time be high enough?
NB:
This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.
Update:
Here is another possible implementation.
This one does bulk inserts of 2000 records at a time:
$file_handle = fopen("Weekly.csv", "r");
$i = 0;
$vals = array();
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                if ($i < 2000) {
                    $vals[] = "('$col', '')";
                    $i++;
                } else {
                    // flush the full batch, then start a new one with the current value
                    mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
                    $vals = array("('$col', '')");
                    $i = 1;
                }
            }
        }
    } else {
        if (!empty($line_of_text)) {
            if ($i < 2000) {
                $vals[] = "('$line_of_text', '')";
                $i++;
            } else {
                mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
                $vals = array("('$line_of_text', '')");
                $i = 1;
            }
        }
    }
}
// insert whatever is left over from the final partial batch
if (count($vals) > 0) {
    mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
}
fclose($file_handle);
If I were to use this method, what is the highest value I could set it to insert at once?
Update 2:
So, I've found I can use
LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv' INTO TABLE `action_6_weekly` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY ','(`code`)
but the issue now is that I was wrong about the CSV format; it is actually 4 codes and then a line break, like so:
fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...
so I need to be able to specify two LINES TERMINATED BY values.
This question has been branched out here.
Update 3
Setting it to do bulk inserts of 20k rows, using
while (!feof($file_handle)) {
    $val[] = fgetcsv($file_handle);
    $i++;
    if ($i == 20000) {
        //do insert
        //set $i = 0;
        //$val = array();
    }
}
//do insert (for last few rows that dont reach 20k)
but it dies at this point because, for some reason, $val contains 75k rows; any idea why?
Note the above code is simplified.
I doubt this will be the popular answer, but I would have your PHP application run mysqlimport on the CSV file. Surely it is optimized far beyond what you will do in PHP.
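If you go that route, a rough sketch of shelling out to mysqlimport from PHP might look like this; the credentials, database name, and path are placeholders, and note that mysqlimport derives the target table from the file's base name, so the file would need to be named action_6_weekly.csv:
<?php
// Hypothetical example: let mysqlimport do the heavy lifting instead of looping in PHP.
$file = '/path/to/action_6_weekly.csv';   // mysqlimport uses the basename as the table name

$cmd = 'mysqlimport --local'
     . ' --fields-terminated-by=' . escapeshellarg(',')
     . ' --lines-terminated-by=' . escapeshellarg("\n")
     . ' --user=' . escapeshellarg('dbuser')
     . ' --password=' . escapeshellarg('dbpass')
     . ' ' . escapeshellarg('mydatabase')
     . ' ' . escapeshellarg($file);

exec($cmd . ' 2>&1', $outputLines, $exitCode);
if ($exitCode !== 0) {
    die('mysqlimport failed: ' . implode("\n", $outputLines));
}
?>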
"Is this code going to die part way through on me? Will my memory and max execution time be high enough?"
Why don't you try and find out?
You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
Note that MySQL supports delayed and multiple row insertion:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
http://dev.mysql.com/doc/refman/5.1/en/insert.html
Make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes after you've done all the inserts).
Rather than creating a new SQL statement on each iteration of the loop, prepare the SQL statement outside the loop and execute that prepared statement, with parameters, inside the loop. Depending on the database this can be a lot faster; a sketch follows below.
I've done the above when importing a large Access database into Postgres using Perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted Perl to enforce some rules while inserting.
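The old mysql_* extension used in the question has no prepared statements, so this sketch uses PDO instead (an assumption); it shows the prepare-once, execute-many pattern against the same action_6_weekly table:
<?php
// Prepare once, execute per row: the statement is parsed and planned a single time.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'dbuser', 'dbpass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('INSERT INTO `action_6_weekly` VALUES (?, ?)');

$file_handle = fopen('Weekly.csv', 'r');
while (($line = fgetcsv($file_handle)) !== false) {
    foreach ($line as $col) {
        if ($col !== null && $col !== '') {
            $stmt->execute(array($col, ''));   // parameters are bound per execution
        }
    }
}
fclose($file_handle);
?>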
You should accumulate the values and insert them into the database all at once at the end, or in batches every X records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.
Also, you should run this on the command line, where you won't need to worry about execution time limits.
The real answer, though, is evilclown's answer: importing into MySQL from CSV is already a solved problem.
I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
Two possible ways:
1) Batch the process, then have a scheduled job import the file while updating a status. This way you can have a page that keeps checking the status and refreshes itself while the status is not yet 100%, so users get a live update of how much has been done. But for this you need OS access to set up the scheduled task, and the task will sit idle when there is nothing to import.
2) Have the page handle 1000 rows (or any N number of rows, you decide), then send JavaScript to the browser to refresh the page with a new parameter telling the script to handle the next 1000 rows. You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops. A rough sketch of this approach follows below.
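A rough sketch of that second approach, assuming the Weekly.csv format discussed above (comma-separated codes per line); the script name import.php, the offset parameter, and the inline JavaScript redirect are all illustrative choices:
<?php
// import.php?offset=N - handles 1000 lines per request, then tells the browser to reload.
$batchSize = 1000;
$offset    = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

// file() re-reads the whole CSV on each request; acceptable for a one-off job of this size.
$lines = file('Weekly.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$total = count($lines);
$chunk = array_slice($lines, $offset, $batchSize);

foreach ($chunk as $line) {
    foreach (explode(',', $line) as $code) {
        if ($code !== '') {
            // insert $code here, e.g. with the prepared statement from the earlier sketch
        }
    }
}

$done = min($offset + $batchSize, $total);
if ($done < $total) {
    $pct = round(100 * $done / $total);
    echo "Imported $done of $total lines ($pct%)...";
    echo "<script>location.href = 'import.php?offset=$done';</script>";
} else {
    echo "Import complete.";
}
?>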
