I've written my first functional PHP web app, called Heater. It presents interactive calendar heatmaps using the Google Charts libraries and an AWS Redshift backend.
Now that I have it working, I've started improving the performance. I've installed APC and verified it is working.
My question is how do I enable query caching in front of Redshift?
Here's an example of how I'm loading data for now:
getRsData.php:
<?php
$id=$_GET["id"];
$action=$_GET["action"];
$connect = $rec = "";
$connect = pg_connect('host=myredshift.redshift.amazonaws.com port=5439 dbname=mydbname user=dmourati password=mypasword');
if ($action == "upload")
$rec = pg_query($connect,"SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
...
?>
Some of the queries take more than 5 seconds, which negatively impacts the user experience. The data is slow moving, in that it updates only once per day. I'd like to front the Redshift query with a local APC cache and then invalidate it via cron (or some such) once a day to let the newer data flow in. I'd eventually like to create a cache-warming script, but that isn't necessary at this time.
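Roughly what I have in mind is something like this (an untested sketch using APC's apc_fetch/apc_store; the cache key name is just made up):
$cacheKey = "upload_count_" . $id;             // hypothetical key, one entry per enterprise id
$rows = apc_fetch($cacheKey, $success);
if (!$success) {
    $rows = array();
    $rec = pg_query($connect, "SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
    while ($r = pg_fetch_assoc($rec)) {
        $rows[] = $r;
    }
    apc_store($cacheKey, $rows, 86400);        // 24h TTL as a fallback to the cron invalidation
}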
Any pointers or tips to documentation are helpful. I've spent some time googling, but most docs out there are about document caching rather than query caching, if that makes sense. This is a standalone host running Amazon Linux and PHP 5.3 with apc-3.1.15.
Thanks.
EDIT to add input validation
if (!preg_match("/^[0-9]*$/",$id)) {
$idErr = "Only numbers allowed";
}
if (empty($_GET["action"])) {
$actionErr = "Action is required";
} else {
$action = test_input($action);
}
function test_input($data) {
$data = trim($data);
$data = stripslashes($data);
$data = htmlspecialchars($data);
return $data;
}
It doesn't seem APC is needed for this since you're caching data for a day which is relatively long.
The code below caches your query results in a file ($cache_path). Before querying Redshift, it checks whether a cache file for the given enterprise id exists and was created today. If it does, and the rows can be successfully read from it, they are returned from the cache; if the file doesn't exist, is stale, or can't be read, the code queries the db and rewrites the cache.
The results of the query/cache are returned in $rows
<?php
$id=$_GET["id"];
$action=$_GET["action"];
$connect = $rec = "";
$connect = pg_connect('host=myredshift.redshift.amazonaws.com port=5439 dbname=mydbname user=dmourati password=mypasword');
if ($action == "upload") {
$cache_path = "/my_cache_path/upload_count/$id";
if(!file_exists($cache_path)
|| date('Y-m-d',filemtime($cache_path)) < date('Y-m-d')
|| false === $rows = unserialize(file_get_contents($cache_path))) {
$rows = array();
$rec = pg_query($connect,"SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
while($r = pg_fetch_assoc($rec)) {
$rows[] = $r;
}
file_put_contents($cache_path,serialize($rows));
}
}
?>
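If you still want the once-a-day cron invalidation mentioned in the question (optional here, since the filemtime check above already expires the cache daily), the cron job can simply delete the cache files. A sketch reusing the same placeholder path:
<?php
// clear_cache.php - run daily from cron, e.g. shortly after the nightly Redshift load
foreach (glob('/my_cache_path/upload_count/*') as $file) {
    if (is_file($file)) {
        unlink($file);   // the next page view re-queries Redshift and rebuilds the cache file
    }
}
?>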
If you don't want file caching, just use a caching class like phpFastCache:
http://www.phpfastcache.com/
It can automatically find apc or any other caching solution.
Usage is really easy:
<?php
// In your config file
include("phpfastcache/phpfastcache.php");
phpFastCache::setup("storage","auto");
// phpFastCache support "apc", "memcache", "memcached", "wincache" ,"files", "sqlite" and "xcache"
// You don't need to change your code when you change your caching system. Or simple keep it auto
$cache = phpFastCache();
// In your Class, Functions, PHP Pages
// try to get from Cache first. product_page = YOUR Identity Keyword
$products = $cache->get("product_page");
if($products == null) {
$products = get_products_from_db();   // <-- placeholder: replace with your own DB query / function
// set products in to cache in 600 seconds = 10 minutes
$cache->set("product_page", $products,600);
}
// Output Your Contents $products HERE
?>
This example is also from http://www.phpfastcache.com/.
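Applied to the Redshift query from the question, that pattern might look roughly like this (a sketch reusing the same phpFastCache calls shown above; the cache key and the 24-hour TTL are assumptions):
<?php
include("phpfastcache/phpfastcache.php");
phpFastCache::setup("storage", "auto");        // picks APC automatically if it is available
$cache = phpFastCache();

$cacheKey = "upload_count_" . $id;             // hypothetical key, one entry per enterprise id
$rows = $cache->get($cacheKey);
if ($rows == null) {
    $rows = array();
    $rec = pg_query($connect, "SELECT date,SUM(upload_count) as upload_count from dwh.mytable where enterprise_id='$id' GROUP BY date");
    while ($r = pg_fetch_assoc($rec)) {
        $rows[] = $r;
    }
    $cache->set($cacheKey, $rows, 86400);      // 24 hours, matching the once-a-day data refresh
}
?>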
Hope it helps, and have fun with your really cool project!
Related
Let me explain: I have a Symfony2 project and I need to import users into my database via a CSV file. I have to do some work on the data before importing it into MySQL. I created a service for this and everything works fine, but it takes too much time to execute and slows down my server if I give it my entire file. My files usually have between 500 and 1500 rows, so I have to split them into files of ~200 rows and import them one by one.
I need to handle related users that can be in the file and/or already in the database. A related user is usually the parent of a child.
Here is my simplified code:
$validator = $this->validator;
$members = array();
$children = array();
$mails = array();
$handle = fopen($filePath, 'r');
$datas = fgetcsv($handle, 0, ";");
while (($datas = fgetcsv($handle, 0, ";")) !== false) {
$user = new User();
//If there is a related user
if($datas[18] != ""){
$user->setRelatedMemberEmail($datas[18]);
$relation = array_search(ucfirst(strtolower($datas[19])), UserComiti::$RELATIONSHIPS);
if($relation !== false)
$user->setParentRelationship($relation);
}
else {
$user->setRelatedMemberEmail($datas[0]);
$user->addRole ( "ROLE_MEMBER" );
}
$user->setEmail($mail);
$user->setLastName($lastName);
$user->setFirstName($firstName);
$user->setGender($gender);
$user->setBirthdate($birthdate);
$user->setCity($city);
$user->setPostalCode($zipCode);
$user->setAddressLine1($adressLine1);
$user->setAddressLine2($adressLine2);
$user->setCountry($country);
$user->setNationality($nationality);
$user->setPhoneNumber($phone);
//Entity Validation
$listErrors = $validator->validate($user);
//In case of errors
if(count($listErrors) > 0) {
foreach($listErrors as $error){
$nbError++;
$errors .= "Line " . $line . " : " . $error->getMessage() . "\n";
}
}
else {
if($mailParent != null)
$children[] = $user;
else{
$members[] = $user;
$nbAdded++;
}
}
foreach($members as $user){
$this->em->persist($user);
$this->em->flush();
}
foreach($children as $child){
//If the related user is already in DB
$parent = $this->userRepo->findOneBy(array('username' => $child->getRelatedMemberEmail(), 'club' => $this->club));
if ($parent !== false){
//Check if someone related to related user already has the same last name and first name. If it is the case we can guess that this user is already created
$testName = $this->userRepo->findByParentAndName($child->getFirstName(), $child->getLastName(), $parent, $this->club);
if(!$testName){
$child->setParent($parent);
$this->em->persist($child);
$nbAdded++;
}
else
$nbSkipped++;
}
//Else in case the related user is neither file nor in database we create a fake one that will be able to update his profile later.
else{
$newParent = clone $child;
$newParent->setUsername($child->getRelatedMemberEmail());
$newParent->setEmail($child->getRelatedMemberEmail());
$newParent->setFirstName('Unknown');
$this->em->persist($newParent);
$child->setParent($newParent);
$this->em->persist($child);
$nbAdded += 2;
$this->em->flush();
}
}
}
It's not my whole service, because I don't think the rest would help here, but if you need more information, just ask.
While I do not have the means to quantitatively determine the bottlenecks in your program, I can suggest a couple of guidelines that will likely improve its performance significantly.
Minimize the number of database commits you are making. A lot happens when you write to the database. Is it possible to commit only once at the end?
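For example, with Doctrine you could persist inside the loop but flush in batches (or once at the very end) rather than per entity. This is a rough sketch, not tied to your exact entities; the batch size of 100 is arbitrary, and clear() detaches already-flushed entities, so only use it if you don't still need those object references:
$batchSize = 100;
foreach ($members as $i => $user) {
    $this->em->persist($user);
    if (($i + 1) % $batchSize === 0) {
        $this->em->flush();   // one round trip per batch instead of one per user
        $this->em->clear();   // optional: frees managed entities to keep memory flat
    }
}
$this->em->flush();           // flush whatever is left over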
Minimize the number of database reads you are making. Similar to the previous point, a lot happens when you read from the database.
If after considering the above points you still have issues, determine what SQL the ORM is actually generating and executing. ORMs work great until efficiency becomes a problem and more care needs to go into ensuring optimal queries are being generated. At this point, becoming more familiar with the ORM and SQL would be beneficial.
You don't seem to be working with too much data, but if you were, MySQL alone supports reading CSV files.
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.
https://dev.mysql.com/doc/refman/5.7/en/load-data.html
You may be able to access this MySQL-specific feature through your ORM, but if not, you would need to write some plain SQL to utilize it. Since you need to modify the data you are reading from the CSV, you would likely be able to do this very, very quickly by following these steps (a rough sketch follows the list):
Use LOAD DATA INFILE to read the CSV into a temporary table.
Manipulate the data in the temporary table and other tables as required.
SELECT the data from the temporary table into your destination table.
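A sketch of that three-step flow with PDO (the table names, columns and file path are made up; LOCAL INFILE has to be allowed on both the MySQL client and the server):
<?php
$pdo = new PDO(
    'mysql:host=localhost;dbname=mydb;charset=utf8mb4',
    'user',
    'password',
    array(PDO::MYSQL_ATTR_LOCAL_INFILE => true)   // required for LOAD DATA LOCAL INFILE
);

// 1. Bulk-load the raw CSV into a temporary staging table (much faster than row-by-row inserts).
$pdo->exec("CREATE TEMPORARY TABLE import_users_tmp (
                email VARCHAR(255),
                last_name VARCHAR(255),
                first_name VARCHAR(255),
                related_email VARCHAR(255)
            )");
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/users.csv'
            INTO TABLE import_users_tmp
            FIELDS TERMINATED BY ';'
            IGNORE 1 LINES
            (email, last_name, first_name, related_email)");

// 2. Manipulate the staged data as required (normalisation, de-duplication, ...).
$pdo->exec("UPDATE import_users_tmp SET email = LOWER(TRIM(email))");

// 3. Copy the cleaned rows into the destination table.
$pdo->exec("INSERT INTO users (email, last_name, first_name)
            SELECT email, last_name, first_name FROM import_users_tmp");
?>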
I know this is a very old topic, but some time ago I created a bundle that can help import entities from CSV into the database. So if someone comes across this topic, it may be helpful.
https://github.com/jgrygierek/BatchEntityImportBundle
https://github.com/jgrygierek/SonataBatchEntityImportBundle
I'm a beginner to OOP. The following is the gist of my code, which I am trying to find a proper design pattern for:
class data {
public $location = array();
public $restaurant = array();
}
$data = new data;
$query = mysqli_query($mysqli, "SELECT * FROM restaurants");
//actually a big long query, simplified for illustrative purposes here
$i = 0;
while ($row = mysqli_fetch_array($query)) {
$data->location[] = $i.')'.$row['location']."<br>";
$data->restaurant[] = $row['restaurant']."<br>";
$i++;
}
I'd like to access the data class from another PHP page (to print out information in HTML, hence the <br> tags). I prefer not to run the query twice. I understand that classes are created and destroyed in a single PHP page load. I'd appreciate design-pattern guidance for managing HTTP application state and minimizing resources in such a situation.
If you store the data object in a $_SESSION variable, you will have access to it from other pages and upon refresh. As mentioned in other posts and comments, you want to separate the HTML from the data processing.
class data {
public $location = array();
public $restaurant = array();
}
// start your session
session_start();
$data = new data;
$query = mysqli_query($mysqli, "SELECT * FROM restaurants");
//actually a big long query, simplified for illustrative purposes here
$i = 0;
while ($row = mysqli_fetch_array($query)) {
$data->location[] = $i.')'.$row['location'];
$data->restaurant[] = $row['restaurant'];
$i++;
}
// HTML (separate from data processing)
foreach ($data->location as $location) {
echo $location . '<br />';
}
// save your session
$_SESSION['data'] = $data;
When you wish to reference the data object from another page:
// start your session
session_start();
// get data object
$data = $_SESSION['data'];
// do something with data
foreach($data->location as $location) {
echo $location . '<br />';
}
SELECTing data from the database is rather inexpensive, generally speaking. You don't need to worry about running the query twice; MySQL will do the caching part.
Your code mixes database data with HTML. I suggest separating them.
// fetch data part
while ($row = mysqli_fetch_array($query)) {
$data->location[] = $row['location'];
$data->restaurant[] = $row['restaurant'];
}
// print HTML part
$i = 0;
foreach($data->location as $loc) {
    echo $i . ')' . $loc . '<br />';
    $i++;
}
First you say this:
I'm a beginner to OOP.
Then you say this:
I prefer not to run the query twice. I understand that classes are
created and destroyed in a single PHP page load.
You are overthinking this. PHP is a scripting language: each user request runs the script anew. Meaning, it will always reload and rerun the code on each load of the PHP page, so there is no way around that.
And when I say you are overthinking this: PHP is basically part of a L.A.M.P. stack (Linux, Apache, MySQL & PHP), so the burden of query speed rests on the MySQL server, which will cache the request anyway.
Meaning, while you are thinking about PHP efficiency, the inherent architecture of PHP means queries are run on each load. With that in mind, the burden of managing the queries falls on MySQL, on the efficiency of the server, and on the design of the data structures in the database.
So if you are worried about your code eating up resources, think about improving MySQL efficiency in some way. But each layer of a L.A.M.P. stack has its purpose, and the PHP layer's purpose is just to reload & rerun scripts on each request.
You are probably looking for the Repository pattern.
The general idea is to have a class that can retrieve data objects for you.
Example:
$db = new Db(); // your db instance; you can use PDO for this.
$repo = new RestaurantRepository($db); // create a repo instance
$restaurants = $repo->getRestaurants(); // retrieve an array of restaurant instances
Implementation:
class RestaurantRepository {
public function __construct($db) {
$this->db = $db;
}
public function getRestaurants() {
// do query and return an array of instances
}
}
Code is untested and may have typos but it's a starter.
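For instance, a minimal PDO-based sketch of how getRestaurants() could be fleshed out, assuming a restaurants table with location and restaurant columns as in the question:
class RestaurantRepository {
    private $db;   // PDO instance

    public function __construct($db) {
        $this->db = $db;
    }

    public function getRestaurants() {
        // Run the query once and return plain rows; hydrate them into objects if you prefer.
        $stmt = $this->db->query("SELECT location, restaurant FROM restaurants");
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Usage
$db = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');
$repo = new RestaurantRepository($db);
$restaurants = $repo->getRestaurants();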
Saving the query results to a $_SESSION variable in the form of an array means the query does not have to be re-run on another page. Additionally, it manages state correctly, as I can unset($_SESSION['name']) if the query is re-run with different parameters.
I can also save the output of class data to a session variable. It seems to me that this design pattern makes more sense than running a new query for page refreshes.
I have a website page that, on load, fires 10 different queries against a table with 150,000,000 rows.
Normally the page loads in under 2 seconds, but if I refresh too often, it creates a lot of queries, which slows page load time by up to 10 seconds.
How can I avoid firing all of those queries, since it would kill my database?
I have no caching yet. The site works in the following way: I have a table where all URIs are stored. If a user enters a URL, I grab the URI out of the called URL and check in the table whether the URI is stored. If the URI is in the table, I pull the corresponding data from the other tables of a relational database.
Example code from one of the PHP files that pulls the information from the other tables:
<?php
set_time_limit(2);
define('MODX_CORE_PATH', '/path/to/modx/core/');
define('MODX_CONFIG_KEY','config');
require_once MODX_CORE_PATH . 'model/modx/modx.class.php';
// Criteria for foreign Database
$host = 'hostname';
$username = 'user';
$password = 'password';
$dbname = 'database';
$port = 3306;
$charset = 'utf8mb4';
$dsn = "mysql:host=$host;dbname=$dbname;port=$port;charset=$charset";
$xpdo = new xPDO($dsn, $username, $password);
// Catch the URI that is called
$pageURI = $_SERVER["REQUEST_URI"];
// Get the language token saved as TV "area" in parent and remove it
if (!isset($modx)) return '';
$top = isset($top) && intval($top) ? $top : 0;
$id= isset($id) && intval($id) ? intval($id) : $modx->resource->get('id');
$topLevel= isset($topLevel) && intval($topLevel) ? intval($topLevel) : 0;
if ($id && $id != $top) {
$pid = $id;
$pids = $modx->getParentIds($id);
if (!$topLevel || count($pids) >= $topLevel) {
while ($parentIds= $modx->getParentIds($id, 1)) {
$pid = array_pop($parentIds);
if ($pid == $top) {
break;
}
$id = $pid;
$parentIds = $modx->getParentIds($id);
if ($topLevel && count($parentIds) < $topLevel) {
break;
}
}
}
}
$parentid = $modx->getObject('modResource', $id);
$area = "/".$parentid->getTVValue('area');
$URL = str_replace($area, '', $pageURI);
$lang= $parentid->getTVValue('lang');
// Issue queries against the foreign database:
$output = '';
$sql = "SELECT epf_application_detail.description FROM epf_application_detail INNER JOIN app_uri ON epf_application_detail.application_id=app_uri.application_id WHERE app_uri.uri = '$URL' AND epf_application_detail.language_code = '$lang'";
foreach ($xpdo->query($sql) as $row) {
$output .= nl2br($row['description']);
}
return $output;
Without knowing what language you're using, here is some pseudo-code to get you started.
Instead of firing a large query every time your page loads, you could create a separate table called something like "cache". You could run your query and then store the data from the query in that cache table. Then when your page loads, you can query the cache table, which will be much smaller, and won't bog things down when you refresh the page a lot.
Pseudo-code (which can be run on an interval using a cron job or something, to keep your cache fresh):
Run your ten large queries
For each query, add the results to cache like so:
query_id | query_data
----------------------------------------------------
1 | {whatever your query data looks like}
Then, when your page loads, have each query collect the data from the cache.
It is important to note that, with a cache table, you will need to refresh it often (either as often as you get more data, or on a set interval, like every 5 minutes).
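Since the rest of this page is PHP, here is a rough PHP/MySQL sketch of that idea (the cache table layout, the query ids, and the queries themselves are placeholders to adapt):
<?php
// Assumed table: CREATE TABLE cache (query_id INT PRIMARY KEY, query_data MEDIUMTEXT, updated_at DATETIME)
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');

// Cron script, e.g. every 5 minutes: run the heavy queries once and store their results.
$heavyQueries = array(
    1 => "SELECT /* heavy query #1 against the 150M-row table */ ...",
    2 => "SELECT /* heavy query #2 */ ...",
);
foreach ($heavyQueries as $queryId => $sql) {
    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    $stmt = $pdo->prepare("REPLACE INTO cache (query_id, query_data, updated_at) VALUES (?, ?, NOW())");
    $stmt->execute(array($queryId, json_encode($rows)));
}

// Page load: read the pre-computed result from the small cache table instead.
$stmt = $pdo->prepare("SELECT query_data FROM cache WHERE query_id = ?");
$stmt->execute(array(1));
$rows = json_decode($stmt->fetchColumn(), true);
?>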
You should do some server-side caching and optimization.
If I were you, I would install Memcached for your database.
I would also consider some static caching like Varnish. This will cache every page as static HTML; with Varnish, the 2nd (and 3rd, 4th, ...) request doesn't have to be handled by PHP and MySQL, which makes the page load a lot faster the 2nd time (within the cache lifetime, of course).
Last of all, you can help the PHP side handle the data better by installing APC (or another opcode cache).
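As a concrete starting point for the Memcached suggestion, the query result from the snippet in the question could be cached roughly like this (a sketch assuming the php-memcached extension and a memcached instance on localhost; the key scheme and 5-minute TTL are arbitrary):
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$cacheKey = 'app_detail_' . md5($URL . '|' . $lang);   // one cache entry per URI/language pair
$output = $memcached->get($cacheKey);

if ($memcached->getResultCode() === Memcached::RES_NOTFOUND) {
    $output = '';
    foreach ($xpdo->query($sql) as $row) {
        $output .= nl2br($row['description']);
    }
    $memcached->set($cacheKey, $output, 300);          // cache for 5 minutes
}
return $output;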
I'm quite new to querying Microsoft's Active Directory and encountering some difficulties:
The AD has a size limit of 1000 elements per request, and I cannot change that limit. PHP does not seem to support paging (I'm using version 5.2, and there's no way of updating the production server).
I've so far encountered two possible solutions:
Sort the entries by objectSid and use filters to get all the objects. Sample Code
I don't like that for several reasons:
It seems unpredictable to mess with the objectSid, as you have to take it apart, convert it to decimal, convert it back ...
I don't see how you can compare these id's.
(I've tried: '&((objectClass=user)(objectSid>=0))')
Filter by the first letters of the object names (as suggested here):
That's not an optimal solution as many of the users/groups in our system are prefixed with the same few letters.
So my question:
What approach is best used here?
If it's the first one, how can I be sure to handle the objectSid correctly?
Any other possibilities?
Am I missing something obvious?
Update:
- This related question provides information about why the Simple Paged Results extension does not work.
- The web server is running on a Linux server, so COM objects/adoDB are not an option.
I was able to get around the size limitation using ldap_control_paged_result
ldap_control_paged_result is used to enable LDAP pagination by sending the pagination control. The function below worked perfectly in my case. It works on PHP 5 >= 5.4.0 and PHP 7.
function retrieves_users($conn)
{
$dn = 'ou=,dc=,dc=';
$filter = "(&(objectClass=user)(objectCategory=person)(sn=*))";
$justthese = array();
$data = array('usersLdap' => array());   // initialise so the function returns a valid array even if nothing matches
// enable pagination with a page size of 100.
$pageSize = 100;
$cookie = '';
do {
ldap_control_paged_result($conn, $pageSize, true, $cookie);
$result = ldap_search($conn, $dn, $filter, $justthese);
$entries = ldap_get_entries($conn, $result);
if(!empty($entries)){
for ($i = 0; $i < $entries["count"]; $i++) {
$data['usersLdap'][] = array(
'name' => $entries[$i]["cn"][0],
'username' => $entries[$i]["userprincipalname"][0]
);
}
}
ldap_control_paged_result_response($conn, $result, $cookie);
} while($cookie !== null && $cookie != '');
return $data;
}
If you have successfully updated your server by now, then the function above can get all the entries. I am using this function to get all users in our AD.
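For completeness, here is roughly how the $conn handle might be prepared before calling the function (the host and credentials are placeholders, not from the original post):
$conn = ldap_connect("ldaps://ad.example.com");             // placeholder host
ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);       // paged results require LDAPv3
ldap_set_option($conn, LDAP_OPT_REFERRALS, 0);              // usually needed against Active Directory
if (ldap_bind($conn, "binduser@example.com", "secret")) {   // placeholder credentials
    $data = retrieves_users($conn);
    print_r($data['usersLdap']);
}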
As I've not found any clean solution, I decided to go with the first approach: filtering by objectSids.
This workaround has its limitations:
It only works for objects with an objectSid, i.e. users and groups.
It assumes that all Users/Groups are created by the same authority.
It assumes that there are not more missing relative SIDs than the size limit.
The idea is to first read all possible objects and pick out the one with the lowest relative SID. The relative SID is the last chunk of the SID:
S-1-5-21-3188256696-111411151-3922474875-1158
Let's assume this is the lowest relative SID in a search that only returned 'Partial Search Results'.
Let's further assume the size limit is 1000.
The program then does the following:
It searches all Objects with the SIDs between
S-1-5-21-3188256696-111411151-3922474875-1158
and
S-1-5-21-3188256696-111411151-3922474875-0159
then all between
S-1-5-21-3188256696-111411151-3922474875-1158
and
S-1-5-21-3188256696-111411151-3922474875-2157
and so on until one of the searches returns zero objects.
There are several problems with this approach, but it's sufficient for my purposes.
The Code:
$filter = '(objectClass=Group)';
$attributes = array('objectsid','cn'); //objectsid needs to be set
$result = array();
$maxPageSize = 1000;
$searchStep = $maxPageSize-1;
$adResult = @$adConn->search($filter,$attributes); //Suppress warning for the first query (because it exceeds the size limit)
//Read smallest RID from the resultset
$minGroupRID = '';
for($i=0;$i<$adResult['count'];$i++){
$groupRID = unpack('V',substr($adResult[$i]['objectsid'][0],24));
if($minGroupRID == '' || $minGroupRID>$groupRID[1]){
$minGroupRID = $groupRID[1];
}
}
$sidPrefix = substr($adResult[$i-1]['objectsid'][0],0,24); //Read last objectsid and cut off the prefix
$nextStepGroupRID = $minGroupRID;
do{ //Search for all objects with a lower objectsid than minGroupRID
$adResult = $adConn->search('(&'.$filter.'(objectsid<='.preg_replace('/../','\\\\$0',bin2hex($sidPrefix.pack('V',$nextStepGroupRID))).')(objectsid>='.preg_replace('/../','\\\\$0',bin2hex($sidPrefix.pack('V',$nextStepGroupRID-$searchStep))).'))', $attributes);
for($i=0;$i<$adResult['count'];$i++){
$RID = unpack('V',substr($adResult[$i]['objectsid'][0],24)); //Extract the relative SID from the SID
$RIDs[] = $RID[1];
$resultSet = array();
foreach($attributes as $attribute){
$resultSet[$attribute] = $adResult[$i][$attribute][0];
}
$result[$RID[1]] = $resultSet;
}
$nextStepGroupRID = $nextStepGroupRID-$searchStep;
}while($adResult['count']>1);
$nextStepGroupRID = $minGroupRID;
do{ //Search for all object with a higher objectsid than minGroupRID
$adResult = $adConn->search('(&'.$filter.'(objectsid>='.preg_replace('/../','\\\\$0',bin2hex($sidPrefix.pack('V',$nextStepGroupRID))).')(objectsid<='.preg_replace('/../','\\\\$0',bin2hex($sidPrefix.pack('V',$nextStepGroupRID+$searchStep))).'))', $attributes);
for($i=0;$i<$adResult['count'];$i++){
$RID = unpack('V',substr($adResult[$i]['objectsid'][0],24)); //Extract the relative SID from the SID
$RIDs[] = $RID[1];
$resultSet = array();
foreach($attributes as $attribute){
$resultSet[$attribute] = $adResult[$i][$attribute][0];
}
$result[$RID[1]] = $resultSet;
}
$nextStepGroupRID = $nextStepGroupRID+$searchStep;
}while($adResult['count']>1);
var_dump($result);
The $adConn->search method looks like this:
function search($filter, $attributes = false, $base_dn = null) {
if(!isset($base_dn)){
$base_dn = $this->baseDN;
}
$entries = false;
if (is_string($filter) && $this->bind) {
if (is_array($attributes)) {
$search = ldap_search($this->resource, $base_dn, $filter, $attributes);
} else {
$search = ldap_search($this->resource, $base_dn, $filter);
}
if ($search !== false) {
$entries = ldap_get_entries($this->resource, $search);
}
}
return $entries;
}
Never make assumptions about servers or server configuration, this leads to brittle code and unexpected, sometimes spectacular failures. Just because it is AD today does not mean it will be tomorrow, or that Microsoft will not change the default limit in the server. I recently dealt with a situation where client code was written with the tribal knowledge that the size limit was 2000, and when administrators, for reasons of their own, changed the size limit, the client code failed horribly.
Are you sure that PHP does not support request controls (the simple paged result extension is a request control)? I wrote an article about "LDAP: Simple Paged Results", and though the article sample code is Java, the concepts are important, not the language. See also "LDAP: Programming Practices".
An error may occur with the previous script when the distance between neighbouring SIDs is more than 999.
Example:
S-1-5-21-3188256696-111411151-3922474875-1158
S-1-5-21-3188256696-111411151-3922474875-3359
3359-1158 > 999
To avoid this, you need to loop a fixed number of times instead of stopping when a search comes back empty.
Example:
$tt = 1;
do {
    ...
    $nextStepGroupRID = $nextStepGroupRID - $searchStep;
    $tt++;
} while ($tt < 30);
In this example, we are forced to check 999 * 30 * 2 = 59940 values.
I am recording unique page views using memcached and storing them in the db at 15-minute intervals. Whenever the number of users grows, memcached gives me the following error:
Memcache::get(): Server localhost (tcp 10106) failed with: Failed reading line from stream (0)
I am using the following code to insert/update page views in memcached:
if($memcached->is_valid_cache("visiors")) {
$log_views = $memcached->get_cache("visiors");
if(!is_array($log_views)) $log_views = array();
}
else {
$log_views = array();
}
$log_views[] = array($page_id, $time, $other_Stuff);
$memcached->set_cache("visiors", $log_views, $cache_expire_time);
The following code retrieves the array from memcached, inserts X page views into the db, and puts the remaining page views back into memcached:
if($memcached->is_valid_cache("visiors")) {
$log_views = $memcached->get_cache("visiors");
if(is_array($log_views) && count($log_views) > 0) {
$logs = array_slice($log_views, 0, $insert_limit);
$insert_array = array();
foreach($logs as $log) {
$insert_array[] = '('. $log[0]. ',' . $log[1] . ', NOW())';
}
$insert_sql = implode(',',$insert_array);
if(mysql_query('INSERT SQL CODE')) {
$memcached->set_cache("visiors", array_slice($log_views, $insert_limit), $cache_expire_time); //store new values
}
}
}
The insert/update causes thread locking; I can see lots of scripts waiting for their turn. I think I am losing page views during the update process. Any suggestions on how to avoid the memcached read errors and make this code more robust?
You are likely running into a connection limit within memcached, your firewall, network, etc. We have a simple walkthrough of the most common scenarios: http://code.google.com/p/memcached/wiki/Timeouts
There's no internal locking that would cause sets or gets to block for any amount of time.