My app first queries 2 large sets of data, then does some work on the first set of data, and "uses" it on the second.
If possible, I'd like it to instead query only the first set synchronously and the second asynchronously, do the work on the first set, then wait for the second query to finish if it hasn't already, and finally use the first set of data on it.
Is this possible somehow?
It's possible.
$mysqli->query($long_running_sql, MYSQLI_ASYNC);
echo 'run other stuff';
$result = $mysqli->reap_async_query(); //gives result (and blocks script if query is not done)
$resultArray = $result->fetch_assoc();
Or you can use mysqli_poll() if you don't want a blocking call:
http://php.net/manual/en/mysqli.poll.php
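For example, a minimal polling sketch (reusing $mysqli and $long_running_sql from the snippet above) might look like this:
// Hedged sketch: poll the connection instead of blocking in reap_async_query().
$mysqli->query($long_running_sql, MYSQLI_ASYNC);
do {
    // ... do other work here while the query is still running ...
    $links = $errors = $rejects = array($mysqli);
    // Wait up to 1 second for the async result to become available.
    $ready = mysqli_poll($links, $errors, $rejects, 1);
} while ($ready === 0);
$result = $mysqli->reap_async_query();
$resultArray = $result->fetch_assoc();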
MySQL requires that, inside one connection, a query is completely handled before the next query is launched. That includes the fetching of all results.
It is possible, however, to:
fetch results one by one instead of all at once
launch multiple queries by creating multiple connections
By default, PHP waits until all results are available and then fetches them all at once internally (in the MySQL driver). This happens even when using, for example, PDOStatement::fetch() to pull them into your code one row at a time. When using PDO, this can be prevented by setting the attribute \PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false. This is useful for:
speeding up the handling of query results: your code can start processing as soon as the first row arrives, instead of only after every row has been fetched.
working with result sets that are potentially larger than the memory available to PHP (PHP's self-imposed memory_limit or the available RAM).
Be aware that speed is often limited by the storage system, whose characteristics can mean that the total processing time for two queries is larger when they run at the same time than when they run one after the other.
An example (which can be done completely in MySQL, but for showing the concept...):
$dbConnectionOne = new \PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$dbConnectionOne->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

$dbConnectionTwo = new \PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$dbConnectionTwo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$dbConnectionTwo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

// Small configuration table: fetch it on the first (buffered) connection.
$synchStmt = $dbConnectionOne->prepare('SELECT id, name, factor FROM measurementConfiguration');
$synchStmt->execute();

// Huge table: run it unbuffered on the second connection so rows can be streamed.
$asynchStmt = $dbConnectionTwo->prepare('SELECT measurementConfiguration_id, timestamp, value FROM hugeMeasurementsTable');
$asynchStmt->execute();

$measurementConfiguration = array();
foreach ($synchStmt->fetchAll() as $synchStmtRow) {
    $measurementConfiguration[$synchStmtRow['id']] = array(
        'name' => $synchStmtRow['name'],
        'factor' => $synchStmtRow['factor']
    );
}

while (($asynchStmtRow = $asynchStmt->fetch()) !== false) {
    $currentMeasurementConfiguration = $measurementConfiguration[$asynchStmtRow['measurementConfiguration_id']];
    echo 'Measurement of sensor ' . $currentMeasurementConfiguration['name'] . ' at ' . $asynchStmtRow['timestamp'] . ' was ' . ($asynchStmtRow['value'] * $currentMeasurementConfiguration['factor']) . PHP_EOL;
}
I have a file that has the job of importing data into a SQL database from an API. A problem I encountered is that the API can only return a maximum dataset of 1000 records per call, even though I sometimes need to retrieve much larger amounts of data, ranging from 10 to 200,000 records. My first thought was to create a while loop inside which I make calls to the API until all of the data is retrieved, and afterwards enter it into the database.
$moreDataToImport = true;
$lastId = null;
$query = '';

while ($moreDataToImport) {
    $result = json_decode(callToApi($lastId));
    $query .= formatResult($result);
    $moreDataToImport = !empty($result['dataNotExported']);
    $lastId = getLastId($result['customers']);
}

mysqli_multi_query($con, $query);
The issue I encountered with this is that I was quickly reaching memory limits. The easy solution is to simply increase the memory limit until it is sufficient. How much memory I need, however, is unknown, because there is always a possibility that I need to import a very large dataset and could theoretically still run out of memory. I don't want to set an unlimited memory limit, as that creates its own problems.
My second solution to this was instead of looping through the imported data, I could instead send it to my database, and then do a page refresh, with a get request specifying the last Id I left off on.
if (isset($_GET['lastId'])) {
    $lastId = $_GET['lastId'];
} else {
    $lastId = null;
}

$result = json_decode(callToApi($lastId));
$query = formatResult($result);
mysqli_multi_query($con, $query);

if (!empty($result['dataNotExported'])) {
    header('Location: ./page.php?lastId='.getLastId($result['customers']));
}
This solution solves my memory limit issue; however, now I have another issue: browsers, after 20 redirects (the exact number depends on the browser), will automatically kill the program to stop a potential redirect loop and then shortly refresh the page. The solution to this would be to stop the program yourself at the 20th redirect and let the page refresh, continuing the process.
if (isset($_GET['redirects'])) {
    $redirects = $_GET['redirects'];
    if ($redirects == '20') {
        if ($lastId == null) {
            header("Location: ./page.php?redirects=2");
        } else {
            header("Location: ./page.php?lastId=$lastId&redirects=2");
        }
        exit;
    }
} else {
    $redirects = '1';
}
Though this solves my issues, I am afraid it is more impractical than other solutions, and there must be a better way to do this. Are these my only two choices: this approach, or risking running out of memory? And if so, is one more efficient/orthodox than the other?
Do the insert query inside the loop that fetches each page from the API, rather than concatenating all the queries.
$moreDataToImport = true;
$lastId = null;

while ($moreDataToImport) {
    $result = json_decode(callToApi($lastId));
    $query = formatResult($result);
    mysqli_query($con, $query);
    $moreDataToImport = !empty($result['dataNotExported']);
    $lastId = getLastId($result['customers']);
}
Page your work. Break it up into smaller chunks that will be below your memory limit.
If the API only returns 1000 at a time, then only process 1000 at a time in a loop. In each iteration of the loop you'll query the API, process the data, and store it. Then, on the next iteration, you'll be using the same variables so your memory won't skyrocket.
A couple things to consider:
If this becomes a long running script, you may hit the default script running time limit - so you'll have to extend that with set_time_limit().
Some browsers will consider scripts that run too long to be timed out and will show the appropriate error message.
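As a rough illustration of those two considerations (the values are examples, not recommendations):
set_time_limit(0);        // lift the default execution time limit for this long import run
ignore_user_abort(true);  // keep processing even if the browser gives up waiting on the response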
For processing upwards of 200,000 pieces of data from an API, I think the best solution is to not make this work dependent on a page load. If possible, I'd put this in a cron job to be run by the server on a regular schedule.
If the dataset is dependent on the request (for example, if you're processing temperatures from one of thousands of weather stations, with the specific station ID set by the user), then consider creating a secondary script that does the work. Calling and forking the secondary script from your primary script will let your primary script finish execution while the secondary script runs in the background on your server. Something like:
exec('php path/to/secondary-script.php > /dev/null &');
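If the secondary script needs the request-specific ID mentioned above, one way (sketched here with a hypothetical station_id parameter) is to pass it as a shell argument:
$stationId = isset($_GET['station_id']) ? (int) $_GET['station_id'] : 0; // hypothetical request parameter
// escapeshellarg() keeps the value from breaking the shell command; the worker reads it back as $argv[1].
exec('php path/to/secondary-script.php ' . escapeshellarg((string) $stationId) . ' > /dev/null 2>&1 &');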
I've been searching for a suitable PHP caching method for MSSQL results.
Most of the examples I can find suggest storing the results in an array, which would then get included in the page. This seems great, unless a request for the content is made at the same time as it is being updated/rebuilt.
I was hoping to find something similar to ASP's application-level variables, but as far as I'm aware, PHP doesn't offer this functionality?
The problem I'm facing is that I need to perform 6 queries per page to populate dropdown boxes, and this happens on the vast majority of pages. It's also not an option to combine the queries. The cached data will also need to be rebuilt sporadically, when the system changes; this could be once a day, once a week or once a month. Any advice will be greatly appreciated, thanks!
You can use Redis server and phpredis PHP extension to cache results fetched from database:
$redis = new Redis();
$redis->connect('/tmp/redis.sock');

$sql = "SELECT something FROM sometable WHERE condition";
$sql_hash = md5($sql);
$redis_key = "dbcache:${sql_hash}";
$ttl = 3600; // values expire in 1 hour

if ($result = $redis->get($redis_key)) {
    $result = json_decode($result, true);
} else {
    $result = Db::fetchArray($sql);
    $redis->setex($redis_key, $ttl, json_encode($result));
}
(Error checks skipped for clarity)
I am trying to run a query in BigQuery/PHP (using the Google PHP SDK) that returns a large dataset (which can be 100,000 to 10,000,000 rows).
$bigqueryService = new Google_BigqueryService($client);
$query = new Google_QueryRequest();
$query->setQuery(...);
$jobs = $bigqueryService->jobs;
$response = $jobs->query($project_id, $query);
//query is a synchronous function that returns the full dataset
The next step is to allow the user to download the result as a CSV file.
The code above will fail when the dataset becomes too large (memory limit).
What are my options for performing this operation with lower memory usage?
(One option I considered is to save the results to another table in BigQuery and then fetch them partially with LIMIT and OFFSET, but I figured a better solution might be available...)
Thanks for the help
You can export your data directly from BigQuery:
https://developers.google.com/bigquery/exporting-data-from-bigquery
You can use PHP to make an API call that does the export (you don't need the bq command-line tool).
You need to set the job's configuration.extract.destinationFormat; see the reference.
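For reference, here is a rough sketch of the request body for such an extract job, with field names as in the BigQuery REST reference; the project, dataset, table and bucket names are placeholders:
$extractJob = array(
    'configuration' => array(
        'extract' => array(
            'sourceTable' => array(
                'projectId' => 'my-project',      // placeholder
                'datasetId' => 'my_dataset',      // placeholder
                'tableId'   => 'query_results',   // placeholder
            ),
            // A wildcard URI lets BigQuery shard large exports into multiple files.
            'destinationUris'   => array('gs://my-bucket/results-*.csv'),
            'destinationFormat' => 'CSV',
        ),
    ),
);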
Just to elaborate on Pentium10's answer
You can export up to a 1 GB file in JSON format.
Then you can read the file line by line, which will minimize the memory used by your application, and use json_decode() on each line.
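A minimal sketch of that line-by-line approach, assuming the export was written as newline-delimited JSON to a local file (the path is a placeholder):
$handle = fopen('/path/to/export.json', 'r');
if ($handle === false) {
    die('Could not open export file');
}
while (($line = fgets($handle)) !== false) {
    $row = json_decode($line, true);
    if ($row === null) {
        continue; // skip blank or malformed lines
    }
    // Process one row at a time; memory use stays flat regardless of file size.
    // e.g. fputcsv($csvHandle, $row);
}
fclose($handle);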
The suggestion to export is a good one, I just wanted to mention there is another way.
The query API you are calling (jobs.query()) does not return the full dataset; it just returns a page of data, which is the first 2 MB of the results. You can set the maxResults flag (described here) to limit this to a certain number of rows.
If you get back fewer rows than are in the table, you will get a pageToken field in the response. You can then fetch the remainder with the jobs.getQueryResults() API by providing the job ID (also in the query response) and the page token. This will continue to return new rows and a new page token until you get to the end of your table.
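A rough sketch of that paging loop against the REST endpoint itself (jobs.getQueryResults); $accessToken, $projectId and $jobId are assumed to come from the initial query response, and error handling is omitted:
$pageToken = null;
do {
    $url = 'https://www.googleapis.com/bigquery/v2/projects/' . $projectId
         . '/queries/' . $jobId . '?maxResults=10000'
         . ($pageToken ? '&pageToken=' . urlencode($pageToken) : '');
    $context = stream_context_create(array(
        'http' => array('header' => 'Authorization: Bearer ' . $accessToken),
    ));
    $page = json_decode(file_get_contents($url, false, $context), true);
    $rows = isset($page['rows']) ? $page['rows'] : array();
    foreach ($rows as $row) {
        // Each $row arrives in BigQuery's f/v format: $row['f'][$i]['v'] holds each cell.
        // Stream it to the CSV here instead of keeping everything in memory.
    }
    $pageToken = isset($page['pageToken']) ? $page['pageToken'] : null;
} while ($pageToken !== null);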
The example here shows code (in Java and in Python) to run a query and fetch the results page by page.
There is also an option in the API to convert directly to CSV by specifying alt='csv' in the URL query string, but I'm not sure how to do this in PHP.
I am not sure whether you are still using PHP, but here is the answer:
$options = [
    'maxResults' => 1000,
    'startIndex' => 0
];

$jobConfig = $bigQuery->query($query);
$queryResults = $bigQuery->runQuery($jobConfig, $options);

foreach ($queryResults as $row) {
    // Handle rows
}
I'm experiencing a strange problem. I'm caching the output of a query using memcache functions in a file named count.php. This file is called by an AJAX request every second while a user is viewing a particular page. The output is cached for 5 seconds, so within this time, if there are 5 hits to this file, I expect the cached result to be returned 3-4 times at least. However, this is not happening: every time, a query goes to the DB, as evidenced by an echo statement. But if the file is called from the browser directly by typing the URL (like http://example.com/help/count.php) repeatedly within 5 seconds, data is returned from the cache (again evidenced by the echo statement). Below is the relevant code of count.php:
mysql_connect(c_dbhost, c_dbuname, c_dbpsw) or die(mysql_error());
mysql_select_db(c_dbname) or die("Could Not Find Database");

$product_id = $_POST['product_id'];
echo func_total_bids_count($product_id);

function func_total_bids_count($product_id)
{
    $qry = "select count(*) as bid_count from tbl_userbid where userbid_auction_id=" . $product_id;
    $row_count = func_row_count_only($qry);
    return $row_count["bid_count"];
}

function func_row_count_only($qry)
{
    if ($_SERVER["HTTP_HOST"] != "localhost")
    {
        $o_cache = new Memcache;
        $o_cache->connect('localhost', 11211) or die("Could not connect to memcache");
        //$key="total_bids" . md5($product_id);
        $key = "KEY" . md5($qry);
        $result = $o_cache->get($key);
        if (!$result)
        {
            $qry_result = mysql_query($qry);
            while ($row = mysql_fetch_array($qry_result))
            {
                $row_count = $row;
                $result = $row;
                $o_cache->set($key, $result, 0, 5);
            }
            echo "From DB <br/>";
        }
        else
        {
            echo "From Cache <br/>";
        }
        $o_cache->close();
        return $row_count;
    }
}
I'm confused as to why the DB is hit every second when AJAX calls this file, but cached data is returned when the URL is typed into the browser. To try the URL method I just replaced $product_id with a valid number (e.g. $product_id = 426 in my case). I don't understand what's wrong here, as I expect data to be returned from the cache within 5 seconds of the first hit. I want the data to be returned from the cache. Can someone please help me understand what's happening?
If you're using the address bar, you're doing a GET, but your code is looking for $_POST['...'], so you will end up with an invalid query. So for a start, the results using the address bar won't be what you're expecting. Is your Ajax call actually doing a POST?
Please also note that you've got a SQL injection vulnerability there. Make sure $product_id is an integer.
There are many problems with your code. First of all, you always connect to the database and select a database, even when you don't need to. Second, you should check $result with !empty($result), which is more reliable than just !$result because it also covers empty objects.
As noted above, if 'product_id' is not in the $_POST array, you could use $_REQUEST to also cover $_GET (but you shouldn't if you are certain it's coming via $_POST).
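A small sketch combining the two suggestions above (reading the id from either method and forcing it to an integer):
// $_REQUEST covers both GET and POST; use $_POST only if you are sure the AJAX call POSTs.
$product_id = isset($_REQUEST['product_id']) ? (int) $_REQUEST['product_id'] : 0;
if ($product_id <= 0) {
    exit('Invalid product id');
}
echo func_total_bids_count($product_id);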
I am using TCPDF to create PDF documents on the fly. Some of the queries used to generate these PDFs return over 1,000 records, and my server is timing out on the larger queries (Internal Server Error). I am using PHP and MySQL.
How do I break a large MySQL query into smaller chunks with AJAX, cache the data, and then recombine the results, in order to prevent the server from timing out?
Here is my current code:
require_once('../../libraries/tcpdf/tcpdf.php');

$pdf = new TCPDF();
$prows = fetch_data($id);
$filename = '../../pdf_template.php';

foreach ($prows as $row) {
    $pdf->AddPage('P', 'Letter');
    ob_start();
    require($filename);
    $html .= ob_get_contents();
    ob_end_clean();
    $pdf->writeHTML($html, true, false, true, false, '');
}

$pdf->Output('documents.pdf', 'D');
If it is absolutely imperative that the data be generated every time this gets called, you could chunk the query into blocks of 100 rows, cache the data on the filesystem (or, much better, in APC/WinCache/XCache/Memcache), and then, on the final AJAX call (finalquery=true), load the cache, generate the PDF, and clear the cache.
I'm assuming that you can't change the maximum run time of the script with something like <?php set_time_limit(0);
The flow would go like this:
Generate a unique ID for the query action
on the client side, get the maximum number of rows through the initial call to the php script (select count...etc).
Then, divide it by 10, 100, 1000, however many requests you want to separate it into.
Run an AJAX request for each one of these chunks (rows 0-100, 100-200, and so on) with the unique ID.
On the server side, you select that number/selection of rows, and put it into a store somewhere. I'm familiar with WinCache (I'm stuck developing PHP on Windows...), so add it to the usercache: wincache_ucache_add( $uniqueId . '.chunk' . $chunkNumber, $rows);
In the meantime, on the client side, track all the AJAX requests. When each one is finished, make a call to the generate PDF function serverside. Give the number of chunks and the unique id.
On the server side, grab everything from the cache. $rows = array_merge( wincache_ucache_get($uniqueId . '.chunk'. $chunkNumber1), wincache_ucache_get($uniqueId . '.chunk'. $chunkNumber2) );
Clear the cache: wincache_ucache_clear();
Generate the PDF and return it.
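A rough sketch of what the server-side chunk endpoint in that flow could look like; the request parameters and the fetch_data_chunk() helper are hypothetical, and the cache calls are the WinCache ones mentioned above:
$uniqueId    = $_GET['uid'];                        // generated client-side for this export
$chunkNumber = isset($_GET['chunk']) ? (int) $_GET['chunk'] : 0;
$totalChunks = isset($_GET['totalchunks']) ? (int) $_GET['totalchunks'] : 0;
$finalQuery  = !empty($_GET['finalquery']);
if (!$finalQuery) {
    // Fetch only this chunk's rows (e.g. LIMIT 100 OFFSET chunk*100) and park them in the cache.
    $rows = fetch_data_chunk($_GET['id'], $chunkNumber * 100, 100); // hypothetical helper
    wincache_ucache_add($uniqueId . '.chunk' . $chunkNumber, $rows);
    echo 'ok';
    exit;
}
// Final AJAX call: reassemble every cached chunk, build the PDF, then clear the cache.
$prows = array();
for ($i = 0; $i < $totalChunks; $i++) {
    $prows = array_merge($prows, wincache_ucache_get($uniqueId . '.chunk' . $i));
}
wincache_ucache_clear();
// ... hand $prows to the TCPDF loop from the question ...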
Note that you really should just cache the query results in general, either manually or through functionality in an abstraction layer or ORM.
Try a simple cache like PEAR Cache_Lite. During a slow or off-peak period, you can run a utility program to cache all the records you would need. Then, when the user requests the PDF, you can access your cached data without having to hit your database. In addition, you can use the timestamp field on your table and request only the database records that have changed since the data was cached; this will significantly lower the number of records you have to retrieve from the database in real time.
<?php
include 'Cache/Lite/Output.php';

$options = array(
    'cacheDir' => '/tmp/cache/',
    'lifeTime' => 3600,
    'pearErrorMode' => CACHE_LITE_ERROR_DIE,
    'automaticSerialization' => true
);
$cache = new Cache_Lite_Output($options);

if ($data_array = $cache->get('some_data_id')) {
    $prows = $data_array;
} else {
    $prows = fetch_data($id);
    $cache->save($prows);
}

print_r($prows);
?>