Related
I usually work with web hosting companies but I decided to start learning working with servers to expand my knowledge.
I'll better give a real example to explain my question the best:
I have a web application that gathers data from a slow API that returns JSON data of products.
I have a function running every 1AM running a lot of queries on "id"s in my database.
Crontab:
0 1 * * * cd /var/www/html/tools; php index.php aso Cli_kas kas_alert
So this creates a process for the app (please correct me here if I'm wrong) and each process creates threads, and just to be more accurate, they are multi-threads since they do more than one thing: like pulling data from the DB to get the right variables and string them to the API queries, getting the data from the API, organizing it, searching the relevant data, and then inserting new data to the database.
The main PHP functions:
// MAIN: Cron Job Function
public function kas_alert() {
// 0. Deletes all the saved data from the `data` table 1 month+ ago.
// $this->kas_model->clean_old_rows();
// 1. Get 'prod' table
$data['table'] = $this->kas_model->prod_table();
// 2. Go through each row -
foreach ( $data['table'] as $row ) {
// 2.2. Gets all vars from the first query.
$last_row_query = $this->kas_model->get_last_row_of_tag($row->tag_id);
$last_row = $last_row_query[0];
$l_aaa_id = $last_row->prod_aaa_id;
$l_and_id = $last_row->prod_bbb_id;
$l_r_aaa = $last_row->dat_data1_aaa;
$l_r_and = $last_row->dat_data1_bbb;
$l_t_aaa = $last_row->dat_data2_aaa;
$l_t_and = $last_row->dat_data2_bbb;
$tagword = $last_row->tag_word;
$tag_id = $last_row->tag_id;
$country = $last_row->kay_country;
$email = $last_row->u_email;
$prod_name = $last_row->prod_name;
// For the Weekly report:
$prod_id = $last_row->prod_id;
$today = date('Y-m-d');
// 2.3. Run the tagword query again for today on each one of the tags and insert to DB.
if ( ($l_aaa_id != 0) || ( !empty($l_aaa_id) ) ) {
$aaa_data_today = $this->get_data1_aaa_by_id_and_kw($l_aaa_id, $tagword, $country);
} else{
$aaa_data_today['data1'] = 0;
$aaa_data_today['data2'] = 0;
$aaa_data_today['data3'] = 0;
}
if ( ($l_and_id != 0) || ( !empty($l_and_id) ) ) {
$bbb_data_today = $this->get_data1_bbb_by_id_and_kw($l_and_id, $tagword, $country);
} else {
$bbb_data_today['data1'] = 0;
$bbb_data_today['data2'] = 0;
$bbb_data_today['data3'] = 0;
}
// 2.4. Insert the new variables to the "data" table.
if ($this->kas_model->insert_new_tag_to_db( $tag_id, $aaa_data_today['data1'], $bbb_data_today['data1'], $aaa_data_today['data2'], $bbb_data_today['data2'], $aaa_data_today['data3'], $bbb_data_today['data3']) ){
}
// Kas Alert Outputs ($SEND is echoed in it's original function)
echo "<h1>prod Name: $prod_id</h1>";
echo "<h2>tag id: $tag_id</h2>";
var_dump($aaa_data_today);
echo "aaa old: ";
echo $l_r_aaa;
echo "<br> aaa new: ";
echo $aaa_data_today['data1'];
var_dump($bbb_data_today);
echo "<br> bbb old: ";
echo $l_r_and;
echo "<br> bbb new: ";
echo $bbb_data_today['data1'];
// 2.5. Check if there is a need to send something
$send = $this->check_if_send($l_aaa_id, $l_and_id, $l_r_aaa, $aaa_data_today['data1'], $l_r_and, $bbb_data_today['data1']);
// 2.6. If there is a trigger, send the email!
if ($send) {
$this->send_mail($l_aaa_id, $l_and_id, $aaa_data_today['data1'], $bbb_data_today['data1'], $l_r_aaa, $l_r_and, $tagword, $email, $prod_name);
}
}
}
For #Raptor, this is the function that get's the API data:
// aaa tag Query
// Gets aaa prod dataing by ID.
public function get_data_aaa_by_id_and_tg($id, $tag, $query_country){
$tag_for_url = rawurlencode($tag);
$found = FALSE;
$i = 0;
$data = array();
// Create a stream for Json. That's how the code knows what to expect to get.
$context_opts = array(
'http' => array(
'method' => "GET",
'header' => "Accepts: application/json\r\n"
));
$context = stream_context_create($context_opts);
while ($found == FALSE) {
// aaa Query
$json_query_aaa = "https://api.example.com:443/aaa/ajax/research_tag?app_id=$id&term=$tag_for_url&page_index=$i&country=$query_country&auth_token=666";
// Get the Json
$json_query_aaa = file_get_contents($json_query_aaa, false, $context);
// Turn Json to a PHP array
$json_query_aaa = json_decode($json_query_aaa, true);
// Get the data2
$data2 = $json_query_aaa['tag']['data2'];
if (is_null($data2)){ $data2 = 0; }
// Get data3
$data3 = $json_query_aaa['tag']['phone_prod']['data3'];
if (is_null($data3)){ $data3 = 0; }
// Finally, the main prod array.
$json_query_aaa = $json_query_aaa['tag']['phone_prod']['app_list'];
if ( count($json_query_aaa) > 2 ) {
for ( $j=0; $j<count($json_query_aaa); $j++ ) {
if ( $json_query_aaa[$j]['id'] == $id ) {
$found = TRUE;
$data = $json_query_aaa[$j]['data'] + 1;
break;
}
if ($found == TRUE){
break;
}
}
$i++;
} else {
$data = 0;
break;
}
}
$data['data1'] = $data;
$data['data2'] = $data2;
$data['data3'] = $data3;
return $data;
}
All threads are stacked one after an other, and when one thread is done, only then - the second thread can proceed, ect'.
And in technical view on this, all threads wait in the RAM until the one before them is done working "inside" the CPU. (correct me if I'm wrong again :] )
This doesn't even "tickle" the servers RAM or CPU when looking at it in the process manager (I use "htop"). RAM is at 400M/4.25G and CPU at ONLY 0.7%-1.3%.
Making me feel this isn't the best I can get from my current server, and getting slow results from my web app.
How do I get things done in a way that all threads work in parallel, but not to a point that my app crashes due to lacks of CPU or RAM?
I'm using a script from this site. This script works fine for me and it does what its need to do but I have one problem. When a track finishes on my Icecast server it doesn't get updates on the site. So if my song is 'Stole the show' than it says 'Stole the show' the page but when the song finished and e.g. 'Thinking out loud' starts the page still says 'Stole the show' on a refresh it will update. But how to make it so the page auto updates itself so the users doesn't have to refresh manually?
PHP
<?php
// include the class file
include( 'icecast.php' );
// instantiate class
$stream = new IceCast();
// set server and mount
$server = 'http://radio.finioxfm.com:8000';
$file = '/status.xsl';
// set the url
$stream->setUrl($server,$file);
// get status info
$radio = $stream->getStatus();
// assign array to variables
extract($radio);
// echo the status
echo $status.'<br/>';
// display more stats if ON AIR
if ($status=='ON AIR') :
echo $listeners.' listeners<br/>';
echo $title.'<br/>';
echo $genre.'<br/>';
for ($i=0; $i < 1; $i++) {
echo $now_playing['artist'].'<br/>';
echo $now_playing['track'].'<br/>';
}
endif;
?>
icecast.php script
<?php
class IceCast {
var $server = "http://radio.finioxfm.com:8000";
var $stats_file = "/status.xsl";
var $radio_info=array();
function __construct() {
// build array to store our Icecast stats
$this->radio_info['server'] = $this->server;
$this->radio_info['title'] = '';
$this->radio_info['description'] = '';
$this->radio_info['content_type'] = '';
$this->radio_info['mount_start'] = '';
$this->radio_info['bit_rate'] = '';
$this->radio_info['listeners'] = '';
$this->radio_info['most_listeners'] = '';
$this->radio_info['genre'] = '';
$this->radio_info['url'] = '';
$this->radio_info['now_playing'] = array();
$this->radio_info['now_playing']['artist'] = 'Unknown';
$this->radio_info['now_playing']['track'] = 'Unknown';
$this->radio_info['status'] = 'OFF AIR';
}
function setUrl($url,$file) {
$this->server=$url;
$this->stats_file=$file;
$this->radio_info['server'] = $this->server;
}
private function fetch() {
// create a new curl resource
$ch = curl_init();
// set the url
curl_setopt($ch,CURLOPT_URL,$this->server.$this->stats_file);
// return as a string
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
// $output = the status.xsl file
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
return $output;
}
function getStatus() {
$output=$this->fetch();
// loop through $output and sort arrays
$temp_array = array();
$search_for = "<td\s[^>]*class=\"streamdata\">(.*)<\/td>";
$search_td = array('<td class="streamdata">','</td>');
if(preg_match_all("/$search_for/siU",$output,$matches)) {
foreach($matches[0] as $match) {
$to_push = str_replace($search_td,'',$match);
$to_push = trim($to_push);
array_push($temp_array,$to_push);
}
}
if(count($temp_array)) {
//sort our temp array into our ral array
$this->radio_info['title'] = $temp_array[0];
$this->radio_info['description'] = $temp_array[1];
$this->radio_info['content_type'] = $temp_array[2];
$this->radio_info['mount_start'] = $temp_array[3];
$this->radio_info['bit_rate'] = $temp_array[4];
$this->radio_info['listeners'] = $temp_array[5];
$this->radio_info['most_listeners'] = $temp_array[6];
$this->radio_info['genre'] = $temp_array[7];
$this->radio_info['url'] = $temp_array[8];
if(isset($temp_array[9])) {
$x = explode(" - ",$temp_array[9]);
$this->radio_info['now_playing']['artist'] = $x[0];
$this->radio_info['now_playing']['track'] = $x[1];
}
$this->radio_info['status'] = 'ON AIR';
}
return $this->radio_info;
}
}
?>
First of all, I have to point out that you shouldn't use this script. It works by parsing the Icecast Status page, which we highly discourage, as it may change. For example in Icecast 2.4 we re-made the complete web interface, so chances are that this script breaks.
You should actually parse the XML Icecast provides at http://icecast.tld:8000/admin/stats. It contains everything you need. If you can't access Icecast's Admin page for some reason, you can use the JSON at http://icecast.tld:8000/status-json.xsl, which is there since Icecast 2.4 exactly for the purpose you describe.
To get the site display new metadata information without refreshing, you need to use an AJAX call which either loads directly the status-json.xsl and extracts the metadata and updates it on the page, or if you use the admin XML you need to write a PHP script which returns json, that you can fetch via AJAX and update accordingly.
A lot of people in the past have spoken about setting up node.js (if you have a server doing your streaming).
Personally I have gone with a jquery solution; which just compares the last fetched data with the live data every 10 seconds. That way it loads in almost 'real time'.
You can find my solution here broken down here http://www.radiodj.ro/community/index.php?topic=7471.0
I am new to php and SQL.
I have a database with 45 000 rows, I need to extract all 'udid' attribute, so there is a lot of data. I put the result in an array, for each udid, I check if udid.xml exists, if the file exist i open it and search for some data, then i reorder the array..While this process I get a timeout error, when I call my script or a 504 error from my internet provider.
Here is my php code :
include("myFunctions.php");
header('Content-type: application/json; charset=utf-8');
$handle = fopen('php://input','r');
// ----------
set_time_limit (0);
$req_Push = $db->sql_query("SELECT * FROM utilisateurs");
echo("COUNT = " .$db->sql_numrows($req_Push). "");
$arrayBigTbl = array();
while ($row_Push = $db->sql_fetchrow($req_Push)){ // while udid ref fichier utilisateurs
$idUser = $row_Push['udid'];
$arrayBigTbl []= $idUser;
}
//print_r($arrayBigTbl);
echo("</br>Count array global = " .count($arrayBigTbl). "--");
$arrayBigTbl = array_unique($arrayBigTbl);
echo("</br>Count array global nettoyée = " .count($arrayBigTbl). " -- Fin2");
$arrayXMLDevice = array();
for ($i = 0; $i < count($arrayBigTbl) ; $i++)
{
if (deviceXMLExist($arrayBigTbl[$i]))
{
$UDIDFile = simplexml_load_file("./UDID/".$arrayBigTbl[$i].".xml");
foreach($UDIDFile->UDID as $item)
{
$valueTemp = $item->valueUDID;
$strig = "" .$valueTemp. "";
$arrayXMLDevice[] = $strig;
}
}
}
//print_r($arrayXMLDevice);
echo("COUNT ARRAY DEVICE XML = " .count($arrayXMLDevice). " --- Fin 3");
$arrayXMLDevice = array_unique($arrayXMLDevice);
echo("COUNT ARRAY DEVICE XML = " .count($arrayXMLDevice). " --- Fin 3");
$arrayXMLDevice = array_values($arrayXMLDevice);
$arraymerge = array_merge ( $arrayBigTbl,$arrayXMLDevice );
echo("COUNT ARRAY MERGE = " .count($arraymerge). " --- </br>");
$arraymerge = array_unique($arraymerge);
echo("COUNT ARRAY MERGE NETTOYEE = " .count($arraymerge). " --- Fin 3");
$arrayFinalPush = array();
for ($i = 0; i < count($arraymerge);$i=$i+1)
{
$aValue = $arraymerge[i];
if(ctype_xdigit($deviceToken) && strlen($deviceToken)>60)
{
$arrayFinalPush[] = $aValue;
}
}
echo("COUNT ARRAY FINAL = " .count($arrayFinalPush). " ---");
How can I resolve this problem so my request gets finished?
I am not at all surprised you get a timeout error you are doing things very inefficiently.
That being said one trick I use with php is to flush the output buffer: How to flush output after each `echo` call?
If you flush the output buffer after each echo then it will give the browser some data and stop it from timing out.
The reason I say it is inefficient is because you generally want to do as much "work" as possible with SQL - things like uniqueness, sorting ordering etc. should be done in SQL rather than PHP whenever possible because SQL is built for it.
Because Server costs are the greatest spending, we want to get more out of everyone.
How can we achieve, that more scripts can run on this server?
What the Scrips are doing:
We run 80 PHP Scripts on one Server and feed them with Jobs through Gearman.
The Scripts are looking up Websites with cURL, extract the needed Informations with Zend_Dom_Query and store the Data in an DB.
Each Script get feeded with ca. 1000 URLs which they have to look up.
Script example is down below.
What's the Server made of:
lshws outpu:
description: Computer
width: 64 bits
capabilities: vsyscall64 vsyscall32
*-core
description: Motherboard
physical id: 0
*-memory
description: System memory
physical id: 0
size: 8191GiB
*-cpu
product: Intel(R) Xeon(R) CPU E31230 # 3.20GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu#0
width: 64 bits
capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp x86-64 constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts tpr_shadow vnmi flexpriority ept vpid
Nevertheless this is an V-Server it's the only V-Server running on that server. It has also not 8191GB Memory more like 16GB.
To show you how exhausted the server is, here's tops output:
top - 14:45:04 up 8 days, 3:10, 1 user, load average: 72.96, 72.51, 71.82
Tasks: 100 total, 72 running, 28 sleeping, 0 stopped, 0 zombie
Cpu(s): 87.5%us, 12.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 8589934588k total, 4349016k used, 8585585572k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 282516k cached
Not to forget here's the scripts main structure:
// Get the Infos on which to crawl on
$asin = explode(',', $job->workload());
try {
$userproducts = new App_Dbservices_U...();
$konkurrenz = new App_Dbservices_K...();
$repricingstream = new App_Dbservices_R...();
$err = 0;
for ($i = 0; $i < count($asin) - 3; $i = $i + 50) {
$mh = curl_multi_init();
$handles = array();
for ($j = $i; $j < $i + 50; $j++) {
if ((count($asin) - 3) > $j) {
if (isset($asin[$j])) {
// create a new single curl handle
$ch = curl_init();
// setting several options like url, timeout, returntransfer
// simulate multithreading by calling the wait.php scipt and sleeping for $rand seconds
$url = // URL
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 80);
// add this handle to the multi handle
$erroro[$j] = curl_errno($ch);
$errmsg[$j] = curl_error($ch);
curl_multi_add_handle($mh, $ch);
// put the handles in an array to loop this later on
$handles[] = $ch;
}
}
}
}
// execute the multi handle
$running = null;
do {
curl_multi_exec($mh, $running);
} while ($running > 0);
// get the content (if there is any)
$output = '';
for ($k = 0; $k < count($handles); $k++) {
// get the content of the handle
$output[$k] = curl_multi_getcontent($handles[$k]);
$_asin[$k]['asin'] = $asin[$j - 50 + $k];
$_asin[$k]['condition'] = $condition[$j - 50 + $k];
$_asin[$k]['pId'] = $pId[$j - 50 + $k];
if ($output[$k] != '')
{
// get the dom of each page
$dom = new Zend_Dom_Query($output[$k]);
// get the sellerInfos of each page
$seller = $dom->query('div.Offer');
if (count($seller) > 0) {
// get the price out of the string
$seller_i = 0;
$selfCameOver = false;
foreach ($seller as $d2) {
if ($seller_i <= 6 OR $selfCameOver === false) {
$itemHtml = '';
foreach($d2->childNodes as $node) {
$itemHtml .= $node->ownerDocument->saveHTML($node);
}
$dom = new Zend_Dom_Query($itemHtml);
$itemPrice = $dom->query('span.Price');
foreach($itemPrice as $ItemPrice)
{
$_asin[$k]['price_end'][$seller_i] = 0.00;
$_asin[$k]['shipping_end'][$seller_i] = 0.00;
if (preg_match('/[0-9]++(?>[,.][0-9]+)?+/', $ItemPrice->textContent, $rueckgabe)) {
$priceEnd = str_replace(',', '', str_replace('.', '', $rueckgabe[0][0]));
$priceLength = strlen($priceEnd);
$priceEnd = substr($priceEnd, 0, ($priceLength - 2)) . '.' . substr($priceEnd, ($priceLength - 2), 2);
$_asin[$k]['price_end'][$seller_i] = (float)$priceEnd;
}
}
}
$shippingPrice = $dom->query('span.ShippingPrice');
foreach($shippingPrice as $ShippingPrice)
{
preg_match_all('/[0-9]{1,}([\,\. ]?[0-9])*/', $ShippingPrice->textContent, $rueckgabe);
if (isset($rueckgabe[0][0])) {
// ...
}
}
$_asin[$k]['price_total_end'][$seller_i] = $_asin[$k]['price_end'][$seller_i] + $_asin[$k]['shipping_end'][$seller_i];
$conditionTag = $dom->query('.Condition');
foreach($conditionTag as $ConditionTag)
{
$_asin[$k]['main_con'][$seller_i]= 0;
$_asin[$k]['sub_con'][$seller_i] = 0;
$conditionValue = explode(' - ', $ConditionTag->textContent);
if(isset($conditionValue[0])){
// ...
}
if(isset($conditionValue[1])) {
// ...
}
}
$ratingItem = $dom->query('.Rating');
$_asin[$k]['bewertung_end'][$seller_i] = -1;
$_asin[$k]['stars_end'][$seller_i] = -1;
foreach($ratingItem as $RatingItem)
{
echo $RatingItem->textContent; // 99% positiv ... 12 Monaten ... 11.719 Bewertungen ...
// I want to get 99 (which is stars ) and 11719 (which is bewertungen )
preg_match_all('/[0-9]{1,}([\,\. ]?[0-9])*/', preg_replace('/,/', '.', $RatingItem->textContent), $rueckgabe);
if (isset($rueckgabe[0]) AND count($rueckgabe[0]) > 0) {
$_asin[$k]['bewertung_end'][$seller_i] = (int)str_replace('.', '', $rueckgabe[0][count($rueckgabe[0]) - 1]);
$_asin[$k]['stars_end'][$seller_i] = $rueckgabe[0][0];
}
}
$sellerType = $dom->query('.Name img');
$_asin[$k]['merchant_end'][$seller_i] = "N/A";
$_asin[$k]['name_end'][$seller_i] = "N/A";
$_asin[$k]['img_end'][$seller_i] = "N/A";
$_asin[$k]['konk_type'][$seller_i] = 'ERROR';
if(count($sellerType) == 1)
{
foreach($sellerType as $SellerType)
{
$imgAltText = $SellerType->getAttribute('alt');
$a = explode('.', $imgAltText);
// ...
}
}
elseif(count($sellerType) == 0)
{
$_asin[$k]['img_end'][$seller_i] = 'NO_IMG';
$_asin[$k]['konk_type'][$seller_i] = 'WO_IMG';
$sellerName = $dom->query('.Name b');
foreach($sellerName as $SellerName)
{
$_asin[$k]['name_end'][$seller_i] = $SellerName->textContent;
}
$sellerMerchant = $dom->query('.Name a');
foreach($sellerMerchant as $SellerMerchant)
{
$_asin[$k]['merchant_end'][$seller_i] = str_replace('=', '', substr($SellerMerchant->getAttribute('href'), -14));
}
}
unset($rueckgabe);
}
$seller_i++;
}
}
}
// remove the handle from the multi handle
curl_multi_remove_handle($mh, $handles[$k]);
}
// Update Price ...
// Update Shipping ...
// Update Conc ...
unset($_asin);
// close the multi curl handle to free system resources
curl_multi_close($mh);
}
} catch (Exception $e) {
$error = new Repricing_Dbservices_Error();
$error->setError($id, $e->getMessage(), $e->getLine(), $e->getFile());
}
And also the script for the price update ( the other update-statements looks similiar)
$this->db->beginTransaction();
try {
for ($i = 0; $i < count($asin); $i++) {
if (isset($asin[$i]['price_total_end'])) {
if (count($asin[$i]['price_total_end']) > 1) {
if ($asin[$i]['price_total_end'][0] > 0) {
$this->db->query("UPDATE u... SET lowest_price = ? , last_lowest_price_update = ? WHERE id = ?", array(
$asin[$i]['price_total_end'][1],
date("Y-m-d H:i:s", time()),
$asin[$i]['pId']
));
}
} elseif (count($asin[$i]['price_total_end']) == 1) {
if ($asin[$i]['price_total_end'][0] >= 0) {
$this->db->query("UPDATE u... SET lowest_price = ? , last_lowest_price_update = ? WHERE id = ?", array(
-1,
date("Y-m-d H:i:s", time()),
$asin[$i]['pId']
));
}
}
}
}
$this->db->commit();
} catch (Exception $e) {
$this->db->rollBack();
echo $e->getMessage();
}
$this->db->closeConnection();
Do we have an big performance leak in our script, should we go along with an other language, or any other techniques? Every suggestion is highly appreciated.
You can replace all these kind of lines:
preg_match_all('/[0-9]{1,}([\,\. ]?[0-9])*/', $ItemPrice->textContent, $rueckgabe);
if (isset($rueckgabe[0])) {
// ...
}
by:
if (preg_match('/([0-9]++)(?>[.,]([0-9]++))?+/', $ItemPrice->textContent, $rueckgabe)) {
unset($rueckgabe[0]);
$priceEnd = sprintf("%01.2f", implode('.', $rueckgabe));
$_asin[$k]['price_end'][$seller_i] = $priceEnd;
}
You should replace all your for loops with foreach (then you avoid the count on each loop as RaymondN notices it). Example:
Instead of:
for ($k = 0; $k < count($handles); $k++) {
you write:
foreach($handles as $k=>$handle) {
// you can replace $handles[$k] by $handle
It is not very useful to convert the current datetime and format it in "Y-m-d H:i:s" since you can do directly the same with the mySQL statement NOW().
Don´t use count functions in the for loops that would save some CPU cycles ..
but use.
$m = count($array) - 1;
for ($i = 0 ; $i < $m ; $i++) {}
Whats the PHP version a new PHP version could possible perform much better.
The most notable thing here is that you are sharding the data too much - when your load average is higher than the number of CPUs, then the OS will start pre-emppting jobs instead of waiting for them to yeild the CPU. As a result overall throughput drops significantly. You seem to have a single CPU - for a cpu bound system like this running on a single core, I'd try 2, 4, 8 and 16 processes to see which gives the optimal behaviour (assuming you need to make the code fit the hardware rather than the other way around).
Your next problem is that the zend framework is very cpu and memory hungry: Zend is 4 times slower than unadorned PHP.
You've got a lot of inlined code in the loops here - while inlining does help performance it makes it a lot more difficult to get useful data out of a profiler - hence my next step after making the Zend-free and reducing the concurrency would be to structure the code into functions and profile it.
I have to convert .DBF and .FPT files from Visual FoxPro to MySQL. Right now my script works for .DBF files, it opens and reads them with dbase_open() and dbase_get_record_with_names() and then executes the MySQL INSERT commands.
However, some fields of these .DBF files are of type MEMO and therefore stored in a separate files ending in .FPT. How do I read this file?
I have found the specifications of this filetype in MSDN, but I don't know how I can read this file byte-wise with PHP (also, I would really prefer a simplier solution).
Any ideas?
Alright, I have carefully studied the MSDN specifications of DBF and FPT file structures and the outcome is a beautiful PHP class which can open a DBF and (optional) an FPT memo file at the same time. This class will give you record after record and thereby fetch any memos from the memo file - if opened.
class Prodigy_DBF {
private $Filename, $DB_Type, $DB_Update, $DB_Records, $DB_FirstData, $DB_RecordLength, $DB_Flags, $DB_CodePageMark, $DB_Fields, $FileHandle, $FileOpened;
private $Memo_Handle, $Memo_Opened, $Memo_BlockSize;
private function Initialize() {
if($this->FileOpened) {
fclose($this->FileHandle);
}
if($this->Memo_Opened) {
fclose($this->Memo_Handle);
}
$this->FileOpened = false;
$this->FileHandle = NULL;
$this->Filename = NULL;
$this->DB_Type = NULL;
$this->DB_Update = NULL;
$this->DB_Records = NULL;
$this->DB_FirstData = NULL;
$this->DB_RecordLength = NULL;
$this->DB_CodePageMark = NULL;
$this->DB_Flags = NULL;
$this->DB_Fields = array();
$this->Memo_Handle = NULL;
$this->Memo_Opened = false;
$this->Memo_BlockSize = NULL;
}
public function __construct($Filename, $MemoFilename = NULL) {
$this->Prodigy_DBF($Filename, $MemoFilename);
}
public function Prodigy_DBF($Filename, $MemoFilename = NULL) {
$this->Initialize();
$this->OpenDatabase($Filename, $MemoFilename);
}
public function OpenDatabase($Filename, $MemoFilename = NULL) {
$Return = false;
$this->Initialize();
$this->FileHandle = fopen($Filename, "r");
if($this->FileHandle) {
// DB Open, reading headers
$this->DB_Type = dechex(ord(fread($this->FileHandle, 1)));
$LUPD = fread($this->FileHandle, 3);
$this->DB_Update = ord($LUPD[0])."/".ord($LUPD[1])."/".ord($LUPD[2]);
$Rec = unpack("V", fread($this->FileHandle, 4));
$this->DB_Records = $Rec[1];
$Pos = fread($this->FileHandle, 2);
$this->DB_FirstData = (ord($Pos[0]) + ord($Pos[1]) * 256);
$Len = fread($this->FileHandle, 2);
$this->DB_RecordLength = (ord($Len[0]) + ord($Len[1]) * 256);
fseek($this->FileHandle, 28); // Ignoring "reserved" bytes, jumping to table flags
$this->DB_Flags = dechex(ord(fread($this->FileHandle, 1)));
$this->DB_CodePageMark = ord(fread($this->FileHandle, 1));
fseek($this->FileHandle, 2, SEEK_CUR); // Ignoring next 2 "reserved" bytes
// Now reading field captions and attributes
while(!feof($this->FileHandle)) {
// Checking for end of header
if(ord(fread($this->FileHandle, 1)) == 13) {
break; // End of header!
} else {
// Go back
fseek($this->FileHandle, -1, SEEK_CUR);
}
$Field["Name"] = trim(fread($this->FileHandle, 11));
$Field["Type"] = fread($this->FileHandle, 1);
fseek($this->FileHandle, 4, SEEK_CUR); // Skipping attribute "displacement"
$Field["Size"] = ord(fread($this->FileHandle, 1));
fseek($this->FileHandle, 15, SEEK_CUR); // Skipping any remaining attributes
$this->DB_Fields[] = $Field;
}
// Setting file pointer to the first record
fseek($this->FileHandle, $this->DB_FirstData);
$this->FileOpened = true;
// Open memo file, if exists
if(!empty($MemoFilename) and file_exists($MemoFilename) and preg_match("%^(.+).fpt$%i", $MemoFilename)) {
$this->Memo_Handle = fopen($MemoFilename, "r");
if($this->Memo_Handle) {
$this->Memo_Opened = true;
// Getting block size
fseek($this->Memo_Handle, 6);
$Data = unpack("n", fread($this->Memo_Handle, 2));
$this->Memo_BlockSize = $Data[1];
}
}
}
return $Return;
}
public function GetNextRecord($FieldCaptions = false) {
$Return = NULL;
$Record = array();
if(!$this->FileOpened) {
$Return = false;
} elseif(feof($this->FileHandle)) {
$Return = NULL;
} else {
// File open and not EOF
fseek($this->FileHandle, 1, SEEK_CUR); // Ignoring DELETE flag
foreach($this->DB_Fields as $Field) {
$RawData = fread($this->FileHandle, $Field["Size"]);
// Checking for memo reference
if($Field["Type"] == "M" and $Field["Size"] == 4 and !empty($RawData)) {
// Binary Memo reference
$Memo_BO = unpack("V", $RawData);
if($this->Memo_Opened and $Memo_BO != 0) {
fseek($this->Memo_Handle, $Memo_BO[1] * $this->Memo_BlockSize);
$Type = unpack("N", fread($this->Memo_Handle, 4));
if($Type[1] == "1") {
$Len = unpack("N", fread($this->Memo_Handle, 4));
$Value = trim(fread($this->Memo_Handle, $Len[1]));
} else {
// Pictures will not be shown
$Value = "{BINARY_PICTURE}";
}
} else {
$Value = "{NO_MEMO_FILE_OPEN}";
}
} else {
$Value = trim($RawData);
}
if($FieldCaptions) {
$Record[$Field["Name"]] = $Value;
} else {
$Record[] = $Value;
}
}
$Return = $Record;
}
return $Return;
}
function __destruct() {
// Cleanly close any open files before destruction
$this->Initialize();
}
}
The class can be used like this:
$Test = new Prodigy_DBF("customer.DBF", "customer.FPT");
while(($Record = $Test->GetNextRecord(true)) and !empty($Record)) {
print_r($Record);
}
It might not be an almighty perfect class, but it works for me. Feel free to use this code, but note that the class is VERY tolerant - it doesn't care if fread() and fseek() return true or anything else - so you might want to improve it a bit before using.
Also note that there are many private variables like number of records, recordsize etc. which are not used at the moment.
I replaced this code:
fseek($this->Memo_Handle, (trim($RawData) * $this->Memo_BlockSize)+8);
$Value = trim(fread($this->Memo_Handle, $this->Memo_BlockSize));
with this code:
fseek($this->Memo_Handle, (trim($RawData) * $this->Memo_BlockSize)+4);
$Len = unpack("N", fread($this->Memo_Handle, 4));
$Value = trim(fread($this->Memo_Handle, $Len[1] ));
this helped for me
Although not PHP, VFP is 1-based references and I think PHP is zero-based references so you'll have to decypher and adjust accordingly, but this works and hopefully you'll be able
to post your version of this portion when finished.
FILETOSTR() in VFP will open a file, and read the entire content into
a single memory variable as a character string -- all escape keys, high byte characters, etc, intact. You'll probably need to rely on an FOPEN(), FSEEK(), FCLOSE(), etc.
MemoTest.FPT was my sample memo table/file
fpt1 = FILETOSTR( "MEMOTEST.FPT" )
First, you'll have to detect the MEMO BLOCK SIZE used when the file was created. Typically this would be 64 BYTES, but per the link you had in your post.
The header positions 6-7 identify the size (VFP, positions 7 and 8). The first byte is the high-order
nBlockSize = ASC( SUBSTR( fpt1, 7, 1 )) * 256 + ASC( SUBSTR( fpt1, 8, 1 ))
Now, at your individual records. Wherever in your DBF structure has the memo FIELD (and you can have many per single record structure) there will be 4 bytes. In THE RECORD field, it identifies the "block" in the memo file where the content is stored.
MemoBytes = 4 bytes at your identified field location. These will be stored as ASCII from 0-255. This field is stored with the FIRST byte as low-order and the 4th byte as 256^3 = 16777216. The first "Block" ever used will be starting in position offset of 512 per the memo .fpt file spec that the header takes up positions 0-511.
So, if your first memo field has a content of "8000" where the 8 is the actual 0x08, not number "8" which is 0x38, and the zeros are 0x00.
YourMemoField = "8000" (actually use ascii, but for readability showing hex expected value)
First Byte is ASCII value * 1 ( 256 ^ 0 )
Second Byte is ASCII value * 256 (256 ^ 1)
Third Byte is ASCII value * 65536 (256 ^ 2)
Fourth Byte is ASCII value * 16777216 (256 ^ 3)
nMemoBlock = byte1 + ( byte2 * 256 ) + ( byte3 * 65536 ) + ( byte4 * 16777216 )
Now, you'll need to FSEEK() to the
FSEEK( handle, nMemoBlock * nBlockSize +1 )
for the first byte of the block you are looking for. This will point to the BLOCK header. In this case, per the spec, the first 4 bytes identify the Block SIGNATURE, the second 4 bytes is the length of the content. For these two, the bytes are stored with HIGH-BYTE first.
From your FSEEK(), its REVERSE of the nMemoBlock above with the high-byte. The "Byte1-4" here are from your FSEEK() position
nSignature = ( byte1 * 16777216 ) + ( byte2 * 65536 ) + ( byte3 * 256 ) + byte4
nMemoLength = ( byte5 * 16777216 ) + ( byte6 * 65536 ) + ( byte7 * 256 ) + byte8
Now, FSEEK() to the 9th byte (1st actual character of the data AFTER the 8 bytes of the header you just read for signature and memo length). This is the beginning of your data.
Now, read the rest of the content...
FSEEK() +9 characters to new position
cFinalMemoData = FREAD( handle, nMemoLength )
I know this isn't perfect, nor PHP script, but its enough of pseudo-code on hOW things are stored and hopefully gets you WELL on your way.
Again, PLEASE take into consideration as you are stepping through your debug process to ensure 0 or 1 offset basis. To help simplify and test this, I created a simple .DBF with 2 fields... a character field and a memo field, added a few records and some basic content to confirm all content, positions, etc.
I think it's unlikely there are FoxPro libraries in PHP.
You may have to code it from scratch. For byte-wise reading, meet fopen() fread() and colleagues.
Edit: There seems to be a Visual FoxPro ODBC driver. You may be able to connect to a FoxPro database through PDO and that connector. How the chances of success are, and how much work it would be, I don't know.
The FPT file contains memo data. In the DBF you have columns of type memo, and the information in this column is a pointer to the entry in the FPT file.
If you are querying the data from the table you only have to reference the memo column to get the data. You do not need to parse the data out of the FPT file separately. The OLE DB driver (or the ODBC driver if your files are VFP 6 or earlier) should just give you this information.
There is a tool that will automatically migrate your Visual FoxPro data to MySQL. You might want to check it out to see if you can save some time:
Go to http://leafe.com/dls/vfp
and search for "Stru2MySQL_2" for the tool to migrate the structures of the data and "VFP2MySQL Data Upload program" for tools to help with the migration.
Rick Schummer VFP MVP
You also might want to check the PHP dbase libraries.They work quite well with DBF files.
<?
class Prodigy_DBF {
private $Filename, $DB_Type, $DB_Update, $DB_Records, $DB_FirstData, $DB_RecordLength, $DB_Flags, $DB_CodePageMark, $DB_Fields, $FileHandle, $FileOpened;
private $Memo_Handle, $Memo_Opened, $Memo_BlockSize;
private function Initialize() {
if($this->FileOpened) {
fclose($this->FileHandle);
}
if($this->Memo_Opened) {
fclose($this->Memo_Handle);
}
$this->FileOpened = false;
$this->FileHandle = NULL;
$this->Filename = NULL;
$this->DB_Type = NULL;
$this->DB_Update = NULL;
$this->DB_Records = NULL;
$this->DB_FirstData = NULL;
$this->DB_RecordLength = NULL;
$this->DB_CodePageMark = NULL;
$this->DB_Flags = NULL;
$this->DB_Fields = array();
$this->Memo_Handle = NULL;
$this->Memo_Opened = false;
$this->Memo_BlockSize = NULL;
}
public function __construct($Filename, $MemoFilename = NULL) {
$this->Prodigy_DBF($Filename, $MemoFilename);
}
public function Prodigy_DBF($Filename, $MemoFilename = NULL) {
$this->Initialize();
$this->OpenDatabase($Filename, $MemoFilename);
}
public function OpenDatabase($Filename, $MemoFilename = NULL) {
$Return = false;
$this->Initialize();
$this->FileHandle = fopen($Filename, "r");
if($this->FileHandle) {
// DB Open, reading headers
$this->DB_Type = dechex(ord(fread($this->FileHandle, 1)));
$LUPD = fread($this->FileHandle, 3);
$this->DB_Update = ord($LUPD[0])."/".ord($LUPD[1])."/".ord($LUPD[2]);
$Rec = unpack("V", fread($this->FileHandle, 4));
$this->DB_Records = $Rec[1];
$Pos = fread($this->FileHandle, 2);
$this->DB_FirstData = (ord($Pos[0]) + ord($Pos[1]) * 256);
$Len = fread($this->FileHandle, 2);
$this->DB_RecordLength = (ord($Len[0]) + ord($Len[1]) * 256);
fseek($this->FileHandle, 28); // Ignoring "reserved" bytes, jumping to table flags
$this->DB_Flags = dechex(ord(fread($this->FileHandle, 1)));
$this->DB_CodePageMark = ord(fread($this->FileHandle, 1));
fseek($this->FileHandle, 2, SEEK_CUR); // Ignoring next 2 "reserved" bytes
// Now reading field captions and attributes
while(!feof($this->FileHandle)) {
// Checking for end of header
if(ord(fread($this->FileHandle, 1)) == 13) {
break; // End of header!
} else {
// Go back
fseek($this->FileHandle, -1, SEEK_CUR);
}
$Field["Name"] = trim(fread($this->FileHandle, 11));
$Field["Type"] = fread($this->FileHandle, 1);
fseek($this->FileHandle, 4, SEEK_CUR); // Skipping attribute "displacement"
$Field["Size"] = ord(fread($this->FileHandle, 1));
fseek($this->FileHandle, 15, SEEK_CUR); // Skipping any remaining attributes
$this->DB_Fields[] = $Field;
}
// Setting file pointer to the first record
fseek($this->FileHandle, $this->DB_FirstData);
$this->FileOpened = true;
// Open memo file, if exists
if(!empty($MemoFilename) and file_exists($MemoFilename) and preg_match("%^(.+).fpt$%i", $MemoFilename)) {
$this->Memo_Handle = fopen($MemoFilename, "r");
if($this->Memo_Handle) {
$this->Memo_Opened = true;
// Getting block size
fseek($this->Memo_Handle, 6);
$Data = unpack("n", fread($this->Memo_Handle, 2));
$this->Memo_BlockSize = $Data[1];
}
}
}
return $Return;
}
public function GetNextRecord($FieldCaptions = false) {
$Return = NULL;
$Record = array();
if(!$this->FileOpened) {
$Return = false;
} elseif(feof($this->FileHandle)) {
$Return = NULL;
} else {
// File open and not EOF
fseek($this->FileHandle, 1, SEEK_CUR); // Ignoring DELETE flag
foreach($this->DB_Fields as $Field) {
$RawData = fread($this->FileHandle, $Field["Size"]);
// Checking for memo reference
if($Field["Type"] == "M" and $Field["Size"] == 4 and !empty($RawData)) {
// Binary Memo reference
$Memo_BO = unpack("V", $RawData);
if($this->Memo_Opened and $Memo_BO != 0) {
fseek($this->Memo_Handle, $Memo_BO[1] * $this->Memo_BlockSize);
$Type = unpack("N", fread($this->Memo_Handle, 4));
if($Type[1] == "1") {
$Len = unpack("N", fread($this->Memo_Handle, 4));
$Value = trim(fread($this->Memo_Handle, $Len[1]));
} else {
// Pictures will not be shown
$Value = "{BINARY_PICTURE}";
}
} else {
$Value = "{NO_MEMO_FILE_OPEN}";
}
} else {
if($Field["Type"] == "M"){
if(trim($RawData) > 0) {
fseek($this->Memo_Handle, (trim($RawData) * $this->Memo_BlockSize)+8);
$Value = trim(fread($this->Memo_Handle, $this->Memo_BlockSize));
}
}else{
$Value = trim($RawData);
}
}
if($FieldCaptions) {
$Record[$Field["Name"]] = $Value;
} else {
$Record[] = $Value;
}
}
$Return = $Record;
}
return $Return;
}
function __destruct() {
// Cleanly close any open files before destruction
$this->Initialize();
}
}
?>