Custom formatting of PHP exception with collected data - php

I regularly have use cases for PHP \Exception subclasses where I want to collect data and then bundle it into a final error message. For example:
// check that some data covers contiguous days
$missing = new MissingAdjustmentDataException('');
$testDate = $period->getPreviousPeriod()->getEnd();
$z = 0;
while ($testDate <= $period->getEnd() && $z < 500){
if (!in_array($testDate, array_column($activationRedemptionAdjustmentDays, 'effective') )){
$missing->addMissingRedemptionAdjustment($testDate);
}
if (!in_array($testDate, array_column($platformAdjustmentDays, 'effective') )){
$missing->addMissingPlatformAdjustment($testDate);
}
$testDate->add(new \DateInterval('P1D'));
$z++;
}
Then in my exception, I'm collecting the data in arrays:
class MissingAdjustmentDataException extends \Exception
{
private $missingRedemptionAdjustment = [];
private $missingPlatformAdjustment = [];
public function updateMessage()
{
$message = 'Missing Adjustment data: ';
if ($this->missingRedemptionAdjustment){
$ra = [];
foreach ($this->missingRedemptionAdjustment as $item){
$ra[] = $item->format('Y-m-d');
}
$message .= 'RedemptionAdjustment: '.implode(',',$ra);
}
if ($this->missingPlatformAdjustment){
$pl = [];
foreach ($this->missingPlatformAdjustment as $item){
$pl[] = $item->format('Y-m-d');
}
$message .= ' PlatformAdjustment: '.implode(',',$pl);
}
$this->message = $message;
}
public function inError() : bool
{
if ($this->missingRedemptionAdjustment || $this->missingPlatformAdjustment){
return true;
}else{
return false;
}
}
public function addMissingRedemptionAdjustment(\DateTime $dateTime){
$this->missingRedemptionAdjustment[] = clone $dateTime;
$this->updateMessage();
}
public function addMissingPlatformAdjustment(\DateTime $dateTime){
$this->missingPlatformAdjustment[] = clone $dateTime;
$this->updateMessage();
}
}
My main problem is that I cannot find a way to format the message "lazily" when $missing->getMessage() is called. It seems I have to update $this->message inside the exception every time I add a data point to it.
Is there a better way to do this?

The issue is that you are mixing two different things: the object that keeps track of the errors, and the exception.
You should separate them properly. For example:
class MissingDataCollector
{
private $missingRedemptionAdjustment = [];
private $missingPlatformAdjustment = [];
public function addMissingRedemptionAdjustment(\DateTime $dateTime)
{
$this->missingRedemptionAdjustment[] = clone $dateTime;
}
public function addMissingPlatformAdjustment(\DateTime $dateTime)
{
$this->missingPlatformAdjustment[] = clone $dateTime;
}
public function check()
{
if ($this->missingRedemptionAdjustment || $this->missingPlatformAdjustment)
throw new \Exception($this->getMessage());
}
private function getMessage()
{
$message = 'Missing Adjustment data:';
if ($this->missingRedemptionAdjustment){
$ra = [];
foreach ($this->missingRedemptionAdjustment as $item){
$ra[] = $item->format('Y-m-d');
}
$message .= ' RedemptionAdjustment: '.implode(',', $ra);
}
if ($this->missingPlatformAdjustment){
$pl = [];
foreach ($this->missingPlatformAdjustment as $item){
$pl[] = $item->format('Y-m-d');
}
$message .= ' PlatformAdjustment: '.implode(',', $pl);
}
return $message;
}
}
And the way to use it:
$missing = new MissingDataCollector();
// Some processing that may call addMissingRedemptionAdjustment() or addMissingPlatformAdjustment()
...
// Throw an exception in case of missing data
$missing->check();
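The processing step elided above (the contiguous-day check from the question) could also be written without a manual counter. This is only a sketch: findMissingDays() is a hypothetical helper, it compares by calendar day ('Y-m-d') rather than by exact DateTime equality as in_array() does in the question, and it assumes the 'effective' values are DateTimeInterface objects. It takes DateTimeImmutable bounds so nothing the caller holds is modified (the question's loop calls add() on the object returned by getEnd(), which mutates that object).

```php
<?php
/**
 * Sketch: list the days in [$start, $end] (inclusive) that are absent from $presentDays.
 * Hypothetical helper; assumes $presentDays holds DateTimeInterface objects.
 *
 * @param \DateTimeInterface[] $presentDays
 * @return string[] missing days formatted as 'Y-m-d'
 */
function findMissingDays(\DateTimeImmutable $start, \DateTimeImmutable $end, array $presentDays): array
{
    // Index the days that do exist by 'Y-m-d' for constant-time lookups.
    $present = [];
    foreach ($presentDays as $day) {
        $present[$day->format('Y-m-d')] = true;
    }

    // DatePeriod excludes its end date, so stop one day after $end.
    // modify() on an immutable date returns a new object; $end itself is untouched.
    $days = new \DatePeriod($start, new \DateInterval('P1D'), $end->modify('+1 day'));

    $missing = [];
    foreach ($days as $day) {
        $key = $day->format('Y-m-d');
        if (!isset($present[$key])) {
            $missing[] = $key;
        }
    }
    return $missing;
}
```

In the question's terms it might be called as findMissingDays($start, $end, array_column($activationRedemptionAdjustmentDays, 'effective')), where DateTimeImmutable::createFromMutable() converts the period bounds and each returned string is turned back into a DateTime before calling addMissingRedemptionAdjustment().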

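Alternatively, you can call updateMessage() only once, when catching the exception (and drop the calls from the add...() methods):
try {
    if ($missing->inError()) {
        throw $missing;
    }
} catch (MissingAdjustmentDataException $e) {
    $e->updateMessage();
    echo $e->getMessage();
}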
You will find some advice and hacks in "How to change exception message of Exception object?"

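Since Exception::getMessage() is declared final in PHP, a subclass cannot override it to format the message lazily. A variation on the collector idea is to gather the dates first and then create the exception through a named constructor that formats the message exactly once. The factory name fromMissingDates() and the getter names are hypothetical, and the sketch assumes PHP 7.4+ (typed properties, arrow functions).

```php
<?php
// Sketch: collect first, then build the exception (and its message) in one step.
final class MissingAdjustmentDataException extends \Exception
{
    /** @var \DateTimeInterface[] */
    private array $missingRedemption = [];
    /** @var \DateTimeInterface[] */
    private array $missingPlatform = [];

    /** Hypothetical named constructor: returns null when nothing is missing. */
    public static function fromMissingDates(array $redemption, array $platform): ?self
    {
        if (!$redemption && !$platform) {
            return null;
        }
        $parts = [];
        if ($redemption) {
            $parts[] = 'RedemptionAdjustment: ' . self::formatDates($redemption);
        }
        if ($platform) {
            $parts[] = 'PlatformAdjustment: ' . self::formatDates($platform);
        }
        // The message is formatted once, here, from the complete data.
        $e = new self('Missing Adjustment data: ' . implode('; ', $parts));
        $e->missingRedemption = $redemption;
        $e->missingPlatform = $platform;
        return $e;
    }

    /** The raw dates stay available to callers that need more than the text. */
    public function getMissingRedemption(): array
    {
        return $this->missingRedemption;
    }

    public function getMissingPlatform(): array
    {
        return $this->missingPlatform;
    }

    private static function formatDates(array $dates): string
    {
        return implode(',', array_map(
            static fn (\DateTimeInterface $d): string => $d->format('Y-m-d'),
            $dates
        ));
    }
}
```

A caller would then write if ($e = MissingAdjustmentDataException::fromMissingDates($redemptionDates, $platformDates)) { throw $e; }, so an exception object only exists when there is actually something to report.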
Related

I think I found a bug with setValue from ReflectionProperty

I'm working on a function that recursively removes recursion (circular references) from arrays and objects. The problem is that some of these recursions may be inside private properties of objects.
Below is what I tried, as well as the input I used.
This is my input:
class TestOBJ{
private $fooClosure = null;
public $bar = 5;
private $myPrivateRecursion = null;
private $aimArrayAndContainsRecursion = [];
public function __construct()
{
$this->fooClosure = function(){
echo 'pretty closure';
};
}
public function setMyPrivateRecursion(&$obj){
$this->myPrivateRecursion = &$obj;
}
public function setObjInsideArray(&$obj){
$this->aimArrayAndContainsRecursion[] = &$obj;
}
}
$std = new stdClass();
$std->std = 'any str';
$std->obj = new stdClass();
$std->obj->other = &$std;
$obj = new TestOBJ();
$obj->bar = new TestOBJ();
$obj->bar->bar = 'hey brow, please works';
$obj->bar->setMyPrivateRecursion($std);
My input is the variable $obj,
and this is my function / solution:
function makeRecursionStack($vector, &$stack = [], $from = null)
{
if ($vector) {
if (is_object($vector) && !in_array($vector, $stack, true) && !is_callable($vector)) {
$stack[] = &$vector;
if (get_class($vector) === 'stdClass') {
foreach ($vector as $key => $value) {
if (in_array($vector->{$key}, $stack, true)) {
$vector->{$key} = null;
} else {
$vector->{$key} = $this->makeRecursionStack($vector->{$key}, $stack, $key);
}
}
return $vector;
} else {
$object = new \ReflectionObject($vector);
$reflection = new \ReflectionClass($vector);
$properties = $reflection->getProperties();
if ($properties) {
foreach ($properties as $property) {
$property = $object->getProperty($property->getName());
$property->setAccessible(true);
if (!is_callable($property->getValue($vector))) {
$private = false;
if ($property->isPrivate()) {
$property->setAccessible(true);
$private = true;
}
if (in_array($property->getValue($vector), $stack, true)) {
$property->setValue($vector, null);
} else {
//if($property->getName() === 'myPrivateRecursion' && $from === 'bar'){
//$get = $property->getValue($vector);
//$set = $this->makeRecursionStack($get, $stack, $property->getName());
//$property->setValue($vector, $set);
//pre_clear_buffer_die($property->getValue($vector));
//}
$property->setValue($vector, $this->makeRecursionStack($property->getValue($vector), $stack, $property->getName()));
}
if ($private) {
$property->setAccessible(false);
}
}
}
}
return $vector;
}
} else if (is_array($vector)) {
$nvector = [];
foreach ($vector as $key => $value) {
$nvector[$key] = $this->makeRecursionStack($value, $stack, $key);
}
return $nvector;
} else {
if (is_object($vector) && !is_callable($vector)) {
return null;
}
}
}
return $vector;
}
The commented-out block is where I noticed the problem. If that if block is uncommented, $get receives a stdClass that contains the recursion (this works perfectly) and $set receives the stdClass without the recursion, in that order:
$get =
$set =
After these lines
$property->setValue($vector, $set);
pre_clear_buffer_die($property->getValue($vector));
I get this:
I also tried first setting another value, such as a bool or null, on the property and then setting $set, but it still doesn't work.
P.S.: pre_clear_buffer_die() is a debugging function: it discards the PHP output buffer, starts a new one, prints the variable inside a <pre> and then exits the script.

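For comparison, here is a minimal sketch of one way to break cycles in a mixed array/object graph. Unlike makeRecursionStack(), which treats any object already pushed onto $stack as recursion, it tracks only the objects on the current path (keyed by spl_object_id()), so an object merely shared by two properties is kept. breakCycles() is a hypothetical name; the sketch assumes PHP 7.4+, mutates objects in place, does not visit private properties declared in parent classes, skips readonly properties, and does not handle self-referencing arrays built with &. It also does not handle properties bound by reference: assigning to such a property writes through to every alias. One plausible explanation, not confirmed in the thread, for the behaviour in the question is exactly that: in the test data $std, $std->obj->other and the private $myPrivateRecursion are all aliases of the same variable, so nulling `other` and then setting $myPrivateRecursion back to the stdClass also puts the object back into `other`.

```php
<?php
/**
 * Sketch: replace references back to an object that is already on the current
 * path (a cycle) with null. Objects are modified in place, arrays are rebuilt,
 * closures and scalars are returned unchanged. Hypothetical helper.
 * Properties bound by reference (&) are written through to every alias.
 *
 * @param array<int, true> $ancestors spl_object_id() values on the current path
 */
function breakCycles($value, array $ancestors = [])
{
    if (is_array($value)) {
        // Build a fresh array so no write goes through a PHP reference in the input.
        $result = [];
        foreach ($value as $key => $item) {
            $result[$key] = breakCycles($item, $ancestors);
        }
        return $result;
    }
    if (!is_object($value) || $value instanceof \Closure) {
        return $value;
    }

    $id = spl_object_id($value);
    if (isset($ancestors[$id])) {
        return null; // this object is its own ancestor: a cycle
    }
    // Path-based tracking: $ancestors is copied per call, so an object shared by
    // two sibling properties is not mistaken for a cycle.
    $ancestors[$id] = true;

    foreach ((new \ReflectionObject($value))->getProperties() as $property) {
        if ($property->isStatic()
            || (method_exists($property, 'isReadOnly') && $property->isReadOnly())) {
            continue; // statics belong to the class; readonly values cannot be reset
        }
        if (PHP_VERSION_ID < 80100) {
            $property->setAccessible(true); // required for non-public properties before PHP 8.1
        }
        if (!$property->isInitialized($value)) {
            continue; // uninitialized typed property
        }
        $property->setValue($value, breakCycles($property->getValue($value), $ancestors));
    }
    return $value;
}
```

Because the ancestor set is per path, a heavily shared object graph may be traversed more than once; for large graphs a memo of already-processed objects would be worth adding.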
php - contao - saving my model leaves me an empty model

I am programming a module for Contao in PHP.
I am using the function Model::save(), which saves my data to the database.
But when I try to use the model after saving, it's just empty. I have no idea how this can happen.
The code snippet:
$report->tstamp = time();
$report->machine_id = $machine_data['type_of_machine'];
var_dump($report);
echo "<br/>";
$report->save();
var_dump($report);
echo "<br/>";
So in the var_dump() before saving everything is fine, but the second one doesn't show any data!
Does anybody have any ideas?
Edit2:
OK, here is the complete code of the module:
<?php
use Contao\Date;
use Contao\FilesModel;
use Contao\Input;
use Contao\Module;
use Contao\PageModel;
use Contao\RequestToken;
use Contao\Validator;
class ModuleReportData extends Module
{
protected $strTemplate = 'mod__reportdata';
public function generate()
{
if (TL_MODE == 'BE')
{
/** @var \BackendTemplate|object $objTemplate */
$objTemplate = new \BackendTemplate('be_wildcard');
$objTemplate->wildcard = '### ReportData ###';
$objTemplate->href = 'contao/main.php?do=themes&table=tl_module&act=edit&id=' . $this->id;
return $objTemplate->parse();
}
return parent::generate();
}
public function compile()
{
$report_id = Input::get('r');
if($report_id){
$report = ReportModel::findByPk($report_id);
$project = ProjectModel::findBy('report_id', $report_id);
}else{
$report = new ReportModel();
$project = new ProjectModel();
}
$machine = new MachineModel();
$machines = [];
$next_step = false;
$errors = [];
//get data for selectbox machines
$result = $this->Database->prepare("SELECT * FROM tl_sa_machines")->execute();
while($result->next())
{
$id = $result->id;
$machines[$id] = $result->type;
}
//Check if form was submitted
if(Input::post('submit_data')){
$report_data = Input::post('report_data');
$project_data = Input::post('project_data');
$machine_data = Input::post('machine_data');
$errors = [];
$next_step = true;
foreach($report_data as $key => $data)
{
if(empty($data)) continue;
switch ($key) {
case 'document_date':
if (preg_match("/^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1])$/", $data)) //### add other formats
{
break;
}
else {
$next_step = false;
$errors[$key] ="Error";
break;
}
case 'customer':
if(Validator::isAlphanumeric($data)) break;
else {
$next_step = false;
$errors[$key] ="Error";
break;
}
case 'city':
if(Validator::isAlphanumeric($data)) break;
else {
$next_step = false;
$errors[$key] ="Error";
break;
}
case 'country':
if(Validator::isAlphanumeric($data)) break;
else {
$next_step = false;
$errors[$key] ="Error";
break;
}
case 'document_version':
if(Validator::isNumeric($data)) break;
else {
$next_step = false;
$errors[$key] ="Error";
break;
}
case 'author':
if(Validator::isAlphanumeric($data)) break;
else {
$next_step = false;
$errors[$key] ="Error";
break;
}
case 'max_speed':
if(Validator::isNumeric($data)) break;
else {
$next_step = false;
$errors[$key] ="Error";
break;
}
}
}
$report->setRow($report_data);
foreach($project_data as $key => $data)
{
if(empty($data)) continue;
if(Validator::isAlphanumeric($data)) continue;
else {
$next_step = false;
$errors[$key] = "Error";
}
}
$project->setRow($project_data);
if($next_step)
{
$project->date_of_evaluation = strtotime($project->date_of_evaluation);
$report->document_date = strtotime($report->document_date);
//save and set report_data
$report->tstamp = time();
$report->machine_id = $machine_data['type_of_machine'];
var_dump($report);
echo "<br/>";
$report->save();
var_dump($report);
echo "<br/>";
$report = ReportModel::findByPK($report_id);
var_dump($report);
//save and set project_data
$project->report_id = $report->id;
$project->tstamp = time();
$project->save();
//session for transfering report_id to the next page
/* var_dump($report->id);
var_dump($report_id);
var_dump($project->report_id);
if($report_id) {
$_SESSION['report_id'] = $report_id;
}
else
{//var_dump($report_id);
//var_dump($report->id);
$report_id = $report->id;
$_SESSION['report_id'] = $report_id;
}
$jumpTo = PageModel::findByPk($this->jumpTo);
$url = $this->generateFrontendUrl($jumpTo->row());
$this->redirect($url);*/
}
}
$this->Template->report = $report;
$this->Template->project = $project;
$this->Template->machine = $machine;
$this->Template->machines = $machines;
$this->Template->errors = $errors;
$this->Template->request_token = RequestToken::get();
}
}
I have a form to save new data or to edit existing data. There are two different tables in the database that I am trying to fill. For the second one I need the ID of the new row generated by this code, but that doesn't work because the model is empty after saving.
Edit3:
ProjectModel is as simple as this:
use Contao\Model;
class ProjectModel extends Model{
protected static $strTable = "tl_sa_projects";
}
I just found out that it only happens when I call save() on $report. It works fine with $project!
Update:
It looks like I get an error when the refresh() method tries to select the newly inserted database row:
public function refresh()
{
$intPk = $this->{static::$strPk};
// Track primary key changes
if (isset($this->arrModified[static::$strPk]))
{
$intPk = $this->arrModified[static::$strPk];
}
// Reload the database record
$res = \Database::getInstance()->prepare("SELECT * FROM " . static::$strTable . " WHERE " . static::$strPk . "=?")
->execute($intPk);
var_dump($res);
$this->setRow($res->row());
}
Update 2:
OK, the problem is that arrModified contains an empty string as the ID. Does anybody know where this array gets its elements from?
Not the answer to your original question, but you should use
ProjectModel::findOneBy('report_id', $report_id);
instead of
ProjectModel::findBy('report_id', $report_id);
since you want to find only one specific project. findBy returns a Contao\Model\Collection (i.e. potentially multiple results) whereas findOneBy returns a Contao\Model.
Update:
Furthermore, your usage of setData and mergeRow is probably not intended this way. You should instead use
foreach ($project_data as $key => $val)
{
$project->$key = $val;
}
for instance.
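Building on the property-by-property loop above, a small whitelist step keeps stray form fields, and in particular the primary key, from being copied onto the model. Whether the empty ID described in Update 2 actually comes from a posted id field is only a guess; filterPostedFields() and the column name 'name' in the usage note are hypothetical.

```php
<?php
/**
 * Hypothetical helper: keep only whitelisted columns from posted form data,
 * and never let the form set the primary key.
 */
function filterPostedFields(array $posted, array $allowedColumns, string $primaryKey = 'id'): array
{
    // Drop the primary key from the whitelist, then keep only the matching keys.
    // array_intersect_key() preserves the order (and values) of $posted.
    $allowed = array_diff($allowedColumns, [$primaryKey]);
    return array_intersect_key($posted, array_flip($allowed));
}
```

Usage would then be foreach (filterPostedFields($project_data, ['name', 'date_of_evaluation']) as $key => $val) { $project->$key = $val; }, which also stops users from writing arbitrary columns by adding fields to the form.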

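Separately, the seven near-identical switch cases in the module's compile() method could be collapsed into a map from field name to validator. In this sketch validateReportData() is hypothetical, and the closures are plain-PHP stand-ins for Contao's Validator::isAlphanumeric() and Validator::isNumeric(), whose exact rules may differ; in the module you would call the Contao methods instead. The date pattern is the one from the module.

```php
<?php
/**
 * Sketch: the repeated switch cases expressed as a field => validator map.
 * Returns an array of field => 'Error' for every field that fails its rule.
 */
function validateReportData(array $reportData): array
{
    // Stand-ins for Validator::isAlphanumeric() / Validator::isNumeric() (assumptions).
    $isAlnum = static fn ($v): bool => (bool) preg_match('/^[\pL\pN ]+$/u', (string) $v);
    $isNumeric = static fn ($v): bool => is_numeric($v);
    // Same YYYY-MM-DD pattern as in compile().
    $isDate = static fn ($v): bool => (bool) preg_match(
        '/^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1])$/',
        (string) $v
    );

    $rules = [
        'document_date'    => $isDate,
        'customer'         => $isAlnum,
        'city'             => $isAlnum,
        'country'          => $isAlnum,
        'document_version' => $isNumeric,
        'author'           => $isAlnum,
        'max_speed'        => $isNumeric,
    ];

    $errors = [];
    foreach ($reportData as $key => $value) {
        // Like the original loop: skip empty values and fields without a rule.
        if (empty($value) || !isset($rules[$key])) {
            continue;
        }
        if (!$rules[$key]($value)) {
            $errors[$key] = 'Error';
        }
    }
    return $errors;
}
```

$errors = validateReportData($report_data); followed by $next_step = empty($errors); would replace the switch, with the project_data check kept as it is.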
avoid duplicates in flash based messaging

I am using a PHP session-based flash messenger available here. The issue is that I sometimes get multiple messages of the same type when I generate errors, display messages, and so forth. This is mostly due to some AJAX issues. Assuming that I only want to apply a fix in the display code here:
public function display($type = 'all', $print = true)
{
$messages = '';
$data = '';
if (!isset($_SESSION['flash_messages'])) {
return false;
}
// print a certain type of message?
if (in_array($type, $this->msgTypes)) {
foreach ($_SESSION['flash_messages'][$type] as $msg) {
$messages .= $this->msgBefore . $msg . $this->msgAfter;
}
$data .= sprintf($this->msgWrapper, $this->msgClass, $this->msgClassPrepend.'-'.$type, str_replace('messages', 'autoclose',$this->msgClassPrepend.'-'.$type), $messages);
// clear the viewed messages
$this->clear($type);
// print ALL queued messages
} elseif ($type == 'all') {
$counter = 1;
foreach ($_SESSION['flash_messages'] as $type => $msgArray) {
$count = $counter++;
$messages = '';
foreach ($msgArray as $msg) {
$messages .= $this->msgBefore . $msg . $this->msgAfter;
}
$data .= sprintf($this->msgWrapper, $this->msgClass, $this->msgClassPrepend.'-'.$type, str_replace('messages', 'autoclose', $this->msgClassPrepend.'-'.$type), $messages);
}
// clear ALL of the messages
$this->clear();
// invalid message type?
} else {
return false;
}
// print everything to the screen or return the data
if ($print) {
echo $data;
} else {
return $data;
}
}
How would I make it so that duplicate messages are detected on a one-for-one basis? For example, if the messages are "Hello", "Hello" and "Hello.", I want to remove one of the first two and keep the last one, since it is a different message. All the workarounds I can think of would be overly complex, so I was wondering if anyone could think of a simple solution.
Additional info: display() is a method of the Messages class, and a new message is created with:
$msg = new Messages();
$msg->add('e', 'Some error here.');
You could simply run the message array through array_unique() before the $messages string is built. For example, these two additions to the display method should do the trick...
if (in_array($type, $this->msgTypes)) {
$filtered = array_unique($_SESSION['flash_messages'][$type]);
foreach ($filtered as $msg) {
and...
foreach ($_SESSION['flash_messages'] as $type => $msgArray) {
$count = $counter++;
$messages = '';
$filtered = array_unique($msgArray);
foreach ($filtered as $msg) {
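For the example in the question, array_unique() keeps the first "Hello" and "Hello.": values are compared as strings with exact equality, so case and trailing punctuation make messages distinct. It preserves the original keys, which is harmless inside foreach; array_values() reindexes the result if a clean list is needed.

```php
<?php
// The three messages from the question.
$messages = ['Hello', 'Hello', 'Hello.'];

// array_unique() keeps the first occurrence of each value; two values are equal
// only if (string) $a === (string) $b. array_values() reindexes the result.
$filtered = array_values(array_unique($messages));

foreach ($filtered as $msg) {
    echo $msg, PHP_EOL;
}
```

This prints Hello and Hello. on separate lines.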
Alternatively, you could override the add method with a unique check. For example
public function add($type, $message, $redirect_to = null, $ignoreDuplicates = true) {
// snip...
// wrap the array push in this check
if (!($ignoreDuplicates && in_array($message, $_SESSION['flash_messages'][$type] ?? [], true))) {
$_SESSION['flash_messages'][$type][] = $message; // this is the existing code
}
// snip...
}

Cassandra with PHP - on call of cassandra-test.php I get "Call to undefined method CassandraClient::batch_insert()"

I'm trying to get Cassandra running with PHP on Windows 7.
I installed Cassandra and Thrift.
When I call cassandra-test.php, I get the following error:
( ! ) Fatal error: Call to undefined method CassandraClient::batch_insert() in C:\xampp\htdocs\YiiPlayground\cassandra-test.php on line 75
Call Stack
#  Time    Memory  Function                      Location
1  0.0014  337552  {main}( )                     ..\cassandra-test.php:0
2  0.0138  776232  CassandraDB->InsertRecord( )  ..\cassandra-test.php:304
The cassandra-test.php looks as follows:
<?php
// CassandraDB version 0.1
// Software Projects Inc
// http://www.softwareprojects.com
//
// Includes
$GLOBALS['THRIFT_ROOT'] = 'C:/xampp/htdocs/Yii/kallaspriit-Cassandra-PHP-Client-Library/thrift';
//$GLOBALS['THRIFT_ROOT'] = realpath('E:/00-REGIESTART/Programme/Cassandra/thrift');
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/Cassandra.php';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/cassandra_types.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';
class CassandraDB
{
// Internal variables
protected $socket;
protected $client;
protected $keyspace;
protected $transport;
protected $protocol;
protected $err_str = "";
protected $display_errors = 0;
protected $consistency = 1;
protected $parse_columns = 1;
// Functions
// Constructor - Connect to Cassandra via Thrift
function __construct ($keyspace, $host = "127.0.0.1", $port = 9160)
{
// Initialize
$this->err_str = '';
try
{
// Store passed 'keyspace' in object
$this->keyspace = $keyspace;
// Make a connection to the Thrift interface to Cassandra
$this->socket = new TSocket($host, $port);
$this->transport = new TFramedTransport($this->socket, 1024, 1024);
$this->protocol = new TBinaryProtocolAccelerated($this->transport);
$this->client = new CassandraClient($this->protocol);
$this->transport->open();
}
catch (TException $tx)
{
// Error occurred
$this->err_str = $tx->why;
$this->Debug($tx->why." ".$tx->getMessage());
}
}
// Insert Column into ColumnFamily
// (Equivalent to RDBMS Insert record to a table)
function InsertRecord ($table /* ColumnFamily */, $key /* ColumnFamily Key */, $record /* Columns */)
{
// Initialize
$this->err_str = '';
try
{
// Timestamp for update
$timestamp = time();
// Build batch mutation
$cfmap = array();
$cfmap[$table] = $this->array_to_supercolumns_or_columns($record, $timestamp);
// Insert
$this->client->batch_insert($this->keyspace, $key, $cfmap, $this->consistency);
// If we're up to here, all is well
$result = 1;
}
catch (TException $tx)
{
// Error occurred
$result = 0;
$this->err_str = $tx->why;
$this->Debug($tx->why." ".$tx->getMessage());
}
// Return result
return $result;
}
// Insert SuperColumn into SuperColumnFamily
// (Equivalent to RDBMS Insert record to a "nested table")
function InsertRecordArray ($table /* SuperColumnFamily */, $key_parent /* Super CF */,
$record /* Columns */)
{
// Initialize
$this->err_str = '';
try
{
// Timestamp for update
$timestamp = time();
// Build batch mutation
$cfmap = array();
$cfmap[$table] = $this->array_to_supercolumns_or_columns($record, $timestamp);
// Insert
$this->client->batch_insert($this->keyspace, $key_parent, $cfmap, $this->consistency);
// If we're up to here, all is well
$result = 1;
}
catch (TException $tx)
{
// Error occurred
$result = 0;
$this->err_str = $tx->why;
$this->Debug($tx->why." ".$tx->getMessage());
}
// Return result
return $result;
}
// Get record by key
function GetRecordByKey ($table /* ColumnFamily or SuperColumnFamily */, $key, $start_from="", $end_at="")
{
// Initialize
$this->err_str = '';
try
{
return $this->get($table, $key, NULL, $start_from, $end_at);
}
catch (TException $tx)
{
// Error occurred
$this->err_str = $tx->why;
$this->Debug($tx->why." ".$tx->getMessage());
return array();
}
}
// Print debug message
function Debug ($str)
{
// If verbose is off, we're done
if (!$this->display_errors) return;
// Print
echo date("Y-m-d h:i:s")." CassandraDB ERROR: $str\r\n";
}
// Turn verbose debug on/off (Default is off)
function SetDisplayErrors($flag)
{
$this->display_errors = $flag;
}
// Set Consistency level (Default is 1)
function SetConsistency ($consistency)
{
$this->consistency = $consistency;
}
// Build cf array
function array_to_supercolumns_or_columns($array, $timestamp=null)
{
if(empty($timestamp)) $timestamp = time();
$ret = null;
foreach($array as $name => $value) {
$c_or_sc = new cassandra_ColumnOrSuperColumn();
if(is_array($value)) {
$c_or_sc->super_column = new cassandra_SuperColumn();
$c_or_sc->super_column->name = $this->unparse_column_name($name, true);
$c_or_sc->super_column->columns = $this->array_to_columns($value, $timestamp);
$c_or_sc->super_column->timestamp = $timestamp;
}
else
{
$c_or_sc = new cassandra_ColumnOrSuperColumn();
$c_or_sc->column = new cassandra_Column();
$c_or_sc->column->name = $this->unparse_column_name($name, true);
$c_or_sc->column->value = $value;
$c_or_sc->column->timestamp = $timestamp;
}
$ret[] = $c_or_sc;
}
return $ret;
}
// Parse column names for Cassandra
function parse_column_name($column_name, $is_column=true)
{
if(!$column_name) return NULL;
return $column_name;
}
// Unparse column names for Cassandra
function unparse_column_name($column_name, $is_column=true)
{
if(!$column_name) return NULL;
return $column_name;
}
// Convert supercolumns or columns into an array
function supercolumns_or_columns_to_array($array)
{
$ret = null;
for ($i=0; $i<count($array); $i++)
foreach ($array[$i] as $object)
{
if ($object)
{
// If supercolumn
if (isset($object->columns))
{
$record = array();
for ($j=0; $j<count($object->columns); $j++)
{
$column = $object->columns[$j];
$record[$column->name] = $column->value;
}
$ret[$object->name] = $record;
}
// (Otherwise - not supercolumn)
else
{
$ret[$object->name] = $object->value;
}
}
}
return $ret;
}
// Get record from Cassandra
function get($table, $key, $super_column=NULL, $slice_start="", $slice_finish="")
{
try
{
$column_parent = new cassandra_ColumnParent();
$column_parent->column_family = $table;
$column_parent->super_column = $this->unparse_column_name($super_column, false);
$slice_range = new cassandra_SliceRange();
$slice_range->start = $slice_start;
$slice_range->finish = $slice_finish;
$predicate = new cassandra_SlicePredicate();
$predicate->slice_range = $slice_range;
$resp = $this->client->get_slice($this->keyspace, $key, $column_parent, $predicate, $this->consistency);
return $this->supercolumns_or_columns_to_array($resp);
}
catch (TException $tx)
{
$this->Debug($tx->why." ".$tx->getMessage());
return array();
}
}
// Convert array to columns
function array_to_columns($array, $timestamp=null) {
if(empty($timestamp)) $timestamp = time();
$ret = null;
foreach($array as $name => $value) {
$column = new cassandra_Column();
$column->name = $this->unparse_column_name($name, false);
$column->value = $value;
$column->timestamp = $timestamp;
$ret[] = $column;
}
return $ret;
}
// Get error string
function ErrorStr()
{
return $this->err_str;
}
}
// Initialize Cassandra
$cassandra = new CassandraDB("SPI");
// Debug on
$cassandra->SetDisplayErrors(true);
// Insert record ("Columns" in Cassandra)
$record = array();
$record["name"] = "Mike Peters";
$record["email"] = "mike at softwareprojects.com";
if ($cassandra->InsertRecord('mytable', "Mike Peters", $record)) {
echo "Record (Columns) inserted successfully.\r\n";
}
// Print record
$record = $cassandra->GetRecordByKey('mytable', "Mike Peters");
print_r($record);
?>
Any ideas on how to fix this?
Thanks a lot!
You really don't want to do Thrift by hand if you can avoid it. Take a look at the phpcassa library:
https://github.com/thobbs/phpcassa
Oh, and in the code above, it looks like you want 'batch_mutate', not 'batch_insert', on line 75. That method changed names in Cassandra versions > 0.6.x.

parsing user-typed Full Text Search queries into WHERE clause of MySQL using PHP

I want to convert user-typed full-text search (FTS) queries into a MySQL WHERE clause, so the functionality will be something like Gmail's search. Users will be able to type:
from:me AND (to:john OR to:jenny) dinner
Although I don't think it is important, the table structure will be something like:
Message
- id
- from
- to
- title
- description
- time_created
MessageComment
- id
- message_id
- comment
- time_created
Since this is a common problem, I thought there may be already existing solution. Is there any?
P.S. There is a similar question here, but it is for SQL Server.
The following code consists of the classes Tokenizer, Token and QueryBuilder.
It is probably not the most elegant solution ever, but it actually does what you were asking:
<?php
// QueryBuilder Grammar:
// =====================
// SearchRule := SimpleSearchRule { KeyWord }
// SimpleSearchRule := Expression | SimpleSearchRule { 'OR' Expression }
// Expression := SimpleExpression | Expression { 'AND' SimpleExpression }
// SimpleExpression := '(' SimpleSearchRule ')' | FieldExpression
$input = 'from:me AND (to:john OR to:jenny) dinner party';
$fieldMapping = array(
'id' => 'id',
'from' => 'from',
'to' => 'to',
'title' => 'title',
'description' => 'description',
'time_created' => 'time_created'
);
$fullTextFields = array('title','description');
$qb = new QueryBuilder($fieldMapping, $fullTextFields);
try {
echo $qb->parseSearchRule($input);
} catch(Exception $error) {
echo 'Error occurred while parsing search query: <br/>'.$error->getMessage();
}
class Token {
const KEYWORD = 'KEYWORD',
OPEN_PAR='OPEN_PAR',
CLOSE_PAR='CLOSE_PAR',
FIELD='FIELD',
AND_OP='AND_OP',
OR_OP='OR_OP';
public $type;
public $chars;
public $position;
function __construct($type,$chars,$position) {
$this->type = $type;
$this->chars = $chars;
$this->position = $position;
}
function __toString() {
return 'Token[ type='.$this->type.', chars='.$this->chars.', position='.$this->position.' ]';
}
}
class Tokenizer {
private $tokens = array();
private $input;
private $currentPosition;
function __construct($input) {
$this->input = trim($input);
$this->currentPosition = 0;
}
/**
* @return Token|null
*/
function getToken() {
if(count($this->tokens)==0) {
$token = $this->nextToken();
if($token==null) {
return null;
}
array_push($this->tokens, $token);
}
return $this->tokens[0];
}
function consumeToken() {
$token = $this->getToken();
if($token==null) {
return null;
}
array_shift($this->tokens);
return $token;
}
protected function nextToken() {
$reservedCharacters = '\:\s\(\)';
$fieldExpr = '/^([^'.$reservedCharacters.']+)\:([^'.$reservedCharacters.']+)/';
$keyWord = '/^([^'.$reservedCharacters.']+)/';
$andOperator = '/^AND\s/';
$orOperator = '/^OR\s/';
// Remove whitespaces ..
$whiteSpaces = '/^\s+/';
$remaining = substr($this->input,$this->currentPosition);
if(preg_match($whiteSpaces, $remaining, $matches)) {
$this->currentPosition += strlen($matches[0]);
$remaining = substr($this->input,$this->currentPosition);
}
if($remaining=='') {
return null;
}
switch(substr($remaining,0,1)) {
case '(':
return new Token(Token::OPEN_PAR,'(',$this->currentPosition++);
case ')':
return new Token(Token::CLOSE_PAR,')',$this->currentPosition++);
}
if(preg_match($fieldExpr, $remaining, $matches)) {
$token = new Token(Token::FIELD, $matches[0], $this->currentPosition);
$this->currentPosition += strlen($matches[0]);
} else if(preg_match($andOperator, $remaining, $matches)) {
$token = new Token(Token::AND_OP, 'AND', $this->currentPosition);
$this->currentPosition += 3;
} else if(preg_match($orOperator, $remaining, $matches)) {
$token = new Token(Token::OR_OP, 'OR', $this->currentPosition);
$this->currentPosition += 2;
} else if(preg_match($keyWord, $remaining, $matches)) {
$token = new Token(Token::KEYWORD, $matches[0], $this->currentPosition);
$this->currentPosition += strlen($matches[0]);
} else throw new Exception('Unable to tokenize: '.$remaining);
return $token;
}
}
class QueryBuilder {
private $fieldMapping;
private $fulltextFields;
function __construct($fieldMapping, $fulltextFields) {
$this->fieldMapping = $fieldMapping;
$this->fulltextFields = $fulltextFields;
}
function parseSearchRule($input) {
$t = new Tokenizer($input);
$token = $t->getToken();
if($token==null) {
return '';
}
$token = $t->getToken();
if($token->type!=Token::KEYWORD) {
$searchRule = $this->parseSimpleSearchRule($t);
} else {
$searchRule = '';
}
$keywords = '';
while($token = $t->consumeToken()) {
if($token->type!=Token::KEYWORD) {
throw new Exception('Only keywords allowed at end of search rule.');
}
if($keywords!='') {
$keywords .= ' ';
}
$keywords .= $token->chars;
}
if($keywords!='') {
$matchClause = 'MATCH (`'.(implode('`,`',$this->fulltextFields)).'`) AGAINST (';
$keywords = $matchClause.'\''.mysql_real_escape_string($keywords).'\' IN BOOLEAN MODE)';
if($searchRule=='') {
$searchRule = $keywords;
} else {
$searchRule = '('.$searchRule.') AND ('.$keywords.')';
}
}
return $searchRule;
}
protected function parseSimpleSearchRule(Tokenizer $t) {
$expressions = array();
do {
$repeat = false;
$expressions[] = $this->parseExpression($t);
$token = $t->getToken();
if($token!=null && $token->type==Token::OR_OP) {
$t->consumeToken();
$repeat = true;
}
} while($repeat);
return implode(' OR ', $expressions);
}
protected function parseExpression(Tokenizer $t) {
$expressions = array();
do {
$repeat = false;
$expressions[] = $this->parseSimpleExpression($t);
$token = $t->getToken();
if($token!=null && $token->type==Token::AND_OP) {
$t->consumeToken();
$repeat = true;
}
} while($repeat);
return implode(' AND ', $expressions);
}
protected function parseSimpleExpression(Tokenizer $t) {
$token = $t->consumeToken();
if($token==null) {
throw new Exception('Unexpected end of search query.');
}
if($token->type==Token::OPEN_PAR) {
$spr = $this->parseSimpleSearchRule($t);
$token = $t->consumeToken();
if($token==null || $token->type!=Token::CLOSE_PAR) {
throw new Exception('Expected closing parenthesis, found: '.($token==null ? 'end of input' : $token->chars));
}
return '('.$spr.')';
} else if($token->type==Token::FIELD) {
$fieldVal = explode(':', $token->chars,2);
if(isset($this->fieldMapping[$fieldVal[0]])) {
return '`'.$this->fieldMapping[$fieldVal[0]].'` = \''.mysql_real_escape_string($fieldVal[1]).'\'';
}
throw new Exception('Unknown field selected: '.$token->chars);
} else {
throw new Exception('Expected opening parenthesis or field-expression, found: '.$token->chars);
}
}
}
?>
A more robust solution would first build a parse tree and then, after some further analysis, transform it into a query.
Your question has two parts
how do I parse a query
how do I construct a full text search from the query I parsed
The first one is quite a difficult subject. A quick search found nothing equivalent to what you want, so you may be on your own with that one.
Don't bother with question 2 until you get question 1 right.
Rather than creating a parser that can deal with the query syntax you propose (e.g. from:me AND (to:john OR to:jenny) dinner), perhaps a simple form is the answer: provide a list of choices for the user to search on.
That way you can get the service up and running and, in a future revision, tackle the harder question of how to create a parser that does what you want.
When doing part 2, be very careful to protect against SQL injection attacks. For example, do not take the table names directly from the query; use a lookup instead.
Not the answer you wanted, but I don't know if you'll find an out-of-the-box answer. Defining your question better is the key, and Google is your friend.
DC
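Along the lines of the lookup advice above, here is a minimal sketch of how a single field:value term could become a WHERE fragment with a bound parameter instead of an escaped literal (the QueryBuilder above relies on mysql_real_escape_string(), which was removed in PHP 7.0). buildFieldCondition() is a hypothetical helper: column names come only from the $fieldMapping whitelist, and values are collected in $params for a prepared statement.

```php
<?php
/**
 * Sketch: turn a "field:value" term into "`column` = ?" and append the value
 * to $params. Unknown fields are rejected rather than passed through.
 */
function buildFieldCondition(string $term, array $fieldMapping, array &$params): string
{
    // Split on the first ':' only; a missing value becomes ''.
    [$field, $value] = array_pad(explode(':', $term, 2), 2, '');
    if (!isset($fieldMapping[$field])) {
        throw new InvalidArgumentException('Unknown field selected: ' . $term);
    }
    $params[] = $value; // bound later, never concatenated into the SQL
    // Backticks are needed: FROM and TO are reserved words in MySQL.
    return '`' . $fieldMapping[$field] . '` = ?';
}
```

The resulting fragment and parameters could then go to PDO, e.g. $stmt = $pdo->prepare('SELECT * FROM Message WHERE ' . $where); $stmt->execute($params);, assuming a PDO connection in $pdo.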
You might want to look at...
http://code.google.com/p/xerxes-portal/source/browse/trunk/lib/Xerxes/QueryParser.php?r=1205
Also
http://www.cmsmadesimple.org/api/class_zend___search___lucene___search___query_parser.html
The links above point to two very different parser implementations.
DC
