Optimizing Trie implementation - php

For no reason other than fun I implemented a Trie today. At the moment it supports add() and search(), remove() should also be implemented but I think that's fairly straight forward.
It is fully functional, but filling the Trie with data takes a little too much for my taste. I'm using this list as datasource: http://www.isc.ro/lists/twl06.zip (found somewhere else on SO). It takes ~11s to load. My initial implementation took ~15s so I already gave it a nice performance boost, but I'm still not satisfied :)
My question is: what else could give me a (substantial) performance boost? I'm not bound by this design, a complete overhaul is acceptable.
class Trie
{
private $trie;
public function __construct(TrieNode $trie = null)
{
if($trie !== null) $this->trie = $trie;
else $this->trie = new TrieNode();
$this->counter = 0;
}
public function add($value, $val = null)
{
$str = '';
$trie_ref = $this->trie;
foreach(str_split($value) as $char)
{
$str .= $char;
$trie_ref = $trie_ref->addNode($str);
}
$trie_ref->value = $val;
return true;
}
public function search($value, $only_words = false)
{
if($value === '') return $this->trie;
$trie_ref = $this->trie;
$str = '';
foreach(str_split($value) as $char)
{
$str .= $char;
if($trie_ref = $trie_ref->getNode($str))
{
if($str === $value) return ($only_words ? $this->extractWords($trie_ref) : new self($trie_ref));
continue;
}
return false;
}
return false;
}
public function extractWords(TrieNode $trie)
{
$res = array();
foreach($trie->getChildren() as $child)
{
if($child->value !== null) $res[] = $child->value;
if($child->hasChildren()) $res = array_merge($res, $this->extractWords($child));
}
return $res;
}
}
class TrieNode
{
public $value;
protected $children = array();
public function addNode($index)
{
if(isset($this->children[$index])) return $this->children[$index];
return $this->children[$index] = new self();
}
public function getNode($index)
{
return (isset($this->children[$index]) ? $this->children[$index] : false);
}
public function getChildren()
{
return $this->children;
}
public function hasChildren()
{
return count($this->children)>0;
}
}

Don't know php but,
in the following methods:
public function add($value, $val = null)
{
$str = '';
$trie_ref = $this->trie;
foreach(str_split($value) as $char)
{
$str .= $char;
$trie_ref = $trie_ref->addNode($str);
}
$trie_ref->value = $val;
return true;
}
public function search($value, $only_words = false)
{
if($value === '') return $this->trie;
$trie_ref = $this->trie;
$str = '';
foreach(str_split($value) as $char)
{
$str .= $char;
if($trie_ref = $trie_ref->getNode($str))
{
if($str === $value) return ($only_words ? $this->extractWords($trie_ref) : new self($trie_ref));
continue;
}
return false;
}
return false;
}
Why do you even need the $str .= $char (which I suppose is append)? This itself changes your O(n) time addition/searching to Omega(n^2) (n is length of $value) instead of O(n).
In a trie, you usually walk the trie while walking the string i.e you find the next node based on the current character, rather than the current prefix.

I suppose this implementation is for a Key|value type of insertion and lookup? Here is one that handles [English] words.
class Trie {
static function insert_word(Node $root, $text)
{
$v = $root;
foreach(str_split($text) as $char) {
$next = $v->children[$char];
if ($next === null)
{
$v->children[$char] = $next = new Node();
}
$v = $next;
}
$v->leaf = true;
}
static function get_words_sorted(Node $node, $text)
{
$res = array();
for($ch = 0; $ch < 128; $ch++) {
$child = $node->children[chr($ch)];
if ($child !== null)
{
$res = array_merge($res, Trie::get_words_sorted($child, $text . chr($ch)));
}
}
if ($node->leaf === true)
{
$res[] = $text;
}
return $res;
}
static function search(Node $root, $text)
{
$v = $root;
while($v !== null)
{
foreach(str_split($text) as $char) {
$next = $v->children[$char];
if ($next === null)
{
return false;
}
else
{
$v = $next;
}
}
if($v->leaf === true)
{
return true;
}
else
{
return false;
}
}
return false;
}
}
class Node {
public $children;
public $leaf;
function __construct()
{
$children = Array();
}
}
Example usage
$root = new Node();
$words = Array("an", "ant", "all", "allot", "alloy", "aloe", "are", "ate", "be");
for ($i = 0; $i < sizeof($words); $i++)
{
Trie::insert_word($root, $words[$i]);
}
$search_words = array("alloy", "ant", "bee", "aren't", "allot");
foreach($search_words as $word)
{
if(Trie::search($root, $word) === true)
{
echo $word . " IS in my dictionary<br/>";
}
else
{
echo $word . " is NOT in my dictionary <br/>";
}
}

Related

My cookie class keeps returning null when getting the cookie

I am creating my own custom cookie class and I can not seem to figure out what I am doing wrong. Here is my cookie class:
<?php
class Cookie implements CookieHandlerInterface {
private $_domain;
private $_secure;
public function __construct(array $config = array()) {
$this->_domain = isset($config['domain']) ? $config['domain'] : 'localhost';
$this->_secure = isset($config['secure']) ? $config['secure'] : false;
}
public function set($name, $value = null, $timeLength) {
if (!is_null($value)) {
if (is_array($value)) {
if ($this->__isMultiArray($array)) {
return null;
} else {
$value = $this->__arrayBuild($value);
$value = 'array(' . $value . ')';
}
} elseif (is_bool($value)) {
if ($value) {
$value = 'bool(true)';
} else {
$value = 'bool(false)';
}
} elseif (is_int($value)) {
$value = 'int(' . strval($value) . ')';
} elseif (is_float($value)) {
$value = 'float(' . strval($value) . ')';
} elseif (is_string($value)) {
$value = 'string(' . $value . ')';
} else {
return null;
}
} else {
$value = 'null(null)';
}
setcookie($name, $value, (time() + $timeLength), '/', $this->_domain, $this->_secure, true);
}
public function get($name, $defualtOutput = null) {
if (isset($_COOKIE[$name])) {
$output = rtrim($_COOKIE[$name], ')');
$xr1 = mb_substr($output, 0, 1);
if (equals($xr1, 'a')) {
$output = ltrim($output, 'array(');
return $this->__arrayBreak($output);
}
if (equals($xr1, 'b')) {
$output = ltrim($output, 'bool(');
if (equals($output, 'true')) {
return true;
} else {
return false;
}
}
if (equals($xr1, 'i')) {
$output = ltrim($output, 'int(');
return (int) $output;
}
if (equals($xr1, 'f')) {
$output = ltrim($output, 'float(');
return (float) $output;
}
if (equals($xr1, 's')) {
$output = ltrim($output, 'string(');
return $output;
}
if (equals($output, 'null(null)')) {
return null;
}
}
if (
!is_array($defualtOutput)
&& !is_bool($defualtOutput)
&& !is_int($defualtOutput)
&& !is_float($defualtOutput)
&& !is_string($defualtOutput)
&& !is_null($defualtOutput)
) {
trigger_error(
'The $defualtOutput var needs to be only certain types of var types. Allowed (array, bool, int, float, string, null).',
E_USER_ERROR
);
}
return $defualtOutput;
}
public function delete($name) {
if (isset($_COOKIE[$name])) {
setcookie($name, '', time() - 3600, '/', $this->_domain, $this->_secure, true);
}
}
private function __arrayBuild($array) {
$out = '';
foreach ($array as $index => $data) {
$out .= ($data != '') ? $index . '=' . $data . '|' : '';
}
return rtrim($out, '|');
}
private function __arrayBreak($cookieString) {
$array = explode('|', $cookieString);
foreach ($array as $i => $stuff) {
$stuff = explode('=', $stuff);
$array[$stuff[0]] = $stuff[1];
unset($array[$i]);
}
return $array;
}
private function __isMultiArray($array) {
foreach ($array as $key => $value) {
if (is_array($value)) {
return true;
}
}
return false;
}
}
?>
I set a test cookie for example app('cookie')->set('test', 'hello', 0);
sure enough it created the cookie like expected. So the cookie reads string(hello)
When I try to echo it, it echos the default value instead of the actual variable, so app('cookie')->get('test', 'test'); returns test
The get function should check if the cookie exists with isset($_COOKIE[$cookieName]) and then it should trim the extra ) with rtrim($_COOKIE[$cookieName], ')') then it should grab the first character in the string with mb_substr($_COOKIE[$cookieName], 0, 1) the 0 starts at the beginning and the 1 grabs only the first character.
After it compares it with the default (a, b, i, f, s) for example if it starts with an s its a string by default, if it was i it was sent as an int by default, etc. etc.
If they all come up as false it checks to see if it was sent as null if so it return null else it returns the default value passed.
The equals function is the same as $var1 == $var2 it is timing attack safe.
so it keeps returning the default value which is null, any help would be helpful thanks in advance.
Lol i feel real stupid i put 0 as the third argument thinking it will tell the cookie to expire when the browser session closes, but it did (time() + 0) which does not equal 0. so as it was setting the cookie it expired upon creation. So i did time() - (time() * 2). i achieved the goal i wanted.

Converting string to array address

I have dynamically generated array $array[], that could be multidimensional and I have a function that return string, which contains address in array (existing one). My question is: how to create or convert string $a = 'array[1]' to address $array[1]?
Example:
$array = [1,2,3,4];
$string = 'array[2]';
function magic($array, $string){
//some magic happens
return $result;
$result = magic($array, $string);
echo $result;
// and 3 is displayed;
Is there a function already to do this? Is it possible to do this?
This code is a modification of ResponseBag::get() from the wonderful HttpFoundation project:
function magic($array, $path, $default = null)
{
if (false === $pos = strpos($path, '[')) {
return $array;
}
$value = $array;
$currentKey = null;
for ($i = $pos, $c = strlen($path); $i < $c; $i++) {
$char = $path[$i];
if ('[' === $char) {
if (null !== $currentKey) {
throw new \InvalidArgumentException(sprintf('Malformed path. Unexpected "[" at position %d.', $i));
}
$currentKey = '';
} elseif (']' === $char) {
if (null === $currentKey) {
throw new \InvalidArgumentException(sprintf('Malformed path. Unexpected "]" at position %d.', $i));
}
if (!is_array($value) || !array_key_exists($currentKey, $value)) {
return $default;
}
$value = $value[$currentKey];
$currentKey = null;
} else {
if (null === $currentKey) {
throw new \InvalidArgumentException(sprintf('Malformed path. Unexpected "%s" at position %d.', $char, $i));
}
$currentKey .= $char;
}
}
if (null !== $currentKey) {
throw new \InvalidArgumentException(sprintf('Malformed path. Path must end with "]".'));
}
return $value;
}
echo magic([1,2,3,4], 'array[2]'); // 3
It can be modified to return a reference as well, just sprinkle it with some ampersands :)

Rewrite config parameter with phalcon

Is it possible to rewrite config parameter with phalcon?
I use ini File Type.
If not - tell me how to implement please.
if you want to create ini file with php, it is possible, even w/o phalcon.
from PHP docs comments:
Read file : $ini = INI::read('myfile.ini');
Write file : INI::write('myfile.ini', $ini);
custom INI class Features :
support [] syntax for arrays
support . in keys like bar.foo.something = value
true and false string are automatically converted in booleans
integers strings are automatically converted in integers
keys are sorted when writing
constants are replaced but they should be written in the ini file between braces : {MYCONSTANT}
class INI {
/**
* WRITE
*/
static function write($filename, $ini) {
$string = '';
foreach(array_keys($ini) as $key) {
$string .= '['.$key."]\n";
$string .= INI::write_get_string($ini[$key], '')."\n";
}
file_put_contents($filename, $string);
}
/**
* write get string
*/
static function write_get_string(& $ini, $prefix) {
$string = '';
ksort($ini);
foreach($ini as $key => $val) {
if (is_array($val)) {
$string .= INI::write_get_string($ini[$key], $prefix.$key.'.');
} else {
$string .= $prefix.$key.' = '.str_replace("\n", "\\\n", INI::set_value($val))."\n";
}
}
return $string;
}
/**
* manage keys
*/
static function set_value($val) {
if ($val === true) { return 'true'; }
else if ($val === false) { return 'false'; }
return $val;
}
/**
* READ
*/
static function read($filename) {
$ini = array();
$lines = file($filename);
$section = 'default';
$multi = '';
foreach($lines as $line) {
if (substr($line, 0, 1) !== ';') {
$line = str_replace("\r", "", str_replace("\n", "", $line));
if (preg_match('/^\[(.*)\]/', $line, $m)) {
$section = $m[1];
} else if ($multi === '' && preg_match('/^([a-z0-9_.\[\]-]+)\s*=\s*(.*)$/i', $line, $m)) {
$key = $m[1];
$val = $m[2];
if (substr($val, -1) !== "\\") {
$val = trim($val);
INI::manage_keys($ini[$section], $key, $val);
$multi = '';
} else {
$multi = substr($val, 0, -1)."\n";
}
} else if ($multi !== '') {
if (substr($line, -1) === "\\") {
$multi .= substr($line, 0, -1)."\n";
} else {
INI::manage_keys($ini[$section], $key, $multi.$line);
$multi = '';
}
}
}
}
$buf = get_defined_constants(true);
$consts = array();
foreach($buf['user'] as $key => $val) {
$consts['{'.$key.'}'] = $val;
}
array_walk_recursive($ini, array('INI', 'replace_consts'), $consts);
return $ini;
}
/**
* manage keys
*/
static function get_value($val) {
if (preg_match('/^-?[0-9]$/i', $val)) { return intval($val); }
else if (strtolower($val) === 'true') { return true; }
else if (strtolower($val) === 'false') { return false; }
else if (preg_match('/^"(.*)"$/i', $val, $m)) { return $m[1]; }
else if (preg_match('/^\'(.*)\'$/i', $val, $m)) { return $m[1]; }
return $val;
}
/**
* manage keys
*/
static function get_key($val) {
if (preg_match('/^[0-9]$/i', $val)) { return intval($val); }
return $val;
}
/**
* manage keys
*/
static function manage_keys(& $ini, $key, $val) {
if (preg_match('/^([a-z0-9_-]+)\.(.*)$/i', $key, $m)) {
INI::manage_keys($ini[$m[1]], $m[2], $val);
} else if (preg_match('/^([a-z0-9_-]+)\[(.*)\]$/i', $key, $m)) {
if ($m[2] !== '') {
$ini[$m[1]][INI::get_key($m[2])] = INI::get_value($val);
} else {
$ini[$m[1]][] = INI::get_value($val);
}
} else {
$ini[INI::get_key($key)] = INI::get_value($val);
}
}
/**
* replace utility
*/
static function replace_consts(& $item, $key, $consts) {
if (is_string($item)) {
$item = strtr($item, $consts);
}
}
}
find out more here: http://lt1.php.net/parse_ini_file

Own implementation of StringTokenizer returns unexcepted (endless) output

I am implementing my own StringTokenizer class in php, because the strtok function can only handle one opened tokenizer at the same time.
With
Hello;this;is;a;text
it works perfectly.
The output is:
**Hello**
**this**
**is**
**a**
**text**
But with
Hello;this;is;a;text;
it outputs:
**Hello**
**this**
**is**
**a**
**text**
****
****
<endless loop>
But I except the following output:
**Hello**
**this**
**is**
**a**
**text**
****
See my code below and please correct me:
class StringTokenizer
{
private $_str;
private $_chToken;
private $_iPosToken = 0;
private $_bInit;
public function __construct($str, $chToken)
{
if (empty($str) && empty($chToken))
{
throw new Exception('String and the token char variables cannot be empty.');
}
elseif(empty($chToken) && !empty($str))
{
throw new Exception('Missing parameter: Token char cannot be empty.');
}
elseif(!empty($chToken) && empty($str))
{
throw new Exception('Missing parameter: String cannot be empty.');
}
elseif(!empty($chToken) && !empty($str) && is_string($str) && strlen($chToken) >= 0)
{
$this->_str = $str;
$this->_chToken = $chToken;
$this->_bInit = true;
}
else
{
throw new Exception('TypeError: Illegal call to __construct from class StringTokenizer.');
}
}
public function next()
{
if ($this->_iPosToken === false)
{
return false;
}
if ($this->_bInit === true && (strlen($this->_str) - 1) > $this->_iPosToken)
{
$iCh1stPos = strpos($this->_str, $this->_chToken, $this->_iPosToken) + 1;
$this->_iPosToken = $iCh1stPos;
$this->_bInit = false;
return substr($this->_str, 0, $this->_iPosToken - 1);
}
elseif ($this->_bInit === false && (strlen($this->_str) - 1) > $this->_iPosToken)
{
$iCh1stPos = $this->_iPosToken;
$iCh2ndPos = strpos($this->_str, $this->_chToken, $this->_iPosToken);
if ($iCh2ndPos === false)
{
$this->_iPosToken = false;
return substr($this->_str, $iCh1stPos);
}
else
{
$this->_iPosToken = $iCh2ndPos + 1;
return substr($this->_str, $iCh1stPos, $iCh2ndPos - $iCh1stPos);
}
}
}
public function hasNext()
{
return strpos($this->_str, $this->chToken, $this->_iPosToken) === false ? false : true;
}
}
$strText = 'Hello;this;is;a;text';
$tokenizer = new StringTokenizer($strText, ';');
$tok = $tokenizer->Next();
while ($tok !== false)
{
echo '**' . $tok . '**' . PHP_EOL;
$tok = $tokenizer->next();
}
exit(0);
The problem with the third condition in the next() is this. String length is 26 and the last character match is 26 which you represent with the _iPosToken. so the condition in the 3rd if is false and the block never executes for the last semicolon.
A function in php returns NULL not FALSE by default.source
and the while never terminates at the bottom of the code.
So you have two options here. change the condition in the 3rd if to (strlen($this->_str)) >= $this->_iPosToken
OR
add a 4th condtion which returns false, as shown below.
public function next()
{
if ($this->_iPosToken === false)
{
return false;
}
if ($this->_bInit === true && (strlen($this->_str) - 1) > $this->_iPosToken)
{
$iCh1stPos = strpos($this->_str, $this->_chToken, $this->_iPosToken) + 1;
$this->_iPosToken = $iCh1stPos;
$this->_bInit = false;
return substr($this->_str, 0, $this->_iPosToken - 1);
}
elseif ($this->_bInit === false && (strlen($this->_str)-1 ) > $this->_iPosToken)
{
$iCh1stPos = $this->_iPosToken;
echo $this->_iPosToken;
$iCh2ndPos = strpos($this->_str, $this->_chToken, $this->_iPosToken);
if ($iCh2ndPos === FALSE) // You can chuck this if block. I put a echo here and //it never executed.
{
$this->_iPosToken = false;
return substr($this->_str, $iCh1stPos);
}
else
{
$this->_iPosToken = $iCh2ndPos + 1;
return substr($this->_str, $iCh1stPos, $iCh2ndPos - $iCh1stPos);
}
}
else return false;
}
Why do you like reinvent the wheel ?
You can use explode function, and then implements Iterator pattern in this tokenizer, i think it's an good approach.
http://php.net/explode
http://br1.php.net/Iterator
Example
<?php
class StringTokenizer implements Iterator
{
private $tokens = [];
private $position = 0;
public function __construct($string, $separator)
{
$this->tokens = explode($separator, $string);
}
public function rewind()
{
$this->position = 0;
}
public function current()
{
return $this->tokens[$this->position];
}
public function next()
{
++ $this->position;
}
public function key()
{
return $this->position;
}
public function valid()
{
return isset($this->tokens[$this->position]);
}
}
And using it:
$tokenizer = new StringTokenizer('h;e;l;l;o;', ';');
while($tokenizer->valid()) {
printf('**%s**', $tokenizer->current());
$tokenizer->next();
}

php imap - get body and make plain text

I am using the PHP imap function to get emails from a POP3 mailbox and insert the data into a MySQL database.
Here is the PHP code:
$inbox = imap_open($hostname,$username,$password) or die('Cannot connect: ' . imap_last_error());
$emails = imap_search($inbox,'ALL');
if($emails)
{
$output = '';
rsort($emails);
foreach($emails as $email_number)
{
$header=imap_headerinfo($inbox,$email_number);
$from = $header->from[0]->mailbox . "#" . $header->from[0]->host;
$toaddress=$header->toaddress;
$replyto=$header->reply_to[0]->mailbox."#".$header->reply_to[0]->host;
$datetime=date("Y-m-d H:i:s",$header->udate);
$subject=$header->subject;
//remove the " from the $toaddress
$toaddress = str_replace('"','',$toaddress);
echo '<strong>To:</strong> '.$toaddress.'<br>';
echo '<strong>From:</strong> '.$from.'<br>';
echo '<strong>Subject:</strong> '.$subject.'<br>';
//get message body
$message = (imap_fetchbody($inbox,$email_number,1.1));
if($message == '')
{
$message = (imap_fetchbody($inbox,$email_number,1));
}
}
It works fine, however on some emails in the body I get = in between words, or =20 in between words. And other times the emails will just be blank even though they are not blank when sent.
This only happens when coming from certain emails.
How can I get round this and just make the email completely plain text?
This happens because the emails are normally Quoted-printable encoded. The = is a soft line break and =20 is a white space. I think, you could use quoted_printable_decode() on the message so it shows correctly. About the blank emails, I don't know, I would need more details.
Basically:
//get message body
$message = quoted_printable_decode(imap_fetchbody($inbox,$email_number,1.1));
$data = imap_fetchbody($this->imapStream, $Part->uid, $Part->path, FT_UID | FT_PEEK);
if ($Part->format === 'quoted-printable' && $data) {
$data = quoted_printable_decode($data);
}
This is required for mails with
Content-Transfer-Encoding: quoted-printable
But for mails with
Content-Transfer-Encoding: 8bit
simply imap_fetchbody is enough.
Above code was taken from a cake-php component created for fetching mails from mail boxes throgh IMAP.
I made an entire class some years ago, and I'm still using it when I need to get contents from emails. It will help you fetch all email bodies (sometimes you have html and plain text) in a readable format, and get all attached files, just ready to be saved somewhere or sent to a website user.
It is not really optimized so on a big mailbox you may have troubles; but the purpose of this class was to access emails in a readable format to put them on a widget of a website. I let you play with the sample below to get how it works.
ImapReader.class.php Here is the source code.
<?php
class ImapReader
{
private $host;
private $port;
private $user;
private $pass;
private $box;
private $box_list;
private $errors;
private $connected;
private $list;
private $deleted;
const FROM = 0;
const TO = 1;
const REPLY_TO = 2;
const SUBJECT = 3;
const CONTENT = 4;
const ATTACHMENT = 5;
public function __construct($host = null, $port = '143', $user = null, $pass = null)
{
$this->host = $host;
$this->port = $port;
$this->user = $user;
$this->pass = $pass;
$this->box = null;
$this->box_list = null;
$this->errors = array ();
$this->connected = false;
$this->list = null;
$this->deleted = false;
}
public function __destruct()
{
if ($this->isConnected())
{
$this->disconnect();
}
}
public function changeServer($host = null, $port = '143', $user = null, $pass = null)
{
if ($this->isConnected())
{
$this->disconnect();
}
$this->host = $host;
$this->port = $port;
$this->user = $user;
$this->pass = $pass;
$this->box_list = null;
$this->errors = array ();
$this->list = null;
return $this;
}
public function canConnect()
{
return (($this->connected == false) && (is_string($this->host)) && (!empty($this->host))
&& (is_numeric($this->port)) && ($this->port >= 1) && ($this->port <= 65535)
&& (is_string($this->user)) && (!empty($this->user)) && (is_string($this->pass)) && (!empty($this->pass)));
}
public function connect()
{
if ($this->canConnect())
{
$this->box = #imap_open("{{$this->host}:{$this->port}/imap/ssl/novalidate-cert}INBOX", $this->user,
$this->pass);
if ($this->box !== false)
{
$this->_connected();
}
else
{
$this->errors = array_merge($this->errors, imap_errors());
}
}
return $this;
}
public function boxList()
{
if (is_null($this->box_list))
{
$list = imap_getsubscribed($this->box, "{{$this->host}:{$this->port}}", "*");
$this->box_list = array ();
foreach ($list as $box)
{
$this->box_list[] = $box->name;
}
}
return $this->box_list;
}
public function fetchAllHeaders($mbox)
{
if ($this->isConnected())
{
$test = imap_reopen($this->box, "{$mbox}");
if (!$test)
{
return false;
}
$num_msgs = imap_num_msg($this->box);
$this->list = array ();
for ($id = 1; ($id <= $num_msgs); $id++)
{
$this->list[] = $this->_fetchHeader($mbox, $id);
}
return true;
}
return false;
}
public function fetchSearchHeaders($mbox, $criteria)
{
if ($this->isConnected())
{
$test = imap_reopen($this->box, "{$mbox}");
if (!$test)
{
return false;
}
$msgs = imap_search($this->box, $criteria);
if ($msgs)
{
foreach ($msgs as $id)
{
$this->list[] = $this->_fetchHeader($mbox, $id);
}
}
return true;
}
return false;
}
public function isConnected()
{
return $this->connected;
}
public function disconnect()
{
if ($this->connected)
{
if ($this->deleted)
{
imap_expunge($this->box);
$this->deleted = false;
}
imap_close($this->box);
$this->connected = false;
$this->box = null;
}
return $this;
}
/**
* Took from khigashi dot oang at gmail dot com at php.net
* with replacement of ereg family functions by preg's ones.
*
* #param string $str
* #return string
*/
private function _fix($str)
{
if (preg_match("/=\?.{0,}\?[Bb]\?/", $str))
{
$str = preg_split("/=\?.{0,}\?[Bb]\?/", $str);
while (list($key, $value) = each($str))
{
if (preg_match("/\?=/", $value))
{
$arrTemp = preg_split("/\?=/", $value);
$arrTemp[0] = base64_decode($arrTemp[0]);
$str[$key] = join("", $arrTemp);
}
}
$str = join("", $str);
}
if (preg_match("/=\?.{0,}\?Q\?/", $str))
{
$str = quoted_printable_decode($str);
$str = preg_replace("/=\?.{0,}\?[Qq]\?/", "", $str);
$str = preg_replace("/\?=/", "", $str);
}
return trim($str);
}
private function _connected()
{
$this->connected = true;
return $this;
}
public function getErrors()
{
$errors = $this->errors;
$this->errors = array ();
return $errors;
}
public function count()
{
if (is_null($this->list))
{
return 0;
}
return count($this->list);
}
public function get($nbr = null)
{
if (is_null($nbr))
{
return $this->list;
}
if ((is_array($this->list)) && (isset($this->list[$nbr])))
{
return $this->list[$nbr];
}
return null;
}
public function fetch($nbr = null)
{
return $this->_callById('_fetch', $nbr);
}
private function _fetchHeader($mbox, $id)
{
$header = imap_header($this->box, $id);
if (!is_object($header))
{
return;
}
$mail = new stdClass();
$mail->id = $id;
$mail->mbox = $mbox;
$mail->timestamp = (isset($header->udate)) ? ($header->udate) : ('');
$mail->date = date("d/m/Y H:i:s", (isset($header->udate)) ? ($header->udate) : (''));
$mail->from = $this->_fix(isset($header->fromaddress) ? ($header->fromaddress) : (''));
$mail->to = $this->_fix(isset($header->toaddress) ? ($header->toaddress) : (''));
$mail->reply_to = $this->_fix(isset($header->reply_toaddress) ? ($header->reply_toaddress) : (''));
$mail->subject = $this->_fix(isset($header->subject) ? ($header->subject) : (''));
$mail->content = array ();
$mail->attachments = array ();
$mail->deleted = false;
return $mail;
}
private function _fetch($mail)
{
$test = imap_reopen($this->box, "{$mail->mbox}");
if (!$test)
{
return $mail;
}
$structure = imap_fetchstructure($this->box, $mail->id);
if ((!isset($structure->parts)) || (!is_array($structure->parts)))
{
$body = imap_body($this->box, $mail->id);
$content = new stdClass();
$content->type = 'content';
$content->mime = $this->_fetchType($structure);
$content->charset = $this->_fetchParameter($structure->parameters, 'charset');
$content->data = $this->_decode($body, $structure->type);
$content->size = strlen($content->data);
$mail->content[] = $content;
return $mail;
}
else
{
$parts = $this->_fetchPartsStructureRoot($mail, $structure);
foreach ($parts as $part)
{
$content = new stdClass();
$content->type = null;
$content->data = null;
$content->mime = $this->_fetchType($part->data);
if ((isset($part->data->disposition))
&& ((strcmp('attachment', $part->data->disposition) == 0)
|| (strcmp('inline', $part->data->disposition) == 0)))
{
$content->type = $part->data->disposition;
$content->name = null;
if (isset($part->data->dparameters))
{
$content->name = $this->_fetchParameter($part->data->dparameters, 'filename');
}
if (is_null($content->name))
{
if (isset($part->data->parameters))
{
$content->name = $this->_fetchParameter($part->data->parameters, 'name');
}
}
$mail->attachments[] = $content;
}
else if ($part->data->type == 0)
{
$content->type = 'content';
$content->charset = null;
if (isset($part->data->parameters))
{
$content->charset = $this->_fetchParameter($part->data->parameters, 'charset');
}
$mail->content[] = $content;
}
$body = imap_fetchbody($this->box, $mail->id, $part->no);
if (isset($part->data->encoding))
{
$content->data = $this->_decode($body, $part->data->encoding);
}
else
{
$content->data = $body;
}
$content->size = strlen($content->data);
}
}
return $mail;
}
private function _fetchPartsStructureRoot($mail, $structure)
{
$parts = array ();
if ((isset($structure->parts)) && (is_array($structure->parts)) && (count($structure->parts) > 0))
{
foreach ($structure->parts as $key => $data)
{
$this->_fetchPartsStructure($mail, $data, ($key + 1), $parts);
}
}
return $parts;
}
private function _fetchPartsStructure($mail, $structure, $prefix, &$parts)
{
if ((isset($structure->parts)) && (is_array($structure->parts)) && (count($structure->parts) > 0))
{
foreach ($structure->parts as $key => $data)
{
$this->_fetchPartsStructure($mail, $data, $prefix . "." . ($key + 1), $parts);
}
}
$part = new stdClass;
$part->no = $prefix;
$part->data = $structure;
$parts[] = $part;
}
private function _fetchParameter($parameters, $key)
{
foreach ($parameters as $parameter)
{
if (strcmp($key, $parameter->attribute) == 0)
{
return $parameter->value;
}
}
return null;
}
private function _fetchType($structure)
{
$primary_mime_type = array ("TEXT", "MULTIPART", "MESSAGE", "APPLICATION", "AUDIO", "IMAGE", "VIDEO", "OTHER");
if ((isset($structure->subtype)) && ($structure->subtype) && (isset($structure->type)))
{
return $primary_mime_type[(int) $structure->type] . '/' . $structure->subtype;
}
return "TEXT/PLAIN";
}
private function _decode($message, $coding)
{
switch ($coding)
{
case 2:
$message = imap_binary($message);
break;
case 3:
$message = imap_base64($message);
break;
case 4:
$message = imap_qprint($message);
break;
case 5:
break;
default:
break;
}
return $message;
}
private function _callById($method, $data)
{
$callback = array ($this, $method);
// data is null
if (is_null($data))
{
$result = array ();
foreach ($this->list as $mail)
{
$result[] = $this->_callById($method, $mail);
}
return $result;
}
// data is an array
if (is_array($data))
{
$result = array ();
foreach ($data as $elem)
{
$result[] = $this->_callById($method, $elem);
}
return $result;
}
// data is an object
if ((is_object($data)) && ($data instanceof stdClass) && (isset($data->id)))
{
return call_user_func($callback, $data);
}
// data is numeric
if (($this->isConnected()) && (is_array($this->list)) && (is_numeric($data)))
{
foreach ($this->list as $mail)
{
if ($mail->id == $data)
{
return call_user_func($callback, $mail);
}
}
}
return null;
}
public function delete($nbr)
{
$this->_callById('_delete', $nbr);
return;
}
private function _delete($mail)
{
if ($mail->deleted == false)
{
$test = imap_reopen($this->box, "{$mail->mbox}");
if ($test)
{
$this->deleted = true;
imap_delete($this->box, $mail->id);
$mail->deleted = true;
}
}
}
public function searchBy($pattern, $type)
{
$result = array ();
if (is_array($this->list))
{
foreach ($this->list as $mail)
{
$match = false;
switch ($type)
{
case self::FROM:
$match = $this->_match($mail->from, $pattern);
break;
case self::TO:
$match = $this->_match($mail->to, $pattern);
break;
case self::REPLY_TO:
$match = $this->_match($mail->reply_to, $pattern);
break;
case self::SUBJECT:
$match = $this->_match($mail->subject, $pattern);
break;
case self::CONTENT:
foreach ($mail->content as $content)
{
$match = $this->_match($content->data, $pattern);
if ($match)
{
break;
}
}
break;
case self::ATTACHMENT:
foreach ($mail->attachments as $attachment)
{
$match = $this->_match($attachment->name, $pattern);
if ($match)
{
break;
}
}
break;
}
if ($match)
{
$result[] = $mail;
}
}
}
return $result;
}
private function _nmatch($string, $pattern, $a, $b)
{
if ((!isset($string[$a])) && (!isset($pattern[$b])))
{
return 1;
}
if ((isset($pattern[$b])) && ($pattern[$b] == '*'))
{
if (isset($string[$a]))
{
return ($this->_nmatch($string, $pattern, ($a + 1), $b) + $this->_nmatch($string, $pattern, $a, ($b + 1)));
}
else
{
return ($this->_nmatch($string, $pattern, $a, ($b + 1)));
}
}
if ((isset($string[$a])) && (isset($pattern[$b])) && ($pattern[$b] == '?'))
{
return ($this->_nmatch($string, $pattern, ($a + 1), ($b + 1)));
}
if ((isset($string[$a])) && (isset($pattern[$b])) && ($pattern[$b] == '\\'))
{
if ((isset($pattern[($b + 1)])) && ($string[$a] == $pattern[($b + 1)]))
{
return ($this->_nmatch($string, $pattern, ($a + 1), ($b + 2)));
}
}
if ((isset($string[$a])) && (isset($pattern[$b])) && ($string[$a] == $pattern[$b]))
{
return ($this->_nmatch($string, $pattern, ($a + 1), ($b + 1)));
}
return 0;
}
private function _match($string, $pattern)
{
return $this->_nmatch($string, $pattern, 0, 0);
}
}
ImapReader.demo.php Here is the usage sample
<?php
require_once("ImapReader.class.php");
$box = new ImapReader('example.com', '143', 'somebody#example.com', 'xxxxxxxxxxxx');
$box
->connect()
->fetchAllHeaders()
;
echo $box->count() . " emails in mailbox\n";
for ($i = 0; ($i < $box->count()); $i++)
{
$msg = $box->get($i);
echo "Reception date : {$msg->date}\n";
echo "From : {$msg->from}\n";
echo "To : {$msg->to}\n";
echo "Reply to : {$msg->from}\n";
echo "Subject : {$msg->subject}\n";
$msg = $box->fetch($msg);
echo "Number of readable contents : " . count($msg->content) . "\n";
foreach ($msg->content as $key => $content)
{
echo "\tContent " . ($key + 1) . " :\n";
echo "\t\tContent type : {$content->mime}\n";
echo "\t\tContent charset : {$content->charset}\n";
echo "\t\tContent size : {$content->size}\n";
}
echo "Number of attachments : " . count($msg->attachments) . "\n";
foreach ($msg->attachments as $key => $attachment)
{
echo "\tAttachment " . ($key + 1) . " :\n";
echo "\t\tAttachment type : {$attachment->type}\n";
echo "\t\tContent type : {$attachment->mime}\n";
echo "\t\tFile name : {$attachment->name}\n";
echo "\t\tFile size : {$attachment->size}\n";
}
echo "\n";
}
echo "Searching '*Bob*' ...\n";
$results = $box->searchBy('*Bob*', ImapReader::FROM);
foreach ($results as $result)
{
echo "\tMatched: {$result->from} - {$result->date} - {$result->subject}\n";
}
Enjoy
Regarding the blank emails, check the encoding of the mail.
If it is a binary encoded mail then you will get blank mails when you try to insert them into a mysql text field.
Try shifting every mail to UTF-8 and then insert it
iconv(mb_detect_encoding($mail_content, mb_detect_order(), true), "UTF-8", $mail_content);
function getmsg($mbox,$mid) {
// input $mbox = IMAP stream, $mid = message id
// output all the following:
global $charset,$htmlmsg,$plainmsg,$attachments;
$htmlmsg = $plainmsg = $charset = '';
$attachments = array();
// HEADER
$h = imap_header($mbox,$mid);
// add code here to get date, from, to, cc, subject...
// BODY
$s = imap_fetchstructure($mbox,$mid);
if (!$s->parts) // simple
getpart($mbox,$mid,$s,0); // pass 0 as part-number
else { // multipart: cycle through each part
foreach ($s->parts as $partno0=>$p)
getpart($mbox,$mid,$p,$partno0+1);
}
}
function getpart($mbox,$mid,$p,$partno) {
// $partno = '1', '2', '2.1', '2.1.3', etc for multipart, 0 if simple
global $htmlmsg,$plainmsg,$charset,$attachments;
// DECODE DATA
$data = ($partno)?
imap_fetchbody($mbox,$mid,$partno): // multipart
imap_body($mbox,$mid); // simple
// Any part may be encoded, even plain text messages, so check everything.
if ($p->encoding==4)
$data = quoted_printable_decode($data);
elseif ($p->encoding==3)
$data = base64_decode($data);
// PARAMETERS
// get all parameters, like charset, filenames of attachments, etc.
$params = array();
if ($p->parameters)
foreach ($p->parameters as $x)
$params[strtolower($x->attribute)] = $x->value;
if ($p->dparameters)
foreach ($p->dparameters as $x)
$params[strtolower($x->attribute)] = $x->value;
// ATTACHMENT
// Any part with a filename is an attachment,
// so an attached text file (type 0) is not mistaken as the message.
if ($params['filename'] || $params['name']) {
// filename may be given as 'Filename' or 'Name' or both
$filename = ($params['filename'])? $params['filename'] : $params['name'];
// filename may be encoded, so see imap_mime_header_decode()
$attachments[$filename] = $data; // this is a problem if two files have same name
}
// TEXT
if ($p->type==0 && $data) {
// Messages may be split in different parts because of inline attachments,
// so append parts together with blank row.
if (strtolower($p->subtype)=='plain')
$plainmsg. = trim($data) ."\n\n";
else
$htmlmsg. = $data ."<br><br>";
$charset = $params['charset']; // assume all parts are same charset
}
// EMBEDDED MESSAGE
// Many bounce notifications embed the original message as type 2,
// but AOL uses type 1 (multipart), which is not handled here.
// There are no PHP functions to parse embedded messages,
// so this just appends the raw source to the main message.
elseif ($p->type==2 && $data) {
$plainmsg. = $data."\n\n";
}
// SUBPART RECURSION
if ($p->parts) {
foreach ($p->parts as $partno0=>$p2)
getpart($mbox,$mid,$p2,$partno.'.'.($partno0+1)); // 1.2, 1.2.1, etc.
}
}
Reference (First user contributed note): http://php.net/manual/en/function.imap-fetchstructure.php
I tried all this answers, but neither one worked for me. Then I hit first user contributed note on this PHP page:
http://php.net/manual/en/function.imap-fetchstructure.php
and this works for all my cases. Quite old answer btw.

Categories