PHP Library to Parse Mobi - php

Is there any freely available library for PHP which parses a .mobi file to get the:
Author
Title
Publisher
Cover
Edit:
To everyone who thinks this is an exact duplicate of Does a PHP Library Exist to Work with PRC/MOBI Files, you're obviously too lazy to read the questions.
That asker wants to know how to generate .mobi files using a PHP library. I want to know how to break apart, or parse, already created .mobi files to get certain information. Therefore, the solution to that question, phpMobi will not work because it is a script to generate .mobi files from HTML, not to parse .mobi files.

A very very very lame example, but if you get desperate, you may try something like this:
$data = file_get_contents("A Young Girl's Diary - Freud, Sigmund.mobi");
$chunk = substr($data, strpos($data, 'EXTH'), 512); // byte-safe functions, since the file is binary
$chunks = explode("\x00", $chunk);
array_shift($chunks);
$chunks = array_filter($chunks, function($str){return preg_match('#([A-Z])#', $str) && mb_strlen($str) > 2;});
$chunks = array_combine(array('author', 'publisher', 'title'), $chunks);
print_r($chunks);
Output:
Array
(
[author] => Freud, Sigmund
[publisher] => Webarto
[title] => A Young Girl's Diary
)
File used: http://freekindlebooks.org/Freud/752-h.mobi (edited Publisher metadata with Calibre)
Parsing files is not even remotely easy or fun. Just take a look at this: http://code.google.com/p/xee/source/browse/XeePhotoshopLoader.m?r=a70d7396356997114b548f4ab2cbd49badd7d285#107
What you should be doing is reading the file byte by byte, but because there is no detailed documentation, I'm afraid that won't be an easy job.
P.S. I haven't tried to fetch the cover photo.

If anyone is still interested, here's a sample of reading MOBI metadata:
class palmDOCHeader
{
public $Compression = 0;
public $TextLength = 0;
public $Records = 0;
public $RecordSize = 0;
}
class palmHeader
{
public $Records = array();
}
class palmRecord
{
public $Offset = 0;
public $Attributes = 0;
public $Id = 0;
}
class mobiHeader
{
public $Length = 0;
public $Type = 0;
public $Encoding = 0;
public $Id = 0;
public $FileVersion = 0;
}
class exthHeader
{
public $Length = 0;
public $Records = array();
}
class exthRecord
{
public $Type = 0;
public $Length = 0;
public $Data = "";
}
class mobi {
protected $mobiHeader;
protected $exthHeader;
public function __construct($file){
$handle = fopen($file, "rb"); // binary mode, the file is not text
if ($handle){
fseek($handle, 60, SEEK_SET);
$content = fread($handle, 8);
if ($content != "BOOKMOBI"){
echo "Invalid file format";
fclose($handle);
return;
}
// Palm Database
echo "\nPalm database:\n";
$palmHeader = new palmHeader();
fseek($handle, 0, SEEK_SET);
$name = fread($handle, 32);
echo "Name: ".$name."\n";
fseek($handle, 76, SEEK_SET);
$content = fread($handle, 2);
$records = hexdec(bin2hex($content));
echo "Records: ".$records."\n";
fseek($handle, 78, SEEK_SET);
for ($i=0; $i<$records; $i++){
$record = new palmRecord();
$content = fread($handle, 4);
$record->Offset = hexdec(bin2hex($content));
$content = fread($handle, 1);
$record->Attributes = hexdec(bin2hex($content));
$content = fread($handle, 3);
$record->Id = hexdec(bin2hex($content));
array_push($palmHeader->Records, $record);
echo "Record ".$i." offset: ".$record->Offset." attributes: ".$record->Attributes." id : ".$record->Id."\n";
}
// PalmDOC Header
$palmDOCHeader = new palmDOCHeader();
fseek($handle, $palmHeader->Records[0]->Offset, SEEK_SET);
$content = fread($handle, 2);
$palmDOCHeader->Compression = hexdec(bin2hex($content));
$content = fread($handle, 2);
$content = fread($handle, 4);
$palmDOCHeader->TextLength = hexdec(bin2hex($content));
$content = fread($handle, 2);
$palmDOCHeader->Records = hexdec(bin2hex($content));
$content = fread($handle, 2);
$palmDOCHeader->RecordSize = hexdec(bin2hex($content));
$content = fread($handle, 4);
echo "\nPalmDOC Header:\n";
echo "Compression:".$palmDOCHeader->Compression."\n";
echo "TextLength:".$palmDOCHeader->TextLength."\n";
echo "Records:".$palmDOCHeader->Records."\n";
echo "RecordSize:".$palmDOCHeader->RecordSize."\n";
// MOBI Header
$mobiStart = ftell($handle);
$content = fread($handle, 4);
if ($content == "MOBI"){
$this->mobiHeader = new mobiHeader();
echo "\nMOBI header:\n";
$content = fread($handle, 4);
$this->mobiHeader->Length = hexdec(bin2hex($content));
$content = fread($handle, 4);
$this->mobiHeader->Type = hexdec(bin2hex($content));
$content = fread($handle, 4);
$this->mobiHeader->Encoding = hexdec(bin2hex($content));
$content = fread($handle, 4);
$this->mobiHeader->Id = hexdec(bin2hex($content));
echo "Header length: ".$this->mobiHeader->Length."\n";
echo "Type: ".$this->mobiHeader->Type."\n";
echo "Encoding: ".$this->mobiHeader->Encoding."\n";
echo "Id: ".$this->mobiHeader->Id."\n";
fseek($handle, $mobiStart+$this->mobiHeader->Length, SEEK_SET);
$content = fread($handle, 4);
if ($content == "EXTH"){
$this->exthHeader = new exthHeader();
echo "\nEXTH header:\n";
$content = fread($handle, 4);
$this->exthHeader->Length = hexdec(bin2hex($content));
$content = fread($handle, 4);
$records = hexdec(bin2hex($content));
echo "Records: ".$records."\n";
for ($i=0; $i<$records; $i++){
$record = new exthRecord();
$content = fread($handle, 4);
$record->Type = hexdec(bin2hex($content));
$content = fread($handle, 4);
$record->Length = hexdec(bin2hex($content));
$record->Data = fread($handle, $record->Length - 8);
array_push($this->exthHeader->Records, $record);
echo "Record ".$i." type: ".$record->Type." length: ".$record->Length."\n";
echo " data: ".$record->Data."\n";
}
}
}
fclose($handle);
}
}
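protected function GetRecord($type)
{
// Guard: no EXTH header was parsed (invalid file or no EXTH block)
if (!$this->exthHeader)
return NULL;
foreach ($this->exthHeader->Records as $record){
if ($record->Type == $type)
return $record;
}
return NULL;
}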
protected function GetRecordData($type)
{
$record = $this->GetRecord($type);
if ($record)
return $record->Data;
return "";
}
public function Title()
{
return $this->GetRecordData(503);
}
public function Author()
{
return $this->GetRecordData(100);
}
public function Isbn()
{
return $this->GetRecordData(104);
}
public function Subject()
{
return $this->GetRecordData(105);
}
public function Publisher()
{
return $this->GetRecordData(101);
}
}
$mobi = new mobi("test.mobi");
echo "\nTitle: ".$mobi->Title();
echo "\nAuthor: ".$mobi->Author();
echo "\nIsbn: ".$mobi->Isbn();
echo "\nSubject: ".$mobi->Subject();
echo "\nPublisher: ".$mobi->Publisher();
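The class above converts every field with hexdec(bin2hex(...)). As a rough alternative sketch, unpack() with the N format (unsigned 32-bit big-endian) reads the same fields directly. The helper name parseExthRecords and the hand-built sample block are hypothetical, and the EXTH layout is only assumed from what the class above reads (identifier, 4-byte length, 4-byte record count, then records of type, length and data):

```php
<?php
// Hypothetical helper (not part of the answer above): parse an EXTH block
// that is already held in a string. Assumed layout, taken from the class above:
// "EXTH", 4-byte header length, 4-byte record count, then for each record a
// 4-byte type, a 4-byte length (which includes these 8 bytes) and the data.
function parseExthRecords(string $exth): array
{
    if (strlen($exth) < 12 || substr($exth, 0, 4) !== 'EXTH') {
        return [];
    }
    // 'N' = unsigned 32-bit big-endian; the third argument is the byte offset
    $header = unpack('Nlength/Ncount', $exth, 4);
    $records = [];
    $pos = 12;
    for ($i = 0; $i < $header['count']; $i++) {
        if ($pos + 8 > strlen($exth)) {
            break; // truncated block
        }
        $rec = unpack('Ntype/Nlength', $exth, $pos);
        if ($rec['length'] < 8) {
            break; // malformed record, stop instead of looping forever
        }
        // A type can occur more than once, so collect values per type
        $records[$rec['type']][] = substr($exth, $pos + 8, $rec['length'] - 8);
        $pos += $rec['length'];
    }
    return $records;
}

// Synthetic EXTH block for illustration only (types 100 and 101 as used above)
$author = 'Freud, Sigmund';
$publisher = 'Webarto';
$body = pack('NN', 100, 8 + strlen($author)) . $author
      . pack('NN', 101, 8 + strlen($publisher)) . $publisher;
$exth = 'EXTH' . pack('NN', 12 + strlen($body), 2) . $body;

$parsed = parseExthRecords($exth);
echo "Author: " . $parsed[100][0] . "\n";
echo "Publisher: " . $parsed[101][0] . "\n";
```

With a real file you would still have to locate the EXTH block first, as the constructor above does, and read it into a string before passing it in.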

I had the same issue and didn't find any PHP parsers, so I had to write my own (unfortunately I can't disclose my code). Here is a good resource about the .mobi structure: http://wiki.mobileread.com/wiki/MOBI

Related

how to replace a string in a stream for very large files

How can I replace a string in a file that cannot be fully loaded into memory?
I can read it a few bytes at a time, but how can I be sure I didn't read into the middle of my phrase?
I think I should save the last strlen(phrase) bytes and try the replacement on last+current.
This is my WIP:
function stream_str_replace(string $search, string $replace, $handle, int $length, &$count = null)
{
// assure $handle is a resource
if (!is_resource($handle)) {
throw new UnexpectedValueException('handle must be a valid stream resource');
}
// assure $handle is a stream resource
if (($resourceType = get_resource_type($handle)) !== 'stream') {
throw new UnexpectedValueException('handle must be a valid stream resource, but is a "' . $resourceType . '"');
}
$sLength = strlen($search);
$lastInSLength = '';
while (!feof($handle)) {
$str = fread($handle, $length - $sLength - 1);
$batchCount = 0;
$res = str_replace($search, $replace, $lastInSLength . $str, $batchCount);
if ($batchCount) {
$count += $batchCount;
fseek($handle, -($length - 1));
fwrite($handle, $res); // this does not seem to work as I intend it to
}
$lastInSLength = substr($str, -$sLength);
}
}
$fh = fopen('sample.txt', 'r+');
stream_str_replace('consectetur', 'foo', $fh, 50, $count);
fclose($fh);
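One possible way to handle the chunk-boundary problem is sketched below. It is not a fix of the WIP above: it swaps in a second output stream instead of writing in place, because the replacement can differ in length from the search string. The function name streamStrReplace and its parameters are made up for illustration. The idea is the one from the question: hold back the last strlen($search) - 1 bytes of unmatched text, since a match could still start there and finish in the next chunk.

```php
<?php
// Hypothetical sketch: copy $in to $out, replacing $search with $replace,
// without ever holding more than one chunk plus a small tail in memory.
function streamStrReplace($in, $out, string $search, string $replace, int $chunkSize = 8192): int
{
    if ($search === '' || $chunkSize < 1) {
        throw new InvalidArgumentException('search must be non-empty and chunkSize positive');
    }
    $len = strlen($search);
    $keep = $len - 1;   // a match could begin in the last $keep bytes
    $buffer = '';
    $count = 0;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $buffer .= $chunk;
        $offset = 0;
        // Write out every complete match found in the buffer, left to right
        while (($pos = strpos($buffer, $search, $offset)) !== false) {
            fwrite($out, substr($buffer, $offset, $pos - $offset) . $replace);
            $offset = $pos + $len;
            $count++;
        }
        // Of the unmatched remainder, flush all but the last $keep bytes
        $rest = (string) substr($buffer, $offset);
        $safe = max(0, strlen($rest) - $keep);
        fwrite($out, (string) substr($rest, 0, $safe));
        $buffer = (string) substr($rest, $safe);
    }
    fwrite($out, $buffer); // whatever is left cannot contain a match
    return $count;
}

// Demo on in-memory streams with a deliberately tiny chunk size
$in = fopen('php://memory', 'w+');
fwrite($in, 'xx consectetur yy');
rewind($in);
$out = fopen('php://memory', 'w+');
$replaced = streamStrReplace($in, $out, 'consectetur', 'foo', 4);
rewind($out);
echo stream_get_contents($out), " ($replaced replaced)\n";
```

For a real file, the output could go to a temporary file that is renamed over the original once the copy has finished.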

Saving hex data in binary using PHP does not work properly

I am learning PHP and file handling, and I am trying to write some code that puts data in a binary file.
Here's my code:
Write
<?php
echo "\n\nWRITE: \n\n";
$c = array();
$data = '';
$c['name'] = 'abcdefghijklmnopqrstuvwxyz';
$data .= implode('', $c);
$fp = fopen('test.bin', 'wb');
$len = strlen($data);
echo "\nFILE CONTENT: $data (strlen: $len)\n\n";
for ($i = 0; $i < $len; ++$i) {
$hx = dechex(ord($data[$i]));
fwrite($fp, pack("C", $hx));
}
echo "Last char is: $hx which mean: ";
echo chr(hexdec('7a'));
echo "\n--------------------------------------------\n";
fclose($fp);
Output
FILE CONTENT: abcdefghijklmnopqrstuvwxyz (strlen: 26)
Last char is: 7a which mean: z
Read
<?php
echo "\n--------------------------------------------\n";
echo "\n\nREAD: \n\n";
$fp = fopen('test.bin', 'rb');
$fseek = fseek($fp, 0, SEEK_SET);
if($fseek == -1) {
return FALSE;
}
$data = fread($fp, 26);
$arr = unpack("C*", $data);
$return = '';
foreach($arr as $val) {
$return .= chr(hexdec($val));
}
$n = '';
$arr = array();
$arr['name'] = substr($return, 0, 26);
print_r($arr);
echo "\n--------------------------------------------\n";
Output
Array
(
[name] => abcdefghipqrstuvwxy
)
Where are the missing letters, like z, m, n or o?
EDIT 6-3-14 7h36 am: I would like to have the .bin file not plain text if possible
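You are passing hex strings to pack's C (unsigned char) format, which expects an integer. dechex returns a string such as "6a", and pack("C", "6a") casts that string to the integer 6, so every character whose hex code contains a letter (j to o, and z) ends up as the wrong byte. The following shows how each pack format treats different inputs: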
echo "\t";
foreach( array('0x41', 65, 'a') as $o )
echo $o."\t";
echo "\n";
foreach( array('c*','C*','a*','A*','h*','H*','v*','n*','S*') as $o ){
echo $o . "\t";
foreach( array(0x41, 65, "a") as $oo ) {
echo pack($o, $oo);
echo "\t";
}
echo "\n";
}
If you run this, you will see quickly how pack works with the 3 different values of a (HEX, DEC and normal).
You have to use the h instruction to accomplish what you need.
function writeToFile($data) {
$fp = fopen(FILENAME, 'wb');
$len = strlen($data);
for ($i = 0; $i < $len; ++$i) {
$hx = dechex(ord($data[$i]));
$result = fwrite($fp, pack("h*", $hx));
if(!$result) {
// show something
}
}
fclose($fp);
}
Now, to read that data back, use the same h format. unpack returns an array with a single element (hence current() below) that holds one long hex string. Split it into two-character pieces with str_split, since each byte is two hex digits (00 = 0 up to FF = 255), then turn each piece back into a character with hexdec and chr, which reverses the ord and dechex calls in writeToFile.
function readFromFile($lenght, $pos = 0) {
$return = '';
$fp = fopen(FILENAME, 'rb');
if(!$fp) {
// show something
}
$fseek = fseek($fp, $pos, SEEK_SET);
if($fseek == -1) {
// show something
}
$data = fread($fp, $lenght);
$data = unpack("h*", $data);
$arr = str_split(current($data), 2);
foreach($arr as $val) {
$return .= chr(hexdec($val));
}
return $return;
}
Now, you create your string and write to the file:
$data = 'This should work properly, thanks for StackOverFlow!';
$len = strlen($data);
writeToFile($data);
Then read back:
echo readFromFile($len);
The content of your file will look like this:
E<86><96>7^B7<86>öWÆF^Bwö'¶^B^G'ö^GV'Æ<97>Â^BG<86>^Væ¶7^Bfö'^B5G^V6¶ôgV'dÆöw^R
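For comparison, here is a minimal sketch that skips the hex conversion entirely and hands the integer from ord() straight to the C format. This is not the answer's h* approach, and the helper names writeBytes and readBytes are invented for the example. Note that the bytes written this way are identical to the original string, so the file will be readable as plain text, which the question's edit wanted to avoid.

```php
<?php
// Hypothetical helpers: pack('C', ...) wants an integer 0-255 per byte,
// which is exactly what ord() returns, so no dechex()/hexdec() is needed.
function writeBytes(string $file, string $data): void
{
    $bytes = array_map('ord', str_split($data));   // one integer per character
    file_put_contents($file, pack('C*', ...$bytes));
}

function readBytes(string $file, int $length, int $pos = 0): string
{
    $raw = file_get_contents($file, false, null, $pos, $length);
    // unpack('C*') returns the byte values as integers, chr() reverses ord()
    return implode('', array_map('chr', unpack('C*', $raw)));
}

$file = tempnam(sys_get_temp_dir(), 'bin');
writeBytes($file, 'abcdefghijklmnopqrstuvwxyz');
echo readBytes($file, 26), "\n";
unlink($file);
```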

Read multiple csv files in folder

I need some help.
What I need is a script that opens and reads all the .csv files in the folder 'csv/files' and then runs the code inside the "if" for each of them. When I had only one file it worked fine. I put together the script below, but it does not work, and no error message shows up either.
Can somebody look at my code and tell me what I am doing wrong?
<?php
foreach (glob("*.csv") as $filename) {
echo $filename."<br />";
if (($handle = fopen($filename, "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
$url = $data[0];
$path = $data[1];
$ch = curl_init($url);
$fp = fopen($path, 'wb');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
}
fclose($handle);
}
}
?>
This is a prime candidate for multi-threading, and here's some code to do it:
<?php
class WebWorker extends Worker {
public function run() {}
}
class WebTask extends Stackable {
public function __construct($input, $output) {
$this->input = $input;
$this->output = $output;
$this->copied = 0;
}
public function run() {
$data = file_get_contents($this->input);
if ($data) {
file_put_contents(
$this->output, $data);
$this->copied = strlen($data);
}
}
public $input;
public $output;
public $copied;
}
class WebPool {
public function __construct($max) {
$this->max = $max;
$this->workers = [];
}
public function submit(WebTask $task) {
$random = rand(0, $this->max - 1); // indexes 0..max-1, so at most $max workers
if (isset($this->workers[$random])) {
return $this->workers[$random]
->stack($task);
} else {
$this->workers[$random] = new WebWorker();
$this->workers[$random]
->start();
return $this->workers[$random]
->stack($task);
}
}
public function shutdown() {
foreach ($this->workers as $worker)
$worker->shutdown();
}
protected $max;
protected $workers;
}
$pool = new WebPool(8);
$work = [];
$start = microtime(true);
foreach (glob("csv/*.csv") as $file) {
$file = fopen($file, "r");
if ($file) {
while (($line = fgetcsv($file, 0, ";"))) {
$wid = count($work);
$work[$wid] = new WebTask(
$line[0], $line[1]);
$pool->submit($work[$wid]);
}
}
}
$pool->shutdown();
$runtime = microtime(true) - $start;
$total = 0;
foreach ($work as $job) {
printf(
"[%s] %s -> %s %.3f kB\n",
$job->copied ? "OK" : "FAIL",
$job->input,
$job->output,
$job->copied/1024);
$total += $job->copied;
}
printf(
"[TOTAL] %.3f kB in %.3f seconds\n",
$total/1024, $runtime);
?>
This creates a pool with a fixed maximum number of threads. It then reads through a directory of semicolon-separated CSV files in which each line is input;output, and submits a task to the pool that reads the input and writes the output asynchronously, while the main thread continues to read CSV files.
I have used the simplest input/output file_get_contents and file_put_contents so that you can see how it works without cURL.
The worker is selected at random when a task is submitted to the pool, which may not be desirable. It is possible to detect whether a worker is busy, but that would complicate the example.
Further reading:
https://gist.github.com/krakjoe/6437782
http://php.net/pthreads
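If threads are not needed, a plain sequential version may already be enough. The sketch below only collects the rows; the function name readCsvFolder and the 'url'/'path' keys are made up for the example, and it assumes the same url;path line layout as the question. One thing worth checking in the original script: glob("*.csv") looks in the current working directory, which may not be 'csv/files'.

```php
<?php
// Hypothetical helper: read every .csv file in a folder and return the rows.
function readCsvFolder(string $dir, string $separator = ';'): array
{
    $rows = [];
    // glob() returns false on error, so fall back to an empty list
    $files = glob(rtrim($dir, '/') . '/*.csv') ?: [];
    sort($files);
    foreach ($files as $filename) {
        $handle = fopen($filename, 'r');
        if ($handle === false) {
            continue; // unreadable file, skip it
        }
        while (($data = fgetcsv($handle, 0, $separator, '"', '\\')) !== false) {
            // fgetcsv() returns [null] for blank lines; also skip short rows
            if ($data === [null] || count($data) < 2) {
                continue;
            }
            $rows[] = [
                'file' => basename($filename),
                'url'  => $data[0],
                'path' => $data[1],
            ];
        }
        fclose($handle);
    }
    return $rows;
}

// Prints an empty array if the folder does not exist or has no .csv files
print_r(readCsvFolder('csv/files'));
```

Each returned row could then be handed to the cURL download code from the question.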

Convert php output to JSON

I have the following output from a PHP script:
one@gmail.com test1
two@gmail.com test2
which is generated from the following:
$i = 0;
$header = array();
while (!feof($handle)) {
$buffer = fgets($handle, $chunk_size);
if (trim($buffer)!=''){
$obj = json_decode($buffer);
echo $obj[0]." ".$obj[2]."<br>";
$i++;
}
}
fclose($handle);
How can I convert the output of the script into JSON in this format:
{"emails":[{"email":"one@gmail.com","option":"test1"},{"email":"two@gmail.com","option":"test2"}]}
The script was taken from the Mailchimp API, which lists the subscribers of a list.
Here is the script for reference:
<?php
$apikey = '1234-us7';
$list_id = '1234';
$chunk_size = 4096; //in bytes
$url = 'http://us7.api.mailchimp.com/export/1.0/list?apikey='.$apikey.'&id='.$list_id.'&output=json';
/** a more robust client can be built using fsockopen **/
$handle = @fopen($url,'r');
if (!$handle) {
echo "failed to access url\n";
} else {
$i = 0;
$header = array();
while (!feof($handle)) {
$buffer = fgets($handle, $chunk_size);
if (trim($buffer)!=''){
$obj = json_decode($buffer);
if ($i==0){
//store the header row
$header = $obj;
} else {
//echo, write to a file, queue a job, etc.
echo $obj[0]." ".$obj[2]."<br>";
}
$i++;
}
}
fclose($handle);
}
?>
Thank you!
It appears to already be in JSON, because you are using json_decode to get that output. So just... stop using json_decode on it.
As @jessica has already mentioned, it appears that $buffer is coming to you as JSON, because you are running json_decode($buffer) on it.
However, if you want to do some manipulations, arrange the data so you build an array like this:
$myArray = array(
'emails' => array(
array('email' => 'one@gmail.com','option' => 'test1'),
array('email' => 'two@gmail.com','option' => 'test2'),
)
);
Then:
echo json_encode($myArray);
Using your supplied code, it would be something like this (untested):
$myArray = array();
$i = 0;
$header = array();
while (!feof($handle)) {
$buffer = fgets($handle, $chunk_size);
if (trim($buffer)!='') {
$obj = json_decode($buffer);
$myArray['emails'][] = array('email' => $obj[0],'option' => $obj[2]);
$i++;
}
}
fclose($handle);
echo json_encode($myArray);
$i = 0;
$result = array(); //create a new array
$header = array();
while (!feof($handle)) {
$buffer = fgets($handle, $chunk_size);
if (trim($buffer)!=''){
$obj = json_decode($buffer);
//echo $obj[0]." ".$obj[2]."<br>"; //comment out this line
$result[] = array('email' => $obj[0], 'option' => $obj[2]); //push new obj to the array
$i++;
}
}
fclose($handle);
echo json_encode(array('emails' => $result)); // convert to json format
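The same idea as a small self-contained sketch that can be run without the Mailchimp URL. The function name buildEmailJson and the sample lines (including the header column names) are invented for illustration. It assumes, as in the full script from the question, that each line is a JSON array, that the first line is a header row to skip, and that columns 0 and 2 hold the email and the option.

```php
<?php
// Hypothetical helper: turn lines of JSON arrays into the {"emails": [...]} shape.
function buildEmailJson(iterable $lines): string
{
    $emails = [];
    $isHeader = true;
    foreach ($lines as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines, as the original loop does
        }
        $row = json_decode($line, true);
        if (!is_array($row)) {
            continue; // not valid JSON, ignore the line
        }
        if ($isHeader) {
            $isHeader = false; // first row holds the column names
            continue;
        }
        $emails[] = [
            'email'  => $row[0] ?? null,
            'option' => $row[2] ?? null,
        ];
    }
    return json_encode(['emails' => $emails]);
}

// Made-up sample data in the assumed line format
$sample = [
    '["Email Address","First Name","Option"]',
    '["one@gmail.com","One","test1"]',
    '["two@gmail.com","Two","test2"]',
];
echo buildEmailJson($sample), "\n";
```

In the original loop, the lines would come from fgets($handle, $chunk_size) instead of an array.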

Splitting and combining files

I have sort of a "beginning" of a solution.
I wrote this function (Sorry about the spacings):
<?php
set_time_limit(0);
// Just to get the remote filesize
function checkFilesize($url, $user = "", $pw = ""){
ob_start();
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
if(!empty($user) && !empty($pw)){
$headers = array('Authorization: Basic ' . base64_encode("$user:$pw"));
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
}
$ok = curl_exec($ch);
curl_close($ch);
$head = ob_get_contents();
ob_end_clean();
$regex = '/Content-Length:\s([0-9].+?)\s/';
$count = preg_match($regex, $head, $matches);
return isset($matches[1]) ? $matches[1] : "unknown";
}
// Split filesize to threads
function fileCutter($filesize,$threads){
$calc = round($filesize / count($threads));
$count = 0;
foreach($threads as $thread){
$rounds[$count] = $calc;
$count++;
}
$count = 0;
foreach($rounds as $round){
$set = $count + 1;
if($count == 0){
$from = 0;
} else {
$from = ($round * $count);
}
$cal = ($round * $set);
$final[$count] = array('from'=>$from,'to'=>$cal);
$count++;
}
// Correct the "Rounded" result
$end = end($final);
$differance = $filesize - $end['to'];
if (strpos($differance,'-') !== false) {} else {$add = '+';}
$end_result = ($end['to'].$add.$differance);
$value=eval("return ($end_result);");
$end_id = end(array_keys($final));
$final[$end_id]['to'] = $value;
// Return the complete array with the corrected result
return $final;
}
$threads = array(
0=>'test',
1=>'test',
2=>'test',
3=>'test',
4=>'test',
5=>'test',
);
$file = 'http://www.example.com/file.zip';
$filesize = checkFilesize($file);
$cuts = fileCutter($filesize,$threads);
print_r($cuts);
?>
(Again, Sorry. :) )
It gives "directions" to split the file in specific bytes.
I've tried to do something like so:
foreach($cuts as $cut){
$start = $cut['from'];
$finish = $cut['to'];
$f = fopen($file, "rb");
fseek($f, $start, SEEK_SET);
while(!(ftell($f) > $finish)){
$data = fgetc($f);
}
fclose($f);
}
But it goes into an endless loop.
What is the problem? Or is there another solution in PHP to split and combine files?
Instead of reading the file manually and byte-wise, you could just use file_get_contents() with the corresponding $offset and $maxlen parameters:
// $incp $ctx $offset $maxlen
$data = file_get_contents($fn, FALSE, NULL, $start, $finish-$start);
That'll do the seeking and cutting for you.
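To make that concrete, here is a rough sketch that computes the byte ranges without eval() and reads each part with file_get_contents(). The function name byteRanges and the temporary sample file are made up for the example, and "to" is treated as exclusive. Keep in mind that the offset parameter only works on local files, since PHP does not support seeking on remote files, so for an HTTP URL something like cURL's CURLOPT_RANGE option would be needed instead.

```php
<?php
// Hypothetical helper: split $filesize bytes into $parts contiguous ranges.
// 'from' is inclusive, 'to' is exclusive; the last range absorbs the remainder.
function byteRanges(int $filesize, int $parts): array
{
    if ($parts < 1) {
        throw new InvalidArgumentException('parts must be at least 1');
    }
    $size = intdiv($filesize, $parts);
    $ranges = [];
    for ($i = 0; $i < $parts; $i++) {
        $from = $i * $size;
        $to = ($i === $parts - 1) ? $filesize : $from + $size;
        $ranges[] = ['from' => $from, 'to' => $to];
    }
    return $ranges;
}

// Demo on a small local file
$fn = tempnam(sys_get_temp_dir(), 'split');
file_put_contents($fn, '0123456789');

$combined = '';
foreach (byteRanges(filesize($fn), 3) as $cut) {
    // offset = from, length = to - from
    $part = file_get_contents($fn, false, null, $cut['from'], $cut['to'] - $cut['from']);
    echo $cut['from'], '-', $cut['to'], ': ', $part, "\n";
    $combined .= $part; // concatenating the parts in order rebuilds the file
}
unlink($fn);
```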
