I'm trying to extract images from an Access database with php.
I can read the data; it gives me a hexadecimal string with a "\0" after every 254 chars, so after running
$pic = str_replace("\0", '', $pic);
$pic = hex2bin($pic);
I get this:
00000000: 151c 1c00 0300 0000 0700 0100 1400 1b00 ................
00000010: ffff ffff 496d 6167 656d 0000 0105 0000 ....Imagem......
00000020: 0300 0000 0400 0000 4449 4200 5a17 0000 ........DIB.Z...
00000030: f5e8 ffff ac13 0300 2800 0000 e200 0000 ........(.......
00000040: df00 0000 0100 2000 0300 0000 7813 0300 ...... .....x...
00000050: c40e 0000 c40e 0000 0000 0000 0000 0000 ................
00000060: 0000 ff00 00ff 0000 ff00 0000 c2d6 dbff ................
00000070: c9e0 e2ff c7db e0ff c3da dcff cbe2 e4ff ................
00000080: c3dd ddff bbd5 d5ff bedb d8ff c2e2 ddff ................
00000090: c0e2 dbff bee0 d9ff b7dc d2ff b1d6 ccff ................
000000a0: a9ce c4ff a1c6 bcff 9dc3 b7ff 88af a0ff ................
000000b0: 86ad 9eff 7fa8 99ff 7aa3 94ff 739c 8dff ........z...s...
000000c0: 6e97 88ff 6792 83ff 6590 81ff 5986 76ff n...g...e...Y.v.
000000d0: 5986 76ff 5986 76ff 5885 75ff 5685 75ff Y.v.Y.v.X.u.V.u.
I think this is a bitmap with an OLE header, but I couldn't find what to do next. How can I save these, ideally as jpeg?
Edit: when I save the data to a file, no program I tried can view/identify the image. Also, none of the imagecreatefrom*() worked. I think I need to handle the header somehow.
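Reading the dump above: the bytes `28 00 00 00` at offset 0x38 look like the start of a 40-byte BITMAPINFOHEADER (width 0xe2 = 226, height 0xdf = 223, 32 bits per pixel, compression 3 = BI_BITFIELDS), that is, a DIB without the 14-byte BMP file header. A minimal sketch of adding that header, assuming every blob contains such a DIB and that the first `28 00 00 00` in the blob is where it starts; `oleDibToBmp` is a hypothetical helper name, not an existing function:

```php
<?php
// Hypothetical helper: find the DIB inside the OLE blob and prepend the
// 14-byte BITMAPFILEHEADER that a standalone .bmp file needs.
function oleDibToBmp(string $blob): ?string
{
    // Assumption: the first "28 00 00 00" is the BITMAPINFOHEADER size field.
    $start = strpos($blob, "\x28\x00\x00\x00");
    if ($start === false || strlen($blob) < $start + 40) {
        return null;
    }
    $dib = substr($blob, $start);
    $h = unpack(
        'Vsize/Vwidth/Vheight/vplanes/vbits/Vcompression/VimageSize/Vxppm/Vyppm/VclrUsed/VclrImportant',
        $dib
    );
    if ($h['planes'] !== 1) {
        return null; // not a plausible bitmap header
    }
    // Colour table: explicit count, or 2^bits entries for paletted images.
    $colours = $h['clrUsed'];
    if ($colours === 0 && $h['bits'] <= 8) {
        $colours = 1 << $h['bits'];
    }
    // BI_BITFIELDS (3) stores three 32-bit channel masks after the header.
    $masks = ($h['compression'] === 3) ? 12 : 0;
    $headerBytes = 40 + $masks + 4 * $colours;
    // Drop trailing OLE bytes when the header states the pixel data size.
    if ($h['imageSize'] > 0) {
        $dib = substr($dib, 0, $headerBytes + $h['imageSize']);
    }
    // "BM", total file size, two reserved words, offset of the pixel data.
    return 'BM' . pack('VvvV', 14 + strlen($dib), 0, 0, 14 + $headerBytes) . $dib;
}
```

With `$pic` being the decoded binary from the question, `file_put_contents('picture.bmp', oleDibToBmp($pic));` should give a file that image viewers can identify. Whether `imagecreatefrombmp()` accepts a 32-bit BI_BITFIELDS bitmap is worth checking before relying on it for the JPEG conversion.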
I have a database encoded as utf8mb4. I connect with this database and I set utf8 charset:
$dbHandler = new PDO("mysql:host=$dbHost;dbname=$dbName;charset=utf8mb4", $dbUsername, $dbPassword);
All data is properly encoded in the DB. I want to fetch data and save it as CSV:
$fp = fopen('data.csv', 'w+');
foreach ($result as $row) {
...
fputcsv($fp, $csvData, ';');
}
But then all the encoding is broken:
groÃ<9f>e,
Zubehör. etc.
I've tried to add BOM (didn't help) and convert array_map("utf8_encode", $csvData); (some characters are displaying correct: große, Zubehör, but some not: Kabelverl?ng, F?r). Any idea?
EDIT:
Hexdump output beginning of file:
00000000: efbb bf70 726f 6475 6374 3b61 7274 6963 ...product;artic
00000010: 6c65 3b73 6b75 3b64 6174 653b 6e61 6d65 le;sku;date;name
00000020: 0a30 3030 3239 3039 3530 3030 3b3b 3b3b .00028151000;;;;
00000030: 2242 7265 616b 6f75 742d 626f 7820 4b70 "Breakout-box Kp
00000040: 6c2e 223b 223c 7374 726f 6e67 3e42 7265 l.";"<strong>Bre
00000050: 616b 6f75 742d 626f 7820 4b70 6c2e 3c2f akout-box Kpl.</
Hexdump output of file with 1 record where we can see the issue (F..r instead of Für). By the way - original string was modified by ucwords and strtolower:
00000000: 3030 3032 3930 3936 3030 333b 3b3b 3b22 00028151000;;;;"
00000010: 4e65 747a 7465 696c 2032 3230 762f 3132 Netzteil 220v/12
00000020: 7620 46e3 9c72 2041 766c 223b 223c 7374 v F..r Avl";"<st
00000030: 726f 6e67 3e4e 6574 7a74 6569 6c20 3232 rong>Netzteil 22
00000040: 3076 2f31 3276 2046 e39c 7220 4176 6c3c 0v/12v F..r Avl<
00000050: 2f73 7472 6f6e 673e 3c62 723e 3c62 723e /strong><br><br>
00000060: 4f45 4d20 4e75 6d6d 6572 3a20 3030 3032 OEM Nummer: 0002
00000070: 3930 3936 3030 3322 3b31 3038 2e34 363b 9096003";108.46;
00000080: 3030 3032 3930 3936 3030 332d 6e65 747a 00028151000-netz
00000090: 7465 696c 2d32 3230 762d 3132 762d 6675 teil-220v-12v-fu
000000a0: 722d 6176 6c3b 4875 7371 7661 726e 613b r-avl;Husqvarna;
000000b0: 4452 4f50 444f 574e 3b59 3b4e 3b68 7474 DROPDOWN;Y;N;htt
000000c0: 7073 3a2f 2f73 7061 7265 7061 7274 7366 ps://sparepartsf
000000d0: 696e 6465 722e 6b74 6d2e 636f 6d2f 5350 inder.fha.com/SP
000000e0: 462f 496d 6167 6573 2f6d 6170 732f 3130 F/Images/maps/10
000000f0: 3030 3032 3932 302e 6769 663b 313b 4154 0002920.gif;1;AT
00000100: 3b57 6964 6765 743b 224b 544d 204f 7269 ;Ponret;"KTM Ori
00000110: 6769 6e61 6c20 4572 7361 747a 7465 696c ginal Ersatzteil
00000120: 6522 3b22 4875 7371 7661 726e 6120 4e65 e";"Husqvarna Ne
00000130: 747a 7465 696c 2032 3230 762f 3132 7620 tzteil 220v/12v
00000140: 46e3 9c72 2041 766c 202d 204f 454d 204e F..r Avl - OEM N
00000150: 756d 6d65 723a 2030 3030 3239 3039 3630 ummer: 000290960
00000160: 3033 223b 3b22 4b61 7566 656e 2053 6965 03";;"Kaufen Sie
00000170: 2048 7573 7176 6172 6e61 204e 6574 7a74 Husqvarna Netzt
00000180: 6569 6c20 3232 3076 2f31 3276 2046 e39c eil 220v/12v F..
00000190: 7220 4176 6c20 6d69 7420 4f45 4d2d 4e75 r Avl mit OEM-Nu
000001a0: 6d6d 6572 2030 3030 3239 3039 3630 3033 mmer 00028151000
000001b0: 2062 6569 2065 696e 656d 2048 7573 7176 bei einem Husqv
000001c0: 6172 6e61 2d56 6572 7472 6167 7368 c3a4 arna-Vertragsh..
000001d0: 6e64 6c65 722e 2057 6972 2068 6162 656e ndler. Wir haben
000001e0: 2065 696e 6520 6772 6fc3 9f65 2041 7573 eine gro..e Aus
000001f0: 7761 686c 2061 6e20 4875 7371 7661 726e wahl an Husqvarn
00000200: 612d 4572 7361 747a 7465 696c 656e 2c20 a-Ersatzteilen,
00000210: 4163 6365 7373 6f72 6965 732c 2043 6c6f Accessories, Clo
00000220: 7468 696e 672c 204d 5820 4265 6b6c 6569 thing, MX Beklei
00000230: 6475 6e67 2075 6e64 205a 7562 6568 c3b6 dung und Zubeh..
00000240: 722e 220a r.".
file data.csv output:
data.csv: Non-ISO extended-ASCII text, with very long lines
The problem was that I was using strtolower and ucfirst. I changed it to
$name = mb_convert_case($name, MB_CASE_LOWER, "UTF-8");
$name = mb_convert_case($name, MB_CASE_TITLE, "UTF-8");
and it works.
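The hexdump is consistent with this: the UTF-8 bytes `c3 9c` (Ü) show up as `e3 9c`, which is what a byte-wise, locale-dependent lowercasing does to the lead byte of a multibyte character. A minimal sketch of the multibyte-safe variant, assuming the mbstring extension is loaded and the source file is saved as UTF-8; `normalizeName` is a hypothetical wrapper name:

```php
<?php
// Hypothetical wrapper around the fix above: title-case a UTF-8 string
// without touching individual bytes of multibyte characters.
function normalizeName(string $name): string
{
    // MB_CASE_TITLE upper-cases the first letter of each word and
    // lower-cases the rest, working on characters rather than bytes.
    return mb_convert_case($name, MB_CASE_TITLE, 'UTF-8');
}

echo normalizeName('ZUBEHÖR FÜR'), "\n"; // prints: Zubehör Für
```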
I have a DBF file created as part of a shapefile with rgdal library's writeOGR function (in R).
When I ask to see its first bytes with Linux od command, I get the following.
od -x -c -N 32 BRA.dbf
0000000 7703 1e07 001b 0000 00a1 00d1 0000 0000
0000020 0000 0000 0000 0000 0000 0000 5700 0000
0000040
My PHP code goes like this.
$dbf = fopen('BRA.dbf','rb');
fread($dbf,10); // jumps over the first 10 bytes
$dbfRecSize = unpack('v',fread($dbf,2))[1]; // 'v' = little endian 16 bits: 00d1 = d1(16) = 209
fread($dbf,17); // jumps over a few more bytes
$dbfLangID = ord(fread($dbf,1)); // language driver ID
if ($dbfLangID == 0x57) {
echo "Language: 0x57 (ISO-8859-1)\n";
} else {
echo "Language: $dbfLangID;\n";
}
The code above outputs "Language: 0x57 (ISO-8859-1)", which means the "57" close to the end of the od output is being read with the ord(fread($dbf,1)); command.
Strange thing is that I've read 10+2+17 = 29 bytes from the file, so the next byte should be "00", or not (right after the 0x57)? $dbfRecSize is 209, which means my logic is correct in the first two reads. Why isn't it in the following reads?
What am I misunderstanding here?
The error is that I was confusing od command with debug from DOS...
od -x prints bytes with the order reversed every two bytes (too confusing to me).
0000000 7703 1e07 001b 0000 00a1 00d1 0000 0000
0000020 0000 0000 0000 0000 0000 0000 5700 0000
od -t x1 prints each byte once and separated (harder to count/read in the middle of the line).
0000000 03 77 07 1e 1b 00 00 00 a1 00 d1 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 57 00 00
I wonder whether there is an option to print bytes two by two (in hexadecimal) without reversing their order?
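On the PHP side, the same header fields can be read with one `unpack()` call instead of skipping bytes by hand, which sidesteps the byte-order confusion. A sketch, assuming the usual dBASE III header layout (version, date, record count, header size, record size, language driver ID at offset 29); `parseDbfHeader` is a hypothetical name, and the sample bytes are the 32 bytes from the `od -t x1` output above:

```php
<?php
// Hypothetical helper: decode the fixed 32-byte part of a DBF header.
function parseDbfHeader(string $header): array
{
    if (strlen($header) < 32) {
        throw new InvalidArgumentException('DBF header needs at least 32 bytes');
    }
    // C = unsigned byte, V = little-endian 32-bit, v = little-endian 16-bit.
    $h = unpack('Cversion/Cyear/Cmonth/Cday/Vrecords/vheaderSize/vrecordSize', $header);
    $h['year'] += 1900;                       // stored as years since 1900
    $h['languageDriver'] = ord($header[29]);  // language driver ID
    return $h;
}

// The 32 bytes shown by `od -t x1` above.
$bytes = hex2bin('0377071e1b000000a100d100' . str_repeat('00', 17) . '570000');
$h = parseDbfHeader($bytes);
printf("records=%d recordSize=%d lang=0x%02x\n", $h['records'], $h['recordSize'], $h['languageDriver']);
// prints: records=27 recordSize=209 lang=0x57
```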
I have this exec triggered at some if condition :
exec('( [ ! -e "ha45_temp" ] && touch ha45_temp && wget http://localhost/index.php/Montor/makeApiCameraPrintscreen?id='.$insert_id .' -O ./images/'. $insert_id.' > /dev/null 2>&1 && sleep 30 && rm ha45_temp ) > /dev/null 2>&1 &');
I'm calling the makeApiCameraPrintscreen method from my main controller and directing its output to this path: -O ./images/, with $insert_id as the file name (that variable holds a unique id used to name the pictures).
This is the method:
public function makeApiCameraPrintscreen(){
$imagename = $this->input->get('id', TRUE);
$datasidget = $this->curl->simple_get('somepictureapi');
$datasid = json_decode($datasidget,true);
$data_sid = $datasid['data']['sid'];
$reqUrl = "somepictureapi";
$imageencode = $this->curl->simple_get($reqUrl);
$this->output->set_content_type('jpeg');
$this->output->set_output($imageencode);
}
My problem is that wget saves a binary octet-stream file (the binary data does come from the image, but the file has no proper extension).
�t?���q��H�*�ӏ��m�}"�Vo�k��c(���䢐T�m첟
This is an example of a line. I'm not sure that I will get the pictures in JPEG format (content-type), but I also tried jpg/png, with the same output.
What I want to do is to save the picture from my wget in a proper format so that I can open it.
PS1. for Mark:
00000000: 0aff d8ff fe00 104c 6176 6335 362e 3630 .......Lavc56.60
00000010: 2e31 3030 00ff db00 4300 080a 0a0b 0a0b .100....C.......
00000020: 0d0d 0d0d 0d0d 100f 1010 1010 1010 1010 ................
00000030: 1010 1012 1212 1515 1512 1212 1010 1212 ................
00000040: 1414 1515 1717 1715 1515 1517 1719 1919 ................
00000050: 1e1e 1c1c 2323 242b 2b33 ffc4 01a2 0000 ....##$++3......
00000060: 0105 0101 0101 0101 0000 0000 0000 0000 ................
00000070: 0102 0304 0506 0708 090a 0b01 0003 0101 ................
00000080: 0101 0101 0101 0100 0000 0000 0001 0203 ................
00000090: 0405 0607 0809 0a0b 1000 0201 0303 0204 ................
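One thing this dump shows: the file starts with `0a` and only then the JPEG start-of-image marker `ff d8 ff`, so there is a stray newline byte in front of otherwise valid-looking JPEG data, and that alone can make viewers reject the file. Where the byte comes from is not visible here (whitespace outside the PHP tags of some loaded file is a common cause, but that is a guess). A minimal sketch that drops anything before the marker; `stripBeforeJpegMarker` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the data starting at the first JPEG
// start-of-image marker (FF D8 FF), or null if there is none.
function stripBeforeJpegMarker(string $data): ?string
{
    $pos = strpos($data, "\xFF\xD8\xFF");
    return $pos === false ? null : substr($data, $pos);
}

// First bytes of the dump above: a newline, then the JPEG marker.
$sample = "\x0A\xFF\xD8\xFF\xFE\x00\x10Lavc56.60";
echo bin2hex(substr(stripBeforeJpegMarker($sample), 0, 4)), "\n"; // prints: ffd8fffe
```

It could be applied either to `$imageencode` before `set_output()` or to the file that wget saved, depending on where the extra byte is introduced.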
I am trying to convert little endian hex to big endian hex.
Example:
Little endian:
E1 31 01 00 00 9D
Big endian:
9D 00 00 01 31 E1
If the numbers are in the format described, then you can convert them using standard array functions.
function littleToBigEndian($little) {
return implode(' ',array_reverse(explode(' ', $little)));
}
echo littleToBigEndian('E1 31 3C 01 00 00 9B');
// Output: 9B 00 00 01 3C 31 E1
If there are no spaces for separation of numbers you need to str_split() the string instead.
function littleToBigEndian($little) {
return implode('',array_reverse(str_split($little,2)));
}
echo littleToBigEndian('E1313C0100009B');
// Output: 9B0000013C31E1
I saw this question: PHP - Get number of pages in a Word document. I also need to determine the page count of a given Word file (doc/docx). I tried to investigate phplivedocx/ZF (@hobodave linked to those in the original post's answers), but I got completely lost there. I can't use any external web service either (such as DOC2PDF sites, and then counting the pages in the PDF version).
Simply: is there any PHP code (using ZF or anything else in PHP, excluding COM objects or other executables such as AbiWord; I'm on a shared Linux server without exec or similar functions) to find the page count of a Word file?
EDIT: The Word versions that need to be supported are Microsoft Word 2003 & 2007.
Getting the number of pages for docx files is very easy:
function get_num_pages_docx($filename)
{
$zip = new ZipArchive();
if($zip->open($filename) === true)
{
if(($index = $zip->locateName('docProps/app.xml')) !== false)
{
$data = $zip->getFromIndex($index);
$zip->close();
$xml = new SimpleXMLElement($data);
return (int) $xml->Pages; // cast the SimpleXMLElement to a plain integer
}
$zip->close();
}
return false;
}
For the 97-2003 format it's certainly challenging, but by no means impossible. The number of pages is stored in the SummaryInformation section of the document, but the OLE format of the files makes it a pain to find. The structure is defined extremely thoroughly (though badly imo) here and simpler here. I looked at this for an hour today and didn't get very far (not a level of abstraction I'm used to), but I did output the hex to better understand the structure:
function get_num_pages_doc($filename)
{
$handle = fopen($filename, 'rb'); // binary mode: the file is not text
$line = @fread($handle, filesize($filename));
echo '<div style="font-family: courier new;">';
$hex = bin2hex($line);
$hex_array = str_split($hex, 4);
$i = 0;
$line = 0;
$collection = '';
foreach($hex_array as $key => $string)
{
$collection .= hex_ascii($string);
$i++;
if($i == 1)
{
echo '<b>'.sprintf('%05X', $line).'0:</b> ';
}
echo strtoupper($string).' ';
if($i == 8)
{
echo ' '.$collection.' <br />'."\n";
$collection = '';
$i = 0;
$line += 1;
}
}
echo '</div>';
exit();
}
function hex_ascii($string, $html_safe = true)
{
$return = '';
$conv = array($string);
if(strlen($string) > 2)
{
$conv = str_split($string, 2);
}
foreach($conv as $string)
{
$num = hexdec($string);
$ascii = '.';
if($num > 32)
{
$ascii = unichr($num);
}
if($html_safe AND ($num == 62 OR $num == 60))
{
$return .= htmlentities($ascii);
}
else
{
$return .= $ascii;
}
}
return $return;
}
function unichr($intval)
{
return mb_convert_encoding(pack('n', $intval), 'UTF-8', 'UTF-16BE');
}
which will output a dump where you can find sections such as:
007000: 0500 5300 7500 6D00 6D00 6100 7200 7900 ..S.u.m.m.a.r.y.
007010: 4900 6E00 6600 6F00 7200 6D00 6100 7400 I.n.f.o.r.m.a.t.
007020: 6900 6F00 6E00 0000 0000 0000 0000 0000 i.o.n...........
007030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
Which will allow you to see the referencing info such as:
007040: 2800 0201 FFFF FFFF FFFF FFFF FFFF FFFF (...ÿÿÿÿÿÿÿÿÿÿÿÿ
007050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
007060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
007070: 0000 0000 2500 0000 0010 0000 0000 0000 ....%...........
Which will allow you to determine properties described:
_ab = ("SummaryInformation")
_cb = 0028
_mse = 02 (STGTY_STREAM)
_bflags = 01 (DE_BLACK)
_sidLeftSib = FFFF FFFF
_sidRightSib = FFFF FFFF (none)
_sidChild = FFFF FFFF (n/a for STGTY_STREAM)
_clsid = 0000 0000 0000 0000 0000 0000 0000 0000 (n/a)
_dwUserFlags = 0000 0000 (n/a)
_time[0] = CreateTime = 0000 0000 0000 0000 (n/a)
_time[1] = ModifyTime = 0000 0000 0000 0000 (n/a)
_startSect = 0000 0025 (bytes 2500 0000 in the dump, little-endian)
_ulSize = 0000 1000
_dptPropType = 0000 (n/a)
Which will let you find the relevant section of code, unpack it and get the page number. Of course this is the hard bit that I just don't have time for, but should set you in the right direction.
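The field list above can also be decoded programmatically. A sketch, assuming the 128-byte directory entry layout used in that list (64 bytes of UTF-16LE name, then `_cb`, `_mse`, `_bflags`, the three sibling/child IDs, and `_startSect`/`_ulSize` at entry offsets 116 and 120) and the mbstring extension; `parseOleDirectoryEntry` is a hypothetical name, and the sample bytes are the dump lines 007000 to 00707F shown above. It only decodes one entry; following the sector chain to the actual page count is not shown.

```php
<?php
// Hypothetical helper: decode one 128-byte OLE compound-file directory entry.
function parseOleDirectoryEntry(string $entry): array
{
    if (strlen($entry) < 128) {
        throw new InvalidArgumentException('directory entry needs 128 bytes');
    }
    // v = little-endian 16-bit, C = byte, V = little-endian 32-bit.
    $head = unpack('vnameBytes/Ctype/Ccolour/Vleft/Vright/Vchild', substr($entry, 64, 16));
    $tail = unpack('VstartSector/Vsize', substr($entry, 116, 8));
    // _cb counts the UTF-16 terminator, so drop the last two bytes.
    $rawName = substr($entry, 0, max(0, $head['nameBytes'] - 2));
    return [
        'name'        => mb_convert_encoding($rawName, 'UTF-8', 'UTF-16LE'),
        'type'        => $head['type'],        // 2 = STGTY_STREAM
        'startSector' => $tail['startSector'],
        'size'        => $tail['size'],
    ];
}

// The eight dump lines from 007000 to 00707F, as one hex string.
$entry = hex2bin(
    '0500530075006D006D00610072007900' .
    '49006E0066006F0072006D0061007400' .
    '69006F006E0000000000000000000000' .
    str_repeat('00', 16) .
    '28000201FFFFFFFFFFFFFFFFFFFFFFFF' .
    str_repeat('00', 32) .
    '00000000250000000010000000000000'
);
$info = parseOleDirectoryEntry($entry);
printf("%s: sector 0x%X, %d bytes\n", ltrim($info['name'], "\x05"), $info['startSector'], $info['size']);
// prints: SummaryInformation: sector 0x25, 4096 bytes
```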
M$ don't make it easy!
Have a look at PhpWord on Microsoft CodePlex: http://phpword.codeplex.com/
It will allow you to open and read the word formatted file in PHP and do whatever processing you require.
To get metadata properties of doc, docx, ppt and pptx files, such as the number of pages or slides, using PHP, I followed the process below and it worked like a charm. Hope it helps someone.
Download and configure Apache Tika.
Once that's done, you can run the following commands; they print all the metadata of your file:
java -jar tika-app-1.5.jar -m test.docx
java -jar tika-app-1.5.jar -m test.doc
java -jar tika-app-1.5.jar -m test.pptx
java -jar tika-app-1.5.jar -m test.ppt
Once tested, you can execute this command from a PHP script. Thanks.
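A sketch of that last step, assuming `shell_exec()` is enabled (the question rules this out on its shared host), that `java` and the jar are reachable from the script, and that the `-m` output is one `key: value` pair per line. The metadata key `xmpTPg:NPages` is an assumption about what your Tika version reports, so check the real output first. Both function names are hypothetical.

```php
<?php
// Parse "key: value" lines into an associative array.
function parseTikaMetadata(string $output): array
{
    $meta = [];
    foreach (preg_split('/\R/', $output) as $line) {
        $pos = strpos($line, ': ');
        if ($pos !== false) {
            $meta[substr($line, 0, $pos)] = trim(substr($line, $pos + 2));
        }
    }
    return $meta;
}

// Run Tika on a file and return the page count, or null if unavailable.
function tikaPageCount(string $file, string $jar = 'tika-app-1.5.jar', string $key = 'xmpTPg:NPages'): ?int
{
    $cmd = 'java -jar ' . escapeshellarg($jar) . ' -m ' . escapeshellarg($file);
    $output = shell_exec($cmd);
    if (!is_string($output)) {
        return null; // command failed or produced no output
    }
    $meta = parseTikaMetadata($output);
    return isset($meta[$key]) ? (int) $meta[$key] : null;
}
```

Usage would be along the lines of `$pages = tikaPageCount('test.docx');`, with a `null` result meaning the command failed or the key was not present.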
Excluding using Abiword or OpenOffice? Impossible - number of pages will depend on number of words/letters, fonts used, justification and kerning, margin size, line spacing, paragraph spacing, number of paragraphs, columns, size of graphics / embedded objects, page / column breaks and page margins.
You need something which can understand all of these.
Even if you use OpenOffice or Abiword, reflowing the text may change the number of pages. Indeed, in some cases opening the same document on a different instance of MSWord may result in a difference.
The best you could probably manage would be a statistical approach based on a representation of the document - but you'll still see huge variance.