Pure PHP Solution: PDF to plain text without exec()/system() - php

I'm trying to parse PDF files into plain text (strings) with pure PHP, because exec, system and similar functions are disabled on the server I'm working on.
The functions I found online can't parse these PDF files.
This is what I get from echo file_get_contents("file.pdf");
%PDF-1.4 5 0 obj << /Type /XObject /Subtype /Image /Filter /DCTDecode /Length 6536 /Width 200 /Height 125 /BitsPerComponent 8 /ColorSpace /DeviceRGB >> stream ÿØÿàJFIFÿÛC %# , #&')*)-0-(0%()(ÿÛC ((((
and then all the content.
So this is a PDF 1.4 file (the "5 0 obj" after the version marks the start of the first object).
Here is the function I was using for PDF 1.2-1.3 (it does not work with these files):
function decomprimiPDF($pdfdata) {
    // Accept either raw PDF data or a path to a PDF file.
    if (strlen($pdfdata) < 1000 && file_exists($pdfdata))
        $pdfdata = file_get_contents($pdfdata);
    $result = '';
    // Only streams compressed with FlateDecode are handled.
    if (preg_match_all('/<<[^>]*FlateDecode[^>]*>>\s*stream(.+)endstream/Uis', $pdfdata, $m))
        foreach ($m[1] as $chunk) {
            $chunk = gzuncompress(ltrim($chunk));
            $a = preg_match_all('/\[([^\]]+)\]/', $chunk, $m2) ? $m2[1] : array($chunk);
            foreach ($a as $subchunk) {
                if (preg_match_all('/\(([^\)]+)\)/', $subchunk, $m3)) {
                    $result .= (join('', $m3[1]) . '*');
                }
            }
        }
    return $result;
}
Can anyone here help me with a PHP function for this? (To repeat: I have already tried almost every function available online, and a few classes as well, but none of them work with the PDF files I'm talking about.)
Thanks for your support ;)
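A quick way to see why the function above finds nothing in such a file is to list the stream filters the PDF actually uses. The dump in the question shows /Subtype /Image with /Filter /DCTDecode, which is a JPEG image stream, and the FlateDecode regex skips those. The following is only a diagnostic sketch: listPdfFilters() is a hypothetical helper, the sample string is hand-made, and filters given as an array (for example /Filter [/ASCII85Decode /FlateDecode]) are not matched.

```php
<?php
// Hypothetical helper (not from the question): count the /Filter names
// that appear in the raw bytes of a PDF.
function listPdfFilters(string $pdfdata): array
{
    preg_match_all('~/Filter\s*/([A-Za-z0-9]+)~', $pdfdata, $m);
    return array_count_values($m[1]);
}

// Hand-made sample: one JPEG image stream and one Flate stream.
$sample = "%PDF-1.4 5 0 obj << /Type /XObject /Subtype /Image /Filter /DCTDecode /Length 4 >> stream abcd endstream"
        . " 6 0 obj << /Filter /FlateDecode /Length 4 >> stream abcd endstream";

print_r(listPdfFilters($sample));
```

If only DCTDecode (or other image filters) show up for a real file, the pages are probably scanned images and there is no text stream to extract with this kind of function.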

Related

How to extract certificates from app attestation object using php?

I tried to set up app attestation between my app and PHP, but I can hardly find any source of explanation other than Apple's own documentation, which left me stuck at quite an early stage. So far I have completed the following steps:
On the client side, following https://developer.apple.com/documentation/devicecheck/establishing_your_app_s_integrity, I created my attestation as a base64-encoded string:
attestation.base64EncodedString()
I then send that string to the server, from here on following https://developer.apple.com/documentation/devicecheck/validating_apps_that_connect_to_your_server.
The documentation says that the attestation is in CBOR format. I therefore first decode the base64-encoded string and parse it using https://github.com/Spomky-Labs/cbor-php.
<?php
use CBOR\Decoder;
use CBOR\OtherObject;
use CBOR\Tag;
use CBOR\StringStream;
$otherObjectManager = new OtherObject\OtherObjectManager();
$tagManager = new Tag\TagObjectManager();
$decoder = new Decoder($tagManager, $otherObjectManager);
$data = base64_decode(/* .. base64-encoded attestation string as sent from the client (see Swift snippet above) */);
$stream = new StringStream($data);
$object = $decoder->decode($stream);
$norm = $object->getNormalizedData();
$fmt = $norm['fmt'];
$x5c = $norm['attStmt']['x5c'];
From the documentation, the normalized object should have the following format:
{
fmt: 'apple-appattest',
attStmt: {
x5c: [
<Buffer 30 82 02 cc ... >,
<Buffer 30 82 02 36 ... >
],
receipt: <Buffer 30 80 06 09 ... >
},
authData: <Buffer 21 c9 9e 00 ... >
}
which it does:
$fmt == "apple-appattest" // true
The next step is described in the documentation as:
Verify that the x5c array contains the intermediate and leaf certificates for App Attest, starting from the credential certificate in the first data buffer in the array (credcert). Verify the validity of the certificates using Apple’s App Attest root certificate.
However, I don't know how to proceed from here. The content of e.g. $norm['attStmt']['x5c'][0] is a mix of readable characters and binary glyphs. To give you an idea, this is a random substring from the content of $norm['attStmt']['x5c'][0]: "Certification Authority10U Apple Inc.10 UUS0Y0*�H�=*�H�=B��c�}�". That's why I'm not really sure whether I have to perform any further encoding/decoding steps.
I tried parsing the certificate, but without any luck (both var_dump calls return false):
$cert = openssl_x509_read($x5c[0]);
var_dump($cert); // false - indicating that reading the cert failed
$parsedCert = openssl_x509_parse($cert, false);
var_dump($parsedCert); // false - of course, since the prior step did not succeed
Any ideas, guidance or alternative resources are highly appreciated. Thank you!
After a while I came up with the following solution. The x5c field contains a list of certificates, all in binary (DER) form. I wrote the following converter to create a ready-to-use certificate in PEM format. It does the following:
base64-encode the binary data
break lines after 64 characters
add BEGIN and END markers (also note the trailing line break on the end certificate line)
function makeCert($bindata) {
    $beginpem = "-----BEGIN CERTIFICATE-----\n";
    $endpem = "-----END CERTIFICATE-----\n";
    // chunk_split() breaks the base64 text into 64-character lines and ends
    // every line, including the last one, with "\n". This avoids the blank
    // line a manual loop produces when the length is a multiple of 64.
    $cbenc = chunk_split(base64_encode($bindata), 64, "\n");
    return $beginpem . $cbenc . $endpem;
}
The following then works:
openssl_x509_read(makeCert($x5c[0]))
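For the verification step quoted from Apple's documentation, a rough sketch of a signature-chain check could look like the following. It assumes PHP 7.4 or later (for openssl_x509_verify()) and that Apple's App Attest root certificate has been obtained separately and is available as a PEM string. derToPem() and chainIsSigned() are hypothetical helper names. The sketch only checks that each certificate is signed by the next one; validity dates and certificate extensions would still have to be checked separately.

```php
<?php
// Hypothetical helper: wrap the DER bytes of one x5c entry in PEM armour.
function derToPem(string $der): string
{
    return "-----BEGIN CERTIFICATE-----\n"
        . chunk_split(base64_encode($der), 64, "\n")
        . "-----END CERTIFICATE-----\n";
}

// Hypothetical helper: true if every certificate in $x5c (leaf first) is
// signed by its successor and the last one is signed by $rootPem.
function chainIsSigned(array $x5c, string $rootPem): bool
{
    if (count($x5c) === 0) {
        return false; // nothing to verify
    }
    $chain = array_map('derToPem', $x5c);
    $chain[] = $rootPem;
    for ($i = 0; $i < count($chain) - 1; $i++) {
        // openssl_x509_verify() returns 1 only when the signature matches
        // the public key of the certificate passed as second argument.
        if (@openssl_x509_verify($chain[$i], $chain[$i + 1]) !== 1) {
            return false;
        }
    }
    return true;
}
```

With the variables from the question, this would be called as chainIsSigned($x5c, $rootPem), where $rootPem is an assumed variable holding the root certificate.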

unpack binary file in PHP

I'm trying to parse a binary file in PHP which is an attachment of a document in a NoSQL DB. However, in my tests, unpacking a 1 MB file takes around 12-15 seconds. The file contains speed information from a sensor.
The binary file, converted to hexadecimal, is structured as follows:
BB22 1100 0015 XXXX ...
BB22 1300 0400 20FB 5900 25FB 5910 ... 20FB 5910
BB22 1100 0015 ...
BB22 1300 0400 20FB 5700 25FB 5810 ... 20FB 5912
BB22 1300 0400 20FB 5700 25FB 5810 ... 20FB 5912
...
The marker BB22 1100 introduces the sensor specification, while 0015 is the size of that information.
The marker BB22 1300 introduces other data plus the actual speed from the sensor. The next two bytes, 0400, give the length of that chunk, which is 1024 bytes.
I'm only interested in the speed, i.e. values such as 5900 5910 5910 5700 5810 ...
My approach is as follows:
$file = fopen($url, 'r', false, authenticationContext($url));
$result = stream_get_contents($file, -1);
fclose($file);
$hex_result = bin2hex($result);
$markerData = 'bb2213';
$sensorDataUnpack= "sspeed"; // signed int16
while(($pos = strpos($hex_result, $markerData, $pos)) !== FALSE){
$pos=$pos+4;
for ($j=4; $j<1028; $j=$j+4) {
$d = unpack($sensorDataUnpack, substr($result, $pos/2+$j+2));
$sensorData[] = $d;
}
}
I converted the result from binary to hexadecimal because I couldn't get the positions right otherwise. Anyway, I believe this code can be improved a lot. Any ideas?
This should be fast, but without test data I wasn't able to test it.
The key points are these:
Open the URL in binary mode, and use fread() both for positioning and for slicing the data into parts.
Use unpack() to parse both the headers and the bodies of the entries.
Use the asterisk * repeater to quickly parse the big bodies for signed shorts.
Use array_values() to convert the associative array to a simple array with numeric keys (like: 0, 1, 2, ...).
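Update: I solved the endianness and bitness problem around the marker comparison by using the "H8" pack format, which returns the 4-byte marker as a lowercase hex string in big-endian order.
$sensorData = array();
$file = fopen($url, 'rb', false, authenticationContext($url));
// fread() returns a short or empty string at the end of the stream, not
// false, so stop as soon as a full 6-byte header can no longer be read.
while (($header = fread($file, 6)) !== false && strlen($header) === 6) {
    // 4-byte marker as 8 hex digits, followed by a 16-bit size.
    // "n" (big endian) is assumed from the question, where 0400 means 1024 bytes.
    $fields = unpack("H8marker/nsize", $header);
    $body = $fields["size"] > 0 ? fread($file, $fields["size"]) : "";
    if ($body === false || strlen($body) < $fields["size"]) {
        throw new Exception("import: data stream unexpectedly ended.");
    }
    if ($fields["marker"] === "bb221300") {
        $data = array_values(unpack("s*", $body));
        // Store only every second value.
        for ($i = 1; $i < count($data); $i += 2) {
            $sensorData[] = $data[$i];
        }
    }
}
fclose($file);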
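Since the answer could not be tested against real data, the same parsing idea can be tried on an in-memory buffer first. This is a minimal sketch under the layout described in the question (4-byte marker, 2-byte big-endian length in bytes, then records of 2 bytes of other data followed by a 2-byte speed); extractSpeeds() and the sample values are made up for illustration.

```php
<?php
// Hypothetical helper: pull the speed values out of a binary buffer.
function extractSpeeds(string $bin): array
{
    $speeds = [];
    $offset = 0;
    $total = strlen($bin);
    while ($offset + 6 <= $total) {
        // Header: 4-byte marker as hex, 16-bit big-endian body length.
        $h = unpack('H8marker/nsize', $bin, $offset);
        $offset += 6;
        $body = substr($bin, $offset, $h['size']);
        $offset += $h['size'];
        if ($h['marker'] === 'bb221300') {
            // Parse the whole body at once as signed 16-bit values
            // (machine byte order, as in the question's "s" format).
            $values = array_values(unpack('s*', $body));
            // Every second value is a speed.
            for ($i = 1; $i < count($values); $i += 2) {
                $speeds[] = $values[$i];
            }
        }
    }
    return $speeds;
}

// Made-up sample: one specification chunk, then one data chunk
// holding two records (other=100, speed=89) and (other=101, speed=-90).
$sample = "\xBB\x22\x11\x00" . pack('n', 3) . "abc"
        . "\xBB\x22\x13\x00" . pack('n', 8) . pack('s*', 100, 89, 101, -90);

print_r(extractSpeeds($sample));
```

Parsing each body with a single unpack() call instead of one call per value is what removes most of the per-value overhead of the original loop.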

Convert C++ code to PHP?

I am trying to write a struct in PHP. I know there is no such thing in PHP, but I would at least like to get it working somehow...
C++:
// The struct
typedef struct data
{
char numbers[20];
char numbers2[50];
char numbers3[6];
char sometext[100];
}data_t;
data_t config;
char numbers[20] = "12345.12345";
char numbers3[6] = "12345";
char sometext[100] = "asdsadsad";
// Storing into struct
strcpy_s(config.numbers, numbers);
strcpy_s(config.numbers3, numbers3);
strcpy_s(config.sometext, sometext);
// Serializing struct to test.dat
ofstream output_file("test.dat", ios::binary);
output_file.write((char*)&config, sizeof(config));
output_file.close();
// Reading from it
data_t master;
ifstream input_file("test.dat", ios::binary);
input_file.read((char*)&master, sizeof(master));
cout << "NUMBERS : " << master.numbers << endl;
cout << "NUMBERS3 : " << master.numbers3 << endl;
cout << "SOMETEXT : " << master.sometext << endl;
cout << endl << endl;
Storing into the struct with C++ and then reading it back works just fine, but I want to write that file through PHP and then read it from C++, so I have:
PHP:
$data = Array();
$data['numbers'] = "12345.12345";
$data['numbers3'] = "12345";
$data['sometext'] = "abcdfghs";
$fp=fopen("test.dat","wb") or die("Stop! i kill you...");
foreach($data as $key => $value){
echo 'written:'.$value;
fwrite($fp,$value."\t");
}
Now what is happening is:
NUMBERS : 12345.12345 12345 abcdfghs
NUMBERS3 :
SOMETEXT :
So as you can see, it's not good. I also noticed a difference: the file written from C++ contains binary data, while the file written from PHP is just plain text.
Some help would be appreciated, many thanks!
Your C++ struct allocates 20 bytes for the numbers member. That means when you write it to the file, all 20 bytes are written, the writing doesn't just stop after writing 12345.12345. Your PHP code, on the other hand, writes exactly what is in $data['numbers'] and stops immediately (well, after adding a useless "\t"). The "binary data" you noticed in the file is just the garbage which happened to be in memory in those leftover bytes after 12345.12345. Same goes for the other fields.
Your PHP code does not write the string's terminating NUL byte to the file.
Your PHP code does not write the numbers2 member to the file.
You need to ensure the PHP code writes the terminating NUL, pads the output to the same size as the field has in the C++ struct, and outputs the fields in the same order as the C++ struct. You can use pack() for this (the "a" format pads each string with NUL bytes up to the given length):
<?php
$data = array();
$data['numbers'] = "12345.12345";
$data['numbers2'] = '';
$data['numbers3'] = "12345";
$data['sometext'] = "abcdfghs";
$packed = pack('a20a50a6a100', $data['numbers'], $data['numbers2'], $data['numbers3'], $data['sometext']);
$written = file_put_contents("test.dat", $packed);
if($written === false) {
throw new RuntimeException("Failed to write data to file!");
} else if($written !== strlen($packed)) {
throw new RuntimeException("Writing to file was not complete!");
}
Note: For maximum compatibility, you should read/write each struct member to the file individually in a consistent order on both sides. Otherwise you can have problems due to C++ field padding/alignment.
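To check the layout from the PHP side before involving the C++ program, the packed string can be unpacked again. This is a small sketch using the same field sizes as above; the "Z" format in unpack() stops at the first NUL byte, much like the C string functions do. Because every member of the struct is a char array, sizeof(data_t) should also come out as 20 + 50 + 6 + 100 = 176 bytes.

```php
<?php
// Same layout as the C++ struct: char[20], char[50], char[6], char[100].
$packed = pack('a20a50a6a100', "12345.12345", '', "12345", "abcdfghs");

// Total size in bytes of the packed record.
$size = strlen($packed);

// Read the fields back; "Z" returns each string up to its first NUL byte.
$fields = unpack('Z20numbers/Z50numbers2/Z6numbers3/Z100sometext', $packed);

var_dump($size, $fields);
```

Note that a value as long as (or longer than) its field would be cut off by "a" without any terminating NUL, so input lengths are worth validating before packing.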

Correct way to recreate HTML document from a string?

First things first, I'm cheap! :) I can't afford to buy a static IP for my domain and I can't afford those fancy certificates... So no SSL/HTTPS for me.
What I'm trying to accomplish here is to roll out my own "HTTP encryption". Here's what I have accomplished so far:
Modified an existing proxy script (Glype/PHProxy) to "encrypt" (base64 for now) the echo output. (I'm wrapping the entire content in a body element, btw)
Written a GreaseMonkey script to "decrypt" the encrypted output.
This works on simple websites. But when I load complex websites (like a browser game), the JavaScript is broken (the game renders perfectly when I turn off my encryption).
Upon inspection with Firebug, I noticed that the contents of the head element are being placed in the body element. This doesn't always happen, so I suspected that the PHP was producing malformed output, but I decoded the base64 using an offline tool and the HTML looks okay.
Here's a sample output from the PHP:
<html><body>PGh0bWw+DQo8aGVhZD4NCjx0aXRsZT5IZWxsbzwvdGl0bGU+DQo8L2hlYWQ+DQo8Ym9keT4NCjxoMT5IZWxsbyBXb3JsZDwvaDE+DQo8L2JvZHk+DQo8L2h0bWw+</body></html>
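The proxy modification itself is not shown in the question. As a rough illustration of what producing such output could look like on the PHP side, here is a minimal sketch; wrapEncoded() is a hypothetical helper, not part of Glype or PHProxy.

```php
<?php
// Hypothetical helper: base64-encode a whole page and wrap it in a
// minimal html/body shell, as in the sample output above.
function wrapEncoded(string $html): string
{
    return '<html><body>' . base64_encode($html) . '</body></html>';
}

$page = "<html>\r\n<head>\r\n<title>Hello</title>\r\n</head>\r\n"
      . "<body>\r\n<h1>Hello World</h1>\r\n</body>\r\n</html>";

$wrapped = wrapEncoded($page);
echo $wrapped;
```

One possible way to hook this into an existing script, assuming the script simply echoes its output, would be to register the function as an output buffer callback with ob_start('wrapEncoded').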
Here's the decoded HTML from Firebug (after being processed by the GM script):
<html>
<head>
<title>Hello</title>
</head>
<body>
<h1>Hello World</h1>
</body>
</html>
Here's my GM script to decode the PHP output:
function utf8_decode (str_data) {
var tmp_arr = [],
i = 0,
ac = 0,
c1 = 0,
c2 = 0,
c3 = 0;
str_data += '';
while (i < str_data.length) {
c1 = str_data.charCodeAt(i);
if (c1 < 128) {
tmp_arr[ac++] = String.fromCharCode(c1);
i++;
} else if (c1 > 191 && c1 < 224) {
c2 = str_data.charCodeAt(i + 1);
tmp_arr[ac++] = String.fromCharCode(((c1 & 31) << 6) | (c2 & 63));
i += 2;
} else {
c2 = str_data.charCodeAt(i + 1);
c3 = str_data.charCodeAt(i + 2);
tmp_arr[ac++] = String.fromCharCode(((c1 & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
i += 3;
}
}
return tmp_arr.join('');
}
function base64_decode (data) {
var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
var o1, o2, o3, h1, h2, h3, h4, bits, i = 0,
ac = 0,
dec = "",
tmp_arr = [];
if (!data) {
return data;
}
data += '';
do { // unpack four hexets into three octets using index points in b64
h1 = b64.indexOf(data.charAt(i++));
h2 = b64.indexOf(data.charAt(i++));
h3 = b64.indexOf(data.charAt(i++));
h4 = b64.indexOf(data.charAt(i++));
bits = h1 << 18 | h2 << 12 | h3 << 6 | h4;
o1 = bits >> 16 & 0xff;
o2 = bits >> 8 & 0xff;
o3 = bits & 0xff;
if (h3 == 64) {
tmp_arr[ac++] = String.fromCharCode(o1);
} else if (h4 == 64) {
tmp_arr[ac++] = String.fromCharCode(o1, o2);
} else {
tmp_arr[ac++] = String.fromCharCode(o1, o2, o3);
}
} while (i < data.length);
dec = tmp_arr.join('');
dec = utf8_decode(dec);
return dec;
}
document.documentElement.innerHTML = base64_decode(document.body.innerHTML);
I think the problem is I'm assigning the decoded HTML to document.documentElement.innerHTML, and by doing so it's putting the entire thing inside the body element?
So the question is: what is the correct way to recreate an HTML document from a string?
Since you are just base64 encoding, and as @Battle_707 has said the issue is with DOM events, why don't you send a page that redirects to a data URL? This way the browser should fire all the right events.
But seriously, just get a certificate and get on dyndns.com; base64 buys you no extra security.
Edit
Since you mentioned moving to AES, if you can find a JS AES implementation you could use my suggestion here and construct the data URL client side and redirect to that.
function openPageFromString(html){
location="data:text/html,"+encodeURIComponent(html);
}
The problem with what you call 'complex' pages is that they rely on very specific DOM events. These events are triggered either when the browser reads the line for the first time, or at certain 'breakpoints' (like 'onload'). Since you obfuscate the code and then decode it after it has been fully downloaded, your browser won't re-read the page to hit those events. Maybe, just maybe, you could call every function from those events manually after the page has been loaded, but I would not be surprised if (some) browsers give you a hard time doing that, since the page has been created like <html><head></head><body><html>.....your decoded page....</html></body></html>. This is besides the fact that JS engines might not even index the new code at all.

PHP: How to get version from android .apk file?

I am trying to create a PHP script to get the app version from Android APK file.
Extracting the XML file from the APK (zip) file and then parsing the XML is one way, but I guess there should be a simpler one. Something like PHP Manual, example #3.
Any ideas how to create the script?
If you have the Android SDK installed on the server, you can use PHP's exec (or similar) to execute the aapt tool (in $ANDROID_HOME/platforms/android-X/tools).
$ aapt dump badging myapp.apk
And the output should include:
package: name='com.example.myapp' versionCode='1530' versionName='1.5.3'
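Once the aapt output is available as a string in PHP (for example from exec or shell_exec), the package line can be picked apart with a regular expression. The following is a small sketch; parseBadgingPackageLine() is a hypothetical helper, and it assumes the package line has the attribute order shown above.

```php
<?php
// Hypothetical helper: extract name, versionCode and versionName from
// the "package:" line of `aapt dump badging` output.
function parseBadgingPackageLine(string $aaptOutput): ?array
{
    $pattern = "/^package: name='([^']*)' versionCode='([^']*)' versionName='([^']*)'/m";
    if (preg_match($pattern, $aaptOutput, $m) !== 1) {
        return null; // no package line found
    }
    return [
        'name'        => $m[1],
        'versionCode' => $m[2],
        'versionName' => $m[3],
    ];
}

$output = "package: name='com.example.myapp' versionCode='1530' versionName='1.5.3'\n";
print_r(parseBadgingPackageLine($output));
```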
If you can't install the Android SDK, for whatever reason, then you will need to parse Android's binary XML format. The AndroidManifest.xml file inside the APK zip structure is not plain text.
You would need to port a utility like AXMLParser from Java to PHP.
I've created a set of PHP functions that will find just the version code of an APK. This is based on the fact that the AndroidManifest.xml file contains the version code in its first tag, and on the AXML (binary Android XML) format as described in the link given in the code comments below.
<?php
$APKLocation = "PATH TO APK GOES HERE";
$versionCode = getVersionCodeFromAPK($APKLocation);
echo $versionCode;
//Based on the fact that the Version Code is the first tag in the AndroidManifest.xml file, this will return its value
//PHP implementation based on the AXML format described here: https://stackoverflow.com/questions/2097813/how-to-parse-the-androidmanifest-xml-file-inside-an-apk-package/14814245#14814245
function getVersionCodeFromAPK($APKLocation) {
$versionCode = "N/A";
//AXML LEW 32-bit word (hex) for a start tag
$XMLStartTag = "00100102";
//APK is essentially a zip file, so open it
$zip = zip_open($APKLocation);
if ($zip) {
while ($zip_entry = zip_read($zip)) {
//Look for the AndroidManifest.xml file in the APK root directory
if (zip_entry_name($zip_entry) == "AndroidManifest.xml") {
//Get the contents of the file in hex format
$axml = getHex($zip, $zip_entry);
//Convert AXML hex file into an array of 32-bit words
$axmlArr = convert2wordArray($axml);
//Convert AXML 32-bit word array into Little Endian format 32-bit word array
$axmlArr = convert2LEWwordArray($axmlArr);
//Get first AXML open tag word index
$firstStartTagword = findWord($axmlArr, $XMLStartTag);
//The version code is 13 words after the first open tag word
$versionCode = intval($axmlArr[$firstStartTagword + 13], 16);
break;
}
}
//Only close the archive if it was actually opened
zip_close($zip);
}
return $versionCode;
}
//Get the contents of the file in hex format
function getHex($zip, $zip_entry) {
if (zip_entry_open($zip, $zip_entry, 'r')) {
$buf = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
$hex = unpack("H*", $buf);
return current($hex);
}
}
//Given a hex byte stream, return an array of words
function convert2wordArray($hex) {
$wordArr = array();
$numwords = strlen($hex)/8;
for ($i = 0; $i < $numwords; $i++)
$wordArr[] = substr($hex, $i * 8, 8);
return $wordArr;
}
//Given an array of words, convert them to Little Endian format (LSB first)
function convert2LEWwordArray($wordArr) {
$LEWArr = array();
foreach($wordArr as $word) {
$LEWword = "";
for ($i = 0; $i < strlen($word)/2; $i++)
$LEWword .= substr($word, (strlen($word) - ($i*2) - 2), 2);
$LEWArr[] = $LEWword;
}
return $LEWArr;
}
//Find a word in the word array and return its index value
function findWord($wordArr, $wordToFind) {
$currentword = 0;
foreach ($wordArr as $word) {
if ($word == $wordToFind)
return $currentword;
else
$currentword++;
}
}
?>
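The hex string conversion and manual byte swapping above can be collapsed into a single unpack() call, because the "V" format reads unsigned 32-bit little-endian words directly. The sketch below applies the same heuristic as the answer (version code 13 words after the first start-tag word 0x00100102) to the raw bytes of AndroidManifest.xml; versionCodeFromAxml() is a hypothetical helper, and the sample buffer is synthetic, not a real manifest.

```php
<?php
// Hypothetical helper: same heuristic as above, applied to the raw bytes
// of a binary AndroidManifest.xml.
function versionCodeFromAxml(string $axml): ?int
{
    if (strlen($axml) < 4) {
        return null;
    }
    // "V*" reads the buffer as unsigned 32-bit little-endian words.
    $words = array_values(unpack('V*', $axml));
    // Index of the first start-tag word.
    $start = array_search(0x00100102, $words, true);
    if ($start === false || !isset($words[$start + 13])) {
        return null;
    }
    // The version code is assumed to sit 13 words after the start tag.
    return $words[$start + 13];
}

// Synthetic buffer: start tag at word 2, version code at word 15.
$words = array_fill(0, 20, 0);
$words[2] = 0x00100102;
$words[15] = 1530;
$sample = pack('V*', ...$words);

var_dump(versionCodeFromAxml($sample));
```

Since the procedural zip_open() family is deprecated as of PHP 8.0, the manifest bytes could instead be obtained with ZipArchive::open() followed by ZipArchive::getFromName('AndroidManifest.xml') and then passed to this function.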
Use this in the CLI:
apktool if 1.apk
aapt dump badging 1.apk
You can use these commands in PHP using exec or shell_exec.
aapt dump badging ./apkfile.apk | grep sdkVersion -i
You will get a human readable form.
sdkVersion:'14'
targetSdkVersion:'14'
Just look for aapt in your system if you have Android SDK installed.
Mine is in:
<SDKPATH>/build-tools/19.0.3/aapt
The dump format is a little odd and not the easiest to work with. Just to expand on some of the other answers, this is a shell script that I am using to parse out name and version from APK files.
aapt d badging PACKAGE | gawk $'match($0, /^application-label:\'([^\']*)\'/, a) { n = a[1] }
match($0, /versionName=\'([^\']*)\'/, b) { v=b[1] }
END { if ( length(n)>0 && length(v)>0 ) { print n, v } }'
If you just want the version then obviously it can be much simpler.
aapt d badging PACKAGE | gawk $'match($0, /versionName=\'([^\']*)\'/, v) { print v[1] }'
Here are variations suitable for both gawk and mawk (a little less durable in case the dump format changes but should be fine):
aapt d badging PACKAGE | mawk -F\' '$1 ~ /^application-label:$/ { n=$2 }
$5 ~ /^ versionName=$/ { v=$6 }
END{ if ( length(n)>0 && length(v)>0 ) { print n, v } }'
aapt d badging PACKAGE | mawk -F\' '$5 ~ /^ versionName=$/ { print $6 }'
