How to retrieve digital signature information from PDF with PHP? - php

I have an app that needs to retrieve some data (the signer's name) from a digital signature attached to PDF files.
I have only found examples in Java and C# that use the iText AcroFields class and its GetSignatureNames method.
Edit: I've tried pdftk with dump_data_fields and generate_fdf, and unfortunately the results were:
/Fields [
<<
/V /dftk.com.lowagie.text.pdf.PdfDictionary#3048918
/T (Signature1)
>>]
and
FieldType: Signature
FieldName: Signature1
FieldFlags: 0
FieldJustification: Left
Thanks in Advance !

Well, it's complicated (I would even say impossible, but who knows) to achieve this with PHP alone.
First, please read the article about digital signatures in Adobe PDF.
Second, after reading it you will know that the signature is stored between bytes b and c, as given by the /ByteRange [a b c d] entry.
Third, we can extract b and c from the document and then extract the signature itself (the guide says it is a hex-encoded PKCS#7 object).
<?php
$file = 'test.pdf';
$content = file_get_contents($file);
// Capture a, b and c from "/ByteRange [a b c d]"; the whitespace around "[" varies between PDF writers
$regexp = '#ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)#';
$result = [];
preg_match_all($regexp, $content, $result);
// $result[2][0] and $result[3][0] are b and c
if (isset($result[2][0], $result[3][0])) {
    $start = (int) $result[2][0];
    $end = (int) $result[3][0];
    if ($stream = fopen($file, 'rb')) {
        // skip the < and > that enclose the hex string
        $signature = stream_get_contents($stream, $end - $start - 2, $start + 1);
        fclose($stream);
        // only write the file when the signature was actually read
        file_put_contents('signature.pkcs7', hex2bin($signature));
    }
}
Fourth, after the third step we have a PKCS#7 object in the file signature.pkcs7. Unfortunately, I don't know of a way to extract information from the signature using PHP alone, so you must be able to run shell commands to use openssl:
openssl pkcs7 -in signature.pkcs7 -inform DER -print_certs > info.txt
After running this command, info.txt will contain a chain of certificates. The last one is the one you need. You can inspect the structure of the file and parse out the data you need.
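If shelling out is not an option, PHP's own openssl extension may be able to cover this step. The following is only a sketch under several assumptions: PHP 7.2 or later (for openssl_pkcs7_read), that this function accepts the PEM form of the blob, and that the signature is definite-length DER, so the zero padding that PDF writers leave in the signature placeholder can be trimmed. The helper names trimDerPadding, derToPem and certificateCommonNames are made up for this example, and it has not been tried against real signed PDFs.

```php
<?php
// Hypothetical helper: cut off the zero padding that follows the DER blob
// by reading the length of the outer ASN.1 SEQUENCE.
function trimDerPadding(string $der): string
{
    if (strlen($der) < 2 || ord($der[0]) !== 0x30) {
        return $der; // not an ASN.1 SEQUENCE; leave untouched
    }
    $first = ord($der[1]);
    if ($first < 0x80) {
        // short form: the byte itself is the content length
        return substr($der, 0, 2 + $first);
    }
    $n = $first & 0x7f; // long form: the next $n bytes hold the length
    if ($n === 0 || $n > 4 || strlen($der) < 2 + $n) {
        return $der; // indefinite length (BER) or implausible header
    }
    $contentLen = 0;
    for ($i = 0; $i < $n; $i++) {
        $contentLen = ($contentLen << 8) | ord($der[2 + $i]);
    }
    return substr($der, 0, 2 + $n + $contentLen);
}

// Hypothetical helper: wrap binary DER in PEM armour.
function derToPem(string $der, string $label = 'PKCS7'): string
{
    return "-----BEGIN $label-----\n"
        . chunk_split(base64_encode($der), 64, "\n")
        . "-----END $label-----\n";
}

// Returns the subject CN of every certificate found in the PKCS#7 blob,
// or an empty array if the extension is missing or parsing fails.
function certificateCommonNames(string $der): array
{
    $certs = [];
    if (!function_exists('openssl_pkcs7_read')
        || !openssl_pkcs7_read(derToPem(trimDerPadding($der)), $certs)) {
        return [];
    }
    $names = [];
    foreach ($certs as $pem) {
        $info = openssl_x509_parse($pem);
        if ($info !== false && isset($info['subject']['CN'])) {
            $names[] = $info['subject']['CN'];
        }
    }
    return $names;
}
```

Calling certificateCommonNames(file_get_contents('signature.pkcs7')) would then list the common names of the whole chain; which entry belongs to the signer still has to be decided as described above.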
Please also refer to this question, this question and this topic
EDIT at 2017-10-09
I deliberately pointed you to exactly this question.
It contains code that you can adjust to your needs.
use ASN1\Type\Constructed\Sequence;
use ASN1\Element;
use X509\Certificate\Certificate;
$seq = Sequence::fromDER($binaryData);
$signed_data = $seq->getTagged(0)->asExplicit()->asSequence();
// ExtendedCertificatesAndCertificates: https://tools.ietf.org/html/rfc2315#section-6.6
$ecac = $signed_data->getTagged(0)->asImplicit(Element::TYPE_SET)->asSet();
// ExtendedCertificateOrCertificate: https://tools.ietf.org/html/rfc2315#section-6.5
$ecoc = $ecac->at($ecac->count() - 1);
$cert = Certificate::fromASN1($ecoc->asSequence());
$commonNameValue = $cert->tbsCertificate()->subject()->toString();
echo $commonNameValue;
I've adjusted it for you, but please do the rest yourself.

This is my working code in PHP7:
<?php
require_once('vendor/autoload.php');
use Sop\ASN1\Type\Constructed\Sequence;
use Sop\ASN1\Element;
use Sop\X509\Certificate\Certificate;
$currentFile = "./upload/test2.pdf";
$content = file_get_contents($currentFile);
$regexp = '/ByteRange\ \[\s*(\d+) (\d+) (\d+)/'; // subexpressions are used to extract b and c
$result = [];
preg_match_all($regexp, $content, $result);
// $result[2][0] and $result[3][0] are b and c
if (isset($result[2]) && isset($result[3]) && isset($result[2][0]) && isset($result[3][0])) {
$start = $result[2][0];
$end = $result[3][0];
if ($stream = fopen($currentFile, 'rb')) {
$signature = stream_get_contents($stream, $end - $start - 2, $start + 1); // because we need to exclude < and > from start and end
fclose($stream);
}
$binaryData = hex2bin($signature);
$seq = Sequence::fromDER($binaryData);
$signed_data = $seq->getTagged(0)->asExplicit()->asSequence();
// ExtendedCertificatesAndCertificates: https://tools.ietf.org/html/rfc2315#section-6.6
$ecac = $signed_data->getTagged(0)->asImplicit(Element::TYPE_SET)->asSet();
// ExtendedCertificateOrCertificate: https://tools.ietf.org/html/rfc2315#section-6.5
$ecoc = $ecac->at($ecac->count() - 1);
$cert = Certificate::fromASN1($ecoc->asSequence());
$commonNameValue = $cert->tbsCertificate()->subject()->toString();
echo $commonNameValue;
}

I've used iText and found it to be very reliable; I highly recommend it.
You can always call the Java code as a "microservice" from PHP.

Related

How to detect .doc password protection

The following answer makes it possible to detect password-protected .docx files, after porting it to PHP: https://stackoverflow.com/a/14347730/1794894
$content = utf8_encode(file_get_contents($absolutePath));
if (mb_substr($content, 0, 2) == "ÐÏ") {
# DOC/XLS 2007+
$start = str_replace("\x00", " ", mb_substr($content, 0, 2000));
if (mb_strstr($start, 'E n c r y p t e d P a c k a g e') !== false) {
return true;
}
if ($extension == 'doc') {
return true;
}
}
How can I also make a .doc-specific check? Do .doc files also have a specific byte sequence? Or is it enough to rely only on the ÐÏ check of the first two characters of the file?
Or is the character at position 0x20B always equal to 0x13 for a password-protected .doc?
Solved it based on the C# example from the post in the question.
We cannot make use of a COM object, since most servers do not run MS Word.
See Gist snippet: https://gist.github.com/rvanlaak/06ca1b65658a91240362
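The gist itself is not reproduced here. As a rough illustration of a .doc-specific check, the sketch below tests the fEncrypted bit in the Word binary file header (FIB) instead of comparing a whole byte with 0x13. It assumes that the WordDocument stream starts at file offset 0x200, which is common but not guaranteed by the container format, and the function name docLooksEncrypted is made up for this example.

```php
<?php
// Sketch only: assumes the WordDocument stream starts at file offset 0x200.
function docLooksEncrypted(string $bytes): bool
{
    // OLE2 compound file signature (the "ÐÏ" seen in the question is its first two bytes)
    $oleMagic = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
    if (strlen($bytes) < 0x20C || strncmp($bytes, $oleMagic, 8) !== 0) {
        return false; // not an OLE2 container, or too short to hold the FIB flags
    }
    // The FIB flags are a 16-bit little-endian word at offset 0x0A of the FIB
    $flags = unpack('v', substr($bytes, 0x200 + 0x0A, 2))[1];
    return ($flags & 0x0100) !== 0; // bit 8 is fEncrypted
}
```

The function expects the raw bytes, for example docLooksEncrypted(file_get_contents($absolutePath)), without the utf8_encode() step used in the question. A byte value of 0x13 at position 0x20B has this bit set, which is consistent with the observation in the question.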
It should be something like this...
<?php
$word=new COM("word.application") or die("Cannot create Word object");
$word->Visible=false;
$word->WindowState=2;
$word->DisplayAlerts=false;
$doc = $word->Documents->Open("/yourFile.doc");
$passwordProtect = $doc->HasPassword; // true or false
$word->ActiveDocument->Close(false);
$word->Quit();
$word->Release();
$word=null;
?>
I can't test this code, hope it helps...

How to recreate PHAR files with identical sha1sums at different times?

I'm working on a command-line PHP project and want to be able to recreate the PHAR file that is my deployment artifact. The challenge is that I can't create two PHAR's that have identical sha1sums and were created more than 1 second apart from each other. I would like to be able to exactly recreate my PHAR file if the input files are the same (i.e. came from the same git commit).
The following code snippet demonstrates the problem:
#!/usr/bin/php
<?php
$hashes = array();
$file_names = array('file1.phar','file2.phar');
foreach ($file_names as $name) {
if (file_exists($name)) {
unlink($name);
}
$phar = new Phar($name);
$phar->addFromString('cli.php', "cli\n");
$hashes[]=sha1_file($name);
// remove the sleep and the PHAR's are identical.
sleep(1);
}
if ($hashes[0]==$hashes[1]) {
echo "match\n";
} else {
echo "do not match\n";
}
As far as I can tell, the "modification time" field for each file in the PHAR manifest is always set to the current time, and there seems to be no way of overriding that. Even touch("phar://file1.phar/cli.php", 1413387555) gives the error:
touch(): Can not call touch() for a non-standard stream
I ran the above code in PHP 5.5.9 on Ubuntu Trusty and PHP 5.3 on RHEL5; both versions behave the same way and fail to create identical PHAR files.
I'm trying to do this in order to follow the advice in the book Continuous Delivery by Jez Humble and David Farley.
Any help is appreciated.
The Phar class currently does not allow users to alter or even access the modification time. I thought of storing your string in a temporary file and using touch to alter the mtime, but that does not seem to have any effect. So you'll have to manually change the timestamps in the created files and then regenerate the archive signature. Here's how to do it with current PHP versions:
<?php
$filename = "file1.phar";
$archive = file_get_contents($filename);
# Search for the start of the archive header
# See http://php.net/manual/de/phar.fileformat.phar.php
# This isn't the only valid way to write a PHAR archive, but it is what the Phar class
# currently does, so you should be fine (The docs say that the end-of-PHP-tag is optional)
$magic = "__HALT_COMPILER(); ?" . ">";
$end_of_code = strpos($archive, $magic) + strlen($magic);
$data_pos = $end_of_code;
# Skip that header
$data = unpack("Vmanifest_length/Vnumber_of_files/vapi_version/Vglobal_flags/Valias_length", substr($archive, $end_of_code, 18));
$data_pos += 18 + $data["alias_length"];
$metadata = unpack("Vlength", substr($archive, $data_pos, 4));
$data_pos += 4 + $metadata["length"];
for($i=0; $i<$data["number_of_files"]; $i++) {
# Now $data_pos points to the first file
# Files are explained here: http://php.net/manual/de/phar.fileformat.manifestfile.php
$filename_data = unpack("Vfilename_length", substr($archive, $data_pos, 4));
$data_pos += 4 + $filename_data["filename_length"];
$file_data = unpack("Vuncompressed_size/Vtimestamp/Vcompressed_size/VCRC32/Vflags/Vmetadata_length", substr($archive, $data_pos, 24));
# Change the timestamp to zeros (You can also use some other time here using pack("V", time()) instead of the zeros)
$archive = substr($archive, 0, $data_pos + 4) . "\0\0\0\0" . substr($archive, $data_pos + 8);
# Skip to the next file (it's _all_ the headers first, then file data)
$data_pos += 24 + $file_data["metadata_length"];
}
# Regenerate the file's signature
$sig_data = unpack("Vsigflags/C4magic", substr($archive, strlen($archive) - 8));
if($sig_data["magic1"] == ord("G") && $sig_data["magic2"] == ord("B") && $sig_data["magic3"] == ord("M") && $sig_data["magic4"] == ord("B")) {
if($sig_data["sigflags"] == 1) {
# MD5
$sig_pos = strlen($archive) - 8 - 16;
$archive = substr($archive, 0, $sig_pos) . pack("H32", md5(substr($archive, 0, $sig_pos))) . substr($archive, $sig_pos + 16);
}
else {
# SHA1
$sig_pos = strlen($archive) - 8 - 20;
$archive = substr($archive, 0, $sig_pos) . pack("H40", sha1(substr($archive, 0, $sig_pos))) . substr($archive, $sig_pos + 20);
}
# Note: The manual talks about SHA256/SHA512 support, but the according flags aren't documented yet. Currently,
# PHAR uses SHA1 by default, so there's nothing to worry about. You still might have to add those sometime.
}
file_put_contents($filename, $archive);
I wrote this ad hoc for my local PHP 5.5.9 version and your example above. The script will work for files created in a way similar to your example code. The documentation hints at some valid deviations from this format. There are comments at the corresponding lines in the code; you might have to add something there if you want to support Phar files in general.
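For the SHA256/SHA512 case mentioned in the code comments, the signature step could be generalised roughly as follows. This is a sketch, not part of the script above: the flag values 3 and 4 are assumed to match the Phar::SHA256 and Phar::SHA512 class constants, OpenSSL signatures are not handled, and the function name resignPhar is made up.

```php
<?php
// Sketch: recompute the trailing hash signature of a PHAR archive held in memory.
function resignPhar(string $archive): string
{
    // signature flag => [algorithm name for hash(), digest length in bytes]
    // (assumed to mirror Phar::MD5, Phar::SHA1, Phar::SHA256, Phar::SHA512)
    $algos = [
        0x0001 => ['md5', 16],
        0x0002 => ['sha1', 20],
        0x0003 => ['sha256', 32],
        0x0004 => ['sha512', 64],
    ];
    if (strlen($archive) < 8 || substr($archive, -4) !== 'GBMB') {
        return $archive; // no signature trailer found
    }
    $flag = unpack('V', substr($archive, -8, 4))[1];
    if (!isset($algos[$flag])) {
        throw new RuntimeException("Unsupported signature flag: $flag");
    }
    [$algo, $len] = $algos[$flag];
    $sigPos = strlen($archive) - 8 - $len;
    if ($sigPos < 0) {
        throw new RuntimeException('Archive is too short for its signature');
    }
    $payload = substr($archive, 0, $sigPos);
    // raw digest of everything before the signature, then the unchanged flag + magic
    return $payload . hash($algo, $payload, true) . substr($archive, -8);
}
```

It would be called on the modified archive string just before writing it back, in place of the MD5/SHA1 branch.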

How does youtube encode their urls?

Quick question:
How does YouTube encode its URLs? Take the one below:
http://www.youtube.com/watch?v=MhWyAL2hKlk
What are they doing to get the value MhWyAL2hKlk?
Are they using some kind of encryption and then decrypting it on their end?
I want to do something similar on a website I am working on; the URL below looks horrible.
http://localhost:8888/example/account_player/?playlist=drum+and+bass+music
I would like to encode the URLs to look like YouTube's, but I don't know how they do it.
Any advice?
Well, technically speaking, YouTube generates video IDs by using an algorithm. Honestly, I have no idea. It could be a hashsum of the entire video file + a salt using the current UNIX time, or it could be a base64 encoding of something unique to the video. But I do know that it's most likely not random, because if it were, the risk of collision would be too high.
For the sake of example, though, we'll assume that YouTube does generate random ID's. Keep in mind that when using randomly generated values to store something, it is generally a good idea to implement collision checking to ensure that a new object doesn't overwrite the existing one. In practice, though, I would recommend using a hashing algorithm, since they are one-way and very effective at preventing collisions.
So, I'm not very familiar with PHP. I had to write it in JavaScript first. Then, I ported it to PHP, which turned out to be relatively simple:
function randch($charset){
return $charset[rand() % strlen($charset)];
}
function randstr($len, $charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"){
$out = [];
for($i = 0; $i < $len; $i++){
array_push($out, randch($charset));
}
return join("", $out);
}
What this does is generate a random string that is len characters long, drawn from the given charset.
Here's some sample output:
randstr(5) -> 1EWHd
randstr(30) -> atcUVgfhAmM5bXz-3jgyRoaVnnY2jD
randstr(30, "asdfASDF") -> aFSdSAfsfSdAsSSddFFSSsdasDDaDa
Though it's not a good idea to use such a short charset.
randstr(30, "asdf")
sdadfaafsdsdfsaffsddaaafdddfad
adaaaaaafdfaadsadsdafdsfdfsadd
dfaffafaaddfdddadasaaafsfssssf
randstr(30)
r5BbvJ45HEN6dWtNZc5ZvHGLCg4Qyq
50vKb1rh66WWf9RLZQY2QrMucoNicl
Mklh3zjuRqDOnVYeEY3B0V3Moia9Dn
Now let's say the page uses this function to generate a random ID for a video that was just uploaded. You then store this key in a table together with a link to the data needed to display the right page. When an ID is requested via $_GET (e.g. /watch?v=02R0-1PWdEf), the page checks the key against the database of video IDs; if it finds a match, it loads the data for that key, otherwise it returns a 404.
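The collision check mentioned earlier could look roughly like this. It is a sketch: $exists stands for a hypothetical callback that reports whether an ID is already stored (in practice a database lookup), the name generateUniqueId is made up, and random_int() is swapped in for rand() because it avoids the modulo step and uses a better random source.

```php
<?php
// Sketch: generate a random ID and retry while it collides with a stored one.
function generateUniqueId(callable $exists, int $len = 11, int $maxTries = 10): string
{
    $charset = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-';
    $max = strlen($charset) - 1;
    for ($try = 0; $try < $maxTries; $try++) {
        $id = '';
        for ($i = 0; $i < $len; $i++) {
            $id .= $charset[random_int(0, $max)];
        }
        if (!$exists($id)) {
            return $id; // free ID found
        }
    }
    throw new RuntimeException('Could not find a free ID');
}
```

A unique index on the ID column is still worth having, since two requests could pick the same ID between the check and the insert.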
You can also encode directly to a base 64 string if you don't want it to be random. This can be done with base64_encode() and base64_decode(). For example, say you have the data for the video in one string $str="filename=apples.avi;owner=coolpixlol124", for whatever reason. base64_encode($str) will give you ZmlsZW5hbWU9YXBwbGVzLmF2aTtvd25lcj1jb29scGl4bG9sMTI0.
To decode it later use base64_decode($new_str), which will give back the original string.
Though, as I said before, it's probably a better idea to use a hashing algorithm like SHA.
I hope this helped.
EDIT: I forgot to mention, YouTube's video IDs are currently 11 characters long, so if you want to use the same kind of thing, you would use randstr(11) to generate an 11-character random string, like this sample ID I got: 6AMx8N5r6cg
EDIT 2 (2015.12.17): Completely re-wrote answer. Original was crap, I don't know what I was thinking when I wrote it.
Your question is similar to this other SO question which contains some optimised generator functions along with a clear description of the problem you're trying to solve:
php - help improve the efficiency of this youtube style url generator
It will provide you with code, a better understanding of performance issues, and a better understanding of the problem domain all at once.
Dunno how exactly google generates their strings, but the idea is really simple. Create a table like:
+----------+------------------------------+
| code | url |
+----------+------------------------------+
| asdlkasd | playlist=drum+and+bass+music |
+----------+------------------------------+
Now, create your url like:
http://localhost:8888/example/account_player/asdlkasd
After that, just read the code from the URL, look it up in the database, and load your image, video or whatever you intend to show.
PS: This is just a fast example. It can be done in many other ways also of course.
If you don't want to use decimal numbers, you can encode them into base36:
echo base_convert(123456789, 10, 36); // => "21i3v9"
And decode back:
echo base_convert("21i3v9", 36, 10); // => "123456789"
<?php
function alphaID($in, $to_num = false, $pad_up = false, $pass_key = null)
{
$out = '';
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$base = strlen($index);
if ($pass_key !== null) {
for ($n = 0; $n < strlen($index); $n++) {
$i[] = substr($index, $n, 1);
}
$pass_hash = hash('sha256',$pass_key);
$pass_hash = (strlen($pass_hash) < strlen($index) ? hash('sha512', $pass_key) : $pass_hash);
for ($n = 0; $n < strlen($index); $n++) {
$p[] = substr($pass_hash, $n, 1);
}
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
}
if ($to_num) {
// Digital number <<-- alphabet letter code
$out = 0; // start from a number; adding to the empty string fails on newer PHP versions
$len = strlen($in) - 1;
for ($t = $len; $t >= 0; $t--) {
$bcp = bcpow($base, $len - $t);
$out = $out + strpos($index, substr($in, $t, 1)) * $bcp;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
for ($t = ($in != 0 ? floor(log($in, $base)) : 0); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = floor($in / $bcp) % $base;
$out = $out . substr($index, $a, 1);
$in = $in - ($a * $bcp);
}
}
return $out;
}
?>
You can encode or decode using this function.
<?php
$random_id=57256;
$encode=alphaID($random_id);
$decode=alphaID($encode,true); //where boolean true reverse the string back to original
echo "Encode : {$encode} <br> Decode : {$decode}";
?>
Just visit the below for more info :
http://kvz.io/blog/2009/06/10/create-short-ids-with-php-like-youtube-or-tinyurl/
Just use an auto-increment ID value (from a database). Although I personally like the long URLs.

How to calculate CRC of a WinRAR file header?

Talking about this: http://www.win-rar.com/index.php?id=24&kb_article_id=162
I'm able to calculate the correct CRC of an archive header (MAIN_HEAD) by doing:
$crc = crc32(mb_substr($data, $blockOffset + 2, 11, '8bit'));
$crc = dechex($crc);
$crc = substr($crc, -4, 2) . substr($crc, -2, 2);
$crc = hexdec($crc);
The first line computes the "CRC of fields HEAD_TYPE to RESERVED2", as stated in the documentation. As I noted, it works fine for the archive header.
When I try to calculate the CRC of a file header, it always spits out the wrong CRC for some unknown reason. I did as the documentation says ("CRC of fields from HEAD_TYPE to FILEATTR") but it simply doesn't work. I've also tried different read-length variations in case the documentation is wrong and the range is actually HEAD_TYPE to FILE_NAME. Nothing worked.
Can anyone give me a hint? I've also checked the unrar source code, but it didn't make me any wiser, probably because I don't know C at all...
I wrote some code that does the same thing. Here it is, with some additional snippets for better understanding:
$this->fh = $fileHandle;
$this->startOffset = ftell($fileHandle); // current location in the file
// reading basic 7 byte header block
$array = unpack('vheaderCrc/CblockType/vflags/vheaderSize', fread($this->fh, 7));
$this->headerCrc = $array['headerCrc'];
$this->blockType = $array['blockType'];
$this->flags = $array['flags'];
$this->hsize = $array['headerSize'];
$this->addSize = 0; // size of data after the header
// -- check CRC of block header --
$offset = ftell($this->fh);
fseek($this->fh, $this->startOffset + 2, SEEK_SET);
$crcData = fread($this->fh, $this->hsize - 2);
// only the 2 lower-order bytes (16 bits) are used
$crc = crc32($crcData) & 0xffff;
// ignore blocks with no CRC set (same as twice the blockType)
if ($crc !== $this->headerCrc && $this->headerCrc !== 0x6969 // SRR Header
&& $this->headerCrc !== 0x6a6a // SRR Stored File
&& $this->headerCrc !== 0x7171 // SRR RAR block
&& $this->blockType !== 0x72 // RAR marker block (fixed: magic number)
) {
array_push($warnings, 'Invalid block header CRC found: header is corrupt.');
}
// set offset back to where we started from
fseek($this->fh, $offset, SEEK_SET);
I tested it on a couple of SRR files and it works as expected. I start by reading the basic 7-byte header, which contains the size of the whole header. I use that size to grab the correct amount of data for the crc32 function. I noticed that when you convert the CRC to hexadecimal, you can get false mismatches when comparing: '0f00' != 'f00'. You would need to pad it with zeros. This is why I kept the decimal representations of crc32() and unpack() for the comparison. Also, the number of fields in a file block can vary if some header flags are set, so it is possible that you used the wrong size.
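Condensed into a standalone function that works on a string instead of a file handle, the same check might look like this. It is a sketch assuming the 7-byte base header layout described above (HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE, all little-endian); the name rarHeaderCrcMatches is made up.

```php
<?php
// Sketch: $block must start at HEAD_CRC and contain at least HEAD_SIZE bytes.
function rarHeaderCrcMatches(string $block): bool
{
    if (strlen($block) < 7) {
        return false;
    }
    $head = unpack('vcrc/Ctype/vflags/vsize', substr($block, 0, 7));
    if ($head['size'] < 7 || strlen($block) < $head['size']) {
        return false; // header size is implausible or the data is truncated
    }
    // CRC32 over everything after HEAD_CRC, up to HEAD_SIZE bytes in total;
    // only the low 16 bits are stored in the header
    $computed = crc32(substr($block, 2, $head['size'] - 2)) & 0xffff;
    return $computed === $head['crc']; // integer comparison, no hex padding issues
}
```

If a hexadecimal form is needed for display, sprintf('%04x', $computed) keeps the leading zeros that dechex() drops.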

PHP: How to get version from android .apk file?

I am trying to create a PHP script to get the app version from Android APK file.
Extracting the XML file from the APK (zip) file and then parsing the XML is one way, but I guess there should be a simpler one. Something like PHP Manual, example #3.
Any ideas how to create the script?
If you have the Android SDK installed on the server, you can use PHP's exec (or similar) to execute the aapt tool (in $ANDROID_HOME/platforms/android-X/tools).
$ aapt dump badging myapp.apk
And the output should include:
package: name='com.example.myapp' versionCode='1530' versionName='1.5.3'
If you can't install the Android SDK, for whatever reason, then you will need to parse Android's binary XML format. The AndroidManifest.xml file inside the APK zip structure is not plain text.
You would need to port a utility like AXMLParser from Java to PHP.
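For the aapt route, the output line shown above can be picked apart with a regular expression. The sketch below assumes that aapt is on the PATH (or that its full path is passed in) and that the package line has the format shown; the function names parseBadging and apkVersion are made up.

```php
<?php
// Sketch: extract name, versionCode and versionName from `aapt dump badging` output.
function parseBadging(string $output): ?array
{
    $pattern = "/^package: name='([^']*)' versionCode='([^']*)' versionName='([^']*)'/m";
    if (!preg_match($pattern, $output, $m)) {
        return null; // package line not found or format differs
    }
    return ['name' => $m[1], 'versionCode' => $m[2], 'versionName' => $m[3]];
}

// Runs aapt on the given APK; $aapt is the assumed name or path of the binary.
function apkVersion(string $apkPath, string $aapt = 'aapt'): ?array
{
    $output = shell_exec(escapeshellcmd($aapt) . ' dump badging ' . escapeshellarg($apkPath));
    return is_string($output) ? parseBadging($output) : null;
}
```

With the sample line above, parseBadging() yields versionCode '1530' and versionName '1.5.3'.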
I've created a set of PHP functions that will find just the version code of an APK. This relies on the fact that the AndroidManifest.xml file contains the version code in its first tag, and on the AXML (binary Android XML) format as described here
<?php
$APKLocation = "PATH TO APK GOES HERE";
$versionCode = getVersionCodeFromAPK($APKLocation);
echo $versionCode;
//Based on the fact that the Version Code is the first tag in the AndroidManifest.xml file, this will return its value
//PHP implementation based on the AXML format described here: https://stackoverflow.com/questions/2097813/how-to-parse-the-androidmanifest-xml-file-inside-an-apk-package/14814245#14814245
function getVersionCodeFromAPK($APKLocation) {
$versionCode = "N/A";
//AXML LEW 32-bit word (hex) for a start tag
$XMLStartTag = "00100102";
//APK is essentially a zip file, so open it
$zip = zip_open($APKLocation);
//zip_open() returns an integer error code on failure, so test for a resource
if (is_resource($zip)) {
while ($zip_entry = zip_read($zip)) {
//Look for the AndroidManifest.xml file in the APK root directory
if (zip_entry_name($zip_entry) == "AndroidManifest.xml") {
//Get the contents of the file in hex format
$axml = getHex($zip, $zip_entry);
//Convert AXML hex file into an array of 32-bit words
$axmlArr = convert2wordArray($axml);
//Convert AXML 32-bit word array into Little Endian format 32-bit word array
$axmlArr = convert2LEWwordArray($axmlArr);
//Get first AXML open tag word index
$firstStartTagword = findWord($axmlArr, $XMLStartTag);
//The version code is 13 words after the first open tag word
$versionCode = intval($axmlArr[$firstStartTagword + 13], 16);
break;
}
}
//Only close the archive if it was opened successfully
zip_close($zip);
}
return $versionCode;
}
//Get the contents of the file in hex format
function getHex($zip, $zip_entry) {
if (zip_entry_open($zip, $zip_entry, 'r')) {
$buf = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
$hex = unpack("H*", $buf);
return current($hex);
}
}
//Given a hex byte stream, return an array of words
function convert2wordArray($hex) {
$wordArr = array();
$numwords = strlen($hex)/8;
for ($i = 0; $i < $numwords; $i++)
$wordArr[] = substr($hex, $i * 8, 8);
return $wordArr;
}
//Given an array of words, convert them to Little Endian format (LSB first)
function convert2LEWwordArray($wordArr) {
$LEWArr = array();
foreach($wordArr as $word) {
$LEWword = "";
for ($i = 0; $i < strlen($word)/2; $i++)
$LEWword .= substr($word, (strlen($word) - ($i*2) - 2), 2);
$LEWArr[] = $LEWword;
}
return $LEWArr;
}
//Find a word in the word array and return its index value
function findWord($wordArr, $wordToFind) {
$currentword = 0;
foreach ($wordArr as $word) {
if ($word == $wordToFind)
return $currentword;
else
$currentword++;
}
}
?>
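The same lookup can be written more compactly with unpack() and the ZipArchive class, since the zip_* functions are deprecated in newer PHP versions. This is a sketch that keeps the assumption from the code above (the version code sits 13 words after the first start-tag word 0x00100102) and additionally assumes a 64-bit PHP build; the function names are made up.

```php
<?php
// Sketch: find the version code in the raw bytes of a binary AndroidManifest.xml.
function versionCodeFromAxml(string $axml): ?int
{
    // Use only whole 32-bit words
    $usable = strlen($axml) - (strlen($axml) % 4);
    if ($usable === 0) {
        return null;
    }
    // 'V*' decodes unsigned 32-bit little-endian words, so no manual byte swapping is needed
    $words = array_values(unpack('V*', substr($axml, 0, $usable)));
    $start = array_search(0x00100102, $words, true); // first start-tag marker
    if ($start === false || !isset($words[$start + 13])) {
        return null;
    }
    return $words[$start + 13];
}

// Reads AndroidManifest.xml from the APK root; requires the zip extension.
function versionCodeFromApk(string $apkPath): ?int
{
    $zip = new ZipArchive();
    if ($zip->open($apkPath) !== true) {
        return null;
    }
    $axml = $zip->getFromName('AndroidManifest.xml');
    $zip->close();
    return $axml === false ? null : versionCodeFromAxml($axml);
}
```

Unlike the original, this returns null instead of "N/A" when the manifest or the start tag cannot be found.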
Use this in the CLI:
apktool if 1.apk
aapt dump badging 1.apk
You can use these commands in PHP using exec or shell_exec.
aapt dump badging ./apkfile.apk | grep sdkVersion -i
You will get a human readable form.
sdkVersion:'14'
targetSdkVersion:'14'
Just look for aapt in your system if you have Android SDK installed.
Mine is in:
<SDKPATH>/build-tools/19.0.3/aapt
The dump format is a little odd and not the easiest to work with. Just to expand on some of the other answers, this is a shell script that I am using to parse out name and version from APK files.
aapt d badging PACKAGE | gawk $'match($0, /^application-label:\'([^\']*)\'/, a) { n = a[1] }
match($0, /versionName=\'([^\']*)\'/, b) { v=b[1] }
END { if ( length(n)>0 && length(v)>0 ) { print n, v } }'
If you just want the version then obviously it can be much simpler.
aapt d badging PACKAGE | gawk $'match($0, /versionName=\'([^\']*)\'/, v) { print v[1] }'
Here are variations suitable for both gawk and mawk (a little less durable in case the dump format changes but should be fine):
aapt d badging PACKAGE | mawk -F\' '$1 ~ /^application-label:$/ { n=$2 }
$5 ~ /^ versionName=$/ { v=$6 }
END{ if ( length(n)>0 && length(v)>0 ) { print n, v } }'
aapt d badging PACKAGE | mawk -F\' '$5 ~ /^ versionName=$/ { print $6 }'
