I am trying to create a PHP script to get the app version from an Android APK file.
Extracting the XML file from the APK (zip) file and then parsing the XML is one way, but I guess there should be a simpler way. Something like PHP Manual, example #3.
Any ideas how to create the script?
If you have the Android SDK installed on the server, you can use PHP's exec (or similar) to execute the aapt tool (in $ANDROID_HOME/platforms/android-X/tools).
$ aapt dump badging myapp.apk
And the output should include:
package: name='com.example.myapp' versionCode='1530' versionName='1.5.3'
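To get those values into PHP variables, the "package:" line can be picked apart with a regular expression. This is only a minimal sketch: the function names parseBadgingPackageLine and getApkVersion are made up for illustration, and it assumes aapt is on the PATH and prints the line in the format shown above.

```php
<?php
// Hypothetical helper: extracts name, versionCode and versionName from the
// "package:" line printed by `aapt dump badging`.
function parseBadgingPackageLine(string $badgingOutput): ?array
{
    $pattern = "/^package: name='([^']*)' versionCode='([^']*)' versionName='([^']*)'/m";
    if (!preg_match($pattern, $badgingOutput, $m)) {
        return null; // line missing or in an unexpected format
    }
    return [
        'name'        => $m[1],
        'versionCode' => $m[2],
        'versionName' => $m[3],
    ];
}

// Assumption: aapt is on the PATH; otherwise use the full path to the binary.
function getApkVersion(string $apkPath): ?array
{
    $output = shell_exec('aapt dump badging ' . escapeshellarg($apkPath) . ' 2>&1');
    return is_string($output) ? parseBadgingPackageLine($output) : null;
}
```

Calling getApkVersion('myapp.apk') should then return an array with the three fields, or null when aapt is missing or the output looks different.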
If you can't install the Android SDK, for whatever reason, then you will need to parse Android's binary XML format. The AndroidManifest.xml file inside the APK zip structure is not plain text.
You would need to port a utility like AXMLParser from Java to PHP.
I've created a set of PHP functions that will find just the version code of an APK. This is based on the fact that the AndroidManifest.xml file contains the version code in its first tag, and on the AXML (binary Android XML) format as described here
<?php
$APKLocation = "PATH TO APK GOES HERE";
$versionCode = getVersionCodeFromAPK($APKLocation);
echo $versionCode;
//Based on the fact that the Version Code is the first tag in the AndroidManifest.xml file, this will return its value
//PHP implementation based on the AXML format described here: https://stackoverflow.com/questions/2097813/how-to-parse-the-androidmanifest-xml-file-inside-an-apk-package/14814245#14814245
function getVersionCodeFromAPK($APKLocation) {
$versionCode = "N/A";
//AXML LEW 32-bit word (hex) for a start tag
$XMLStartTag = "00100102";
//APK is essentially a zip file, so open it
$zip = zip_open($APKLocation);
if (is_resource($zip)) { //zip_open returns an integer error code on failure
while ($zip_entry = zip_read($zip)) {
//Look for the AndroidManifest.xml file in the APK root directory
if (zip_entry_name($zip_entry) == "AndroidManifest.xml") {
//Get the contents of the file in hex format
$axml = getHex($zip, $zip_entry);
//Convert AXML hex file into an array of 32-bit words
$axmlArr = convert2wordArray($axml);
//Convert AXML 32-bit word array into Little Endian format 32-bit word array
$axmlArr = convert2LEWwordArray($axmlArr);
//Get first AXML open tag word index
$firstStartTagword = findWord($axmlArr, $XMLStartTag);
//The version code is 13 words after the first open tag word
$versionCode = intval($axmlArr[$firstStartTagword + 13], 16);
break;
}
}
zip_close($zip);
}
return $versionCode;
}
//Get the contents of the file in hex format
function getHex($zip, $zip_entry) {
if (zip_entry_open($zip, $zip_entry, 'r')) {
$buf = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
$hex = unpack("H*", $buf);
return current($hex);
}
}
//Given a hex byte stream, return an array of words
function convert2wordArray($hex) {
$wordArr = array();
$numwords = strlen($hex)/8;
for ($i = 0; $i < $numwords; $i++)
$wordArr[] = substr($hex, $i * 8, 8);
return $wordArr;
}
//Given an array of words, convert them to Little Endian format (LSB first)
function convert2LEWwordArray($wordArr) {
$LEWArr = array();
foreach($wordArr as $word) {
$LEWword = "";
for ($i = 0; $i < strlen($word)/2; $i++)
$LEWword .= substr($word, (strlen($word) - ($i*2) - 2), 2);
$LEWArr[] = $LEWword;
}
return $LEWArr;
}
//Find a word in the word array and return its index value
function findWord($wordArr, $wordToFind) {
$currentword = 0;
foreach ($wordArr as $word) {
if ($word == $wordToFind)
return $currentword;
else
$currentword++;
}
}
?>
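The hex-string conversions above can probably be shortened, because unpack() with the 'V*' format reads unsigned 32-bit little-endian words directly. The sketch below keeps the same heuristic (first start-tag word, version code 13 words later) and uses ZipArchive, since the zip_* functions are deprecated as of PHP 8.0. The function names are made up for illustration, and the "13 words" offset is taken from the code above, not verified against every APK.

```php
<?php
// Sketch only: looks for the first start-tag chunk header (0x00100102)
// and returns the word 13 positions after it, as the code above does.
function findVersionCodeInAxml(string $axml): ?int
{
    // Only whole 32-bit words can be decoded, so drop any trailing bytes.
    $usable = strlen($axml) - (strlen($axml) % 4);
    if ($usable === 0) {
        return null;
    }
    // unpack() returns a 1-based array; array_values() makes it 0-based.
    $words = array_values(unpack('V*', substr($axml, 0, $usable)));
    $start = array_search(0x00100102, $words, true);
    if ($start === false || !isset($words[$start + 13])) {
        return null;
    }
    return $words[$start + 13];
}

// Reads AndroidManifest.xml from the APK root and applies the heuristic.
function getVersionCodeWithZipArchive(string $apkPath): ?int
{
    $zip = new ZipArchive();
    if ($zip->open($apkPath) !== true) {
        return null;
    }
    $axml = $zip->getFromName('AndroidManifest.xml');
    $zip->close();
    return $axml === false ? null : findVersionCodeInAxml($axml);
}
```

Unlike the version above, this returns null instead of "N/A" when nothing is found, which is just a design choice for the sketch.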
Use this in the CLI:
apktool if 1.apk
aapt dump badging 1.apk
You can use these commands in PHP using exec or shell_exec.
aapt dump badging ./apkfile.apk | grep sdkVersion -i
You will get a human readable form.
sdkVersion:'14'
targetSdkVersion:'14'
Just look for aapt in your system if you have Android SDK installed.
Mine is in:
<SDKPATH>/build-tools/19.0.3/aapt
The dump format is a little odd and not the easiest to work with. Just to expand on some of the other answers, this is a shell script that I am using to parse out name and version from APK files.
aapt d badging PACKAGE | gawk $'match($0, /^application-label:\'([^\']*)\'/, a) { n = a[1] }
match($0, /versionName=\'([^\']*)\'/, b) { v=b[1] }
END { if ( length(n)>0 && length(v)>0 ) { print n, v } }'
If you just want the version then obviously it can be much simpler.
aapt d badging PACKAGE | gawk $'match($0, /versionName=\'([^\']*)\'/, v) { print v[1] }'
Here are variations suitable for both gawk and mawk (a little less durable in case the dump format changes but should be fine):
aapt d badging PACKAGE | mawk -F\' '$1 ~ /^application-label:$/ { n=$2 }
$5 ~ /^ versionName=$/ { v=$6 }
END{ if ( length(n)>0 && length(v)>0 ) { print n, v } }'
aapt d badging PACKAGE | mawk -F\' '$5 ~ /^ versionName=$/ { print $6 }'
I have an app that needs to retrieve some data (the signer's name) from the digital signature attached to PDF files.
I have found only examples in Java and C# using the iText class AcroFields and its method GetSignatureNames.
Edit: I've tried pdftk with dump_data_fields and generate_fdf, and unfortunately the result was:
/Fields [
<<
/V /dftk.com.lowagie.text.pdf.PdfDictionary@3048918
/T (Signature1)
>>]
and
FieldType: Signature
FieldName: Signature1
FieldFlags: 0
FieldJustification: Left
Thanks in Advance !
Well, it's complicated (I would even say impossible, but who knows) to achieve this with PHP alone.
First, please read the article about digital signatures in Adobe PDF.
Second, after reading it you will know that the signature is stored between bytes b and c, as given by the /ByteRange [a b c d] entry.
Third, we can extract b and c from the document and then extract the signature itself (the guide says it is a hex-encoded PKCS#7 object).
<?php
$content = file_get_contents('test.pdf');
$regexp = '#ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)#'; // subexpressions are used to extract b and c; whitespace before "[" is allowed
$result = [];
preg_match_all($regexp, $content, $result);
// $result[2][0] and $result[3][0] are b and c
if (isset($result[2]) && isset($result[3]) && isset($result[2][0]) && isset($result[3][0]))
{
$start = $result[2][0];
$end = $result[3][0];
if ($stream = fopen('test.pdf', 'rb')) {
$signature = stream_get_contents($stream, $end - $start - 2, $start + 1); // because we need to exclude < and > from start and end
fclose($stream);
}
file_put_contents('signature.pkcs7', hex2bin($signature));
}
Fourth, after the third step we have the PKCS#7 object in the file signature.pkcs7. Unfortunately, I don't know of a way to extract information from the signature using PHP, so you must be able to run shell commands to use openssl:
openssl pkcs7 -in signature.pkcs7 -inform DER -print_certs > info.txt
After running this command, the file info.txt will contain a chain of certificates. The last one is the one you need. You can look at the structure of the file and parse out the data you need.
Please also refer to this question, this question and this topic
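If shell access is not available, PHP 7.2 and later ship openssl_pkcs7_read(), which might replace the openssl call. As far as I know it expects PEM input, so the DER blob has to be wrapped first. The following is only a sketch under that assumption: the helper names derToPkcs7Pem and signerSubjects are made up, and I have not tried it against real signed PDFs (if parsing fails, trailing padding in the extracted blob is one thing to check).

```php
<?php
// Wraps a binary (DER) PKCS#7 blob in PEM armor.
function derToPkcs7Pem(string $der): string
{
    return "-----BEGIN PKCS7-----\n"
        . chunk_split(base64_encode($der), 64, "\n")
        . "-----END PKCS7-----\n";
}

// Returns the subject fields of every certificate found in the blob,
// or an empty array when the blob cannot be parsed.
function signerSubjects(string $der): array
{
    $certs = [];
    if (!function_exists('openssl_pkcs7_read')
        || !openssl_pkcs7_read(derToPkcs7Pem($der), $certs)) {
        return [];
    }
    $subjects = [];
    foreach ($certs as $pem) {
        $info = openssl_x509_parse($pem);
        if ($info !== false) {
            $subjects[] = $info['subject']; // e.g. ['CN' => ..., 'O' => ...]
        }
    }
    return $subjects;
}
```

With the code above, signerSubjects(hex2bin($signature)) would be the call to try; which entry of the result is the signer still has to be decided as described for the openssl output.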
EDIT at 2017-10-09
I pointed you to exactly this question on purpose.
It contains code that you can adjust to your needs.
use ASN1\Type\Constructed\Sequence;
use ASN1\Element;
use X509\Certificate\Certificate;
$seq = Sequence::fromDER($binaryData);
$signed_data = $seq->getTagged(0)->asExplicit()->asSequence();
// ExtendedCertificatesAndCertificates: https://tools.ietf.org/html/rfc2315#section-6.6
$ecac = $signed_data->getTagged(0)->asImplicit(Element::TYPE_SET)->asSet();
// ExtendedCertificateOrCertificate: https://tools.ietf.org/html/rfc2315#section-6.5
$ecoc = $ecac->at($ecac->count() - 1);
$cert = Certificate::fromASN1($ecoc->asSequence());
$commonNameValue = $cert->tbsCertificate()->subject()->toString();
echo $commonNameValue;
I've adjusted it for you, but please do the rest yourself.
This is my working code in PHP 7:
<?php
require_once('vendor/autoload.php');
use Sop\ASN1\Type\Constructed\Sequence;
use Sop\ASN1\Element;
use Sop\X509\Certificate\Certificate;
$currentFile = "./upload/test2.pdf";
$content = file_get_contents($currentFile);
$regexp = '/ByteRange\ \[\s*(\d+) (\d+) (\d+)/'; // subexpressions are used to extract b and c
$result = [];
preg_match_all($regexp, $content, $result);
// $result[2][0] and $result[3][0] are b and c
if (isset($result[2]) && isset($result[3]) && isset($result[2][0]) && isset($result[3][0])) {
$start = $result[2][0];
$end = $result[3][0];
if ($stream = fopen($currentFile, 'rb')) {
$signature = stream_get_contents($stream, $end - $start - 2, $start + 1); // because we need to exclude < and > from start and end
fclose($stream);
}
$binaryData = hex2bin($signature);
$seq = Sequence::fromDER($binaryData);
$signed_data = $seq->getTagged(0)->asExplicit()->asSequence();
// ExtendedCertificatesAndCertificates: https://tools.ietf.org/html/rfc2315#section-6.6
$ecac = $signed_data->getTagged(0)->asImplicit(Element::TYPE_SET)->asSet();
// ExtendedCertificateOrCertificate: https://tools.ietf.org/html/rfc2315#section-6.5
$ecoc = $ecac->at($ecac->count() - 1);
$cert = Certificate::fromASN1($ecoc->asSequence());
$commonNameValue = $cert->tbsCertificate()->subject()->toString();
echo $commonNameValue;
}
I've used iText and found it to be very reliable; I highly recommend it.
You can always call the Java code as a "microservice" from PHP.
I could only find a solution that works per line, but I can't find the page breaks, and I'm quite confused.
For docx I also can't get an exact word count.
function read_doc($filename) {
$fileHandle = fopen($filename, "r");
$line = @fread($fileHandle, filesize($filename));
$lines = explode(chr(0x0D), $line);
$outtext = "";
foreach ($lines as $key => $thisline) {
if( $key > 11 ){
var_dump($thisline);
$pos = strpos($thisline, chr(0x00));
if (($pos !== FALSE) || (strlen($thisline) == 0)) {
continue;
} else {
var_dump($thisline);
$text = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $thisline);
var_dump($text);
}
}
}
return $outtext;
}
Implementing your own code for this doesn't sound like a good idea. I would recommend using an external library such as PHPWord. It should allow you to convert the file to plain text. Then, you can extract the word count from it.
Also, an external library such as that adds support for a number of file formats, not restricting you to Word 97-2003.
Here's a basic piece of VB.NET code that counts words per page. Be aware that it depends on what Word considers to be a word, which is not necessarily what a user considers a word. In my experience you need to properly analyse how Word behaves and what it interprets, and then build your logic to ensure that you get the results you need. It's not PHP, but it does the job and can be a starting point for you.
Structure WordsPerPage
Public pagenum As Integer
Public count As Long
End Structure
Public Sub CountWordsPerPage(doc As Document)
Dim index As Integer
Dim pagenum As Integer
Dim newItem As WordsPerPage
Dim tmpList As New List(Of WordsPerPage)
Try
For Each wrd As Range In doc.Words
pagenum = wrd.Information(WdInformation.wdActiveEndPageNumber)
Debug.Print("Word {0} is on page {1}", wrd.Text, pagenum)
index = tmpList.FindIndex(Function(value As WordsPerPage)
Return value.pagenum = pagenum
End Function)
If index <> -1 Then
tmpList(index) = New WordsPerPage With {.pagenum = pagenum, .count = tmpList(index).count + 1}
Else
' Unique (or first)
newItem.count = 1
newItem.pagenum = pagenum
tmpList.Add(newItem)
End If
Next
Catch ex As Exception
WorkerErrorLog.AddLog(ex, Err.Number & " " & Err.Description)
Finally
Dim totalWordCount As Long = 0
For Each item In tmpList
totalWordCount = totalWordCount + item.count
Debug.Print("Page {0} has {1} words", item.pagenum, item.count)
Next
Debug.Print("Total word count is {0}", totalWordCount)
End Try
End Sub
A .docx file is a zip archive (the older binary .doc format is not). When you unzip it you get a folder; look for the document.xml file in the word subfolder. It holds the whole document as XML. Split the string at the page-break markup, strip the XML tags and use str_word_count.
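A rough sketch of that idea follows. The function names are made up, and it assumes pages are separated either by explicit breaks (<w:br w:type="page"/>) or by the <w:lastRenderedPageBreak/> hints that Word stores when it saves a file. Documents written by other tools may contain neither, in which case everything is reported as page 1.

```php
<?php
// Splits the XML of word/document.xml at page-break markers and counts
// the words in each chunk. Keys of the result are page numbers from 1.
function countWordsPerPage(string $documentXml): array
{
    $pages = preg_split(
        '#<w:br\b[^>]*w:type="page"[^>]*/>|<w:lastRenderedPageBreak\s*/>#',
        $documentXml
    );
    $counts = [];
    foreach ($pages as $i => $xml) {
        // Turn paragraph ends into spaces so neighbouring paragraphs do not merge.
        $text = strip_tags(str_replace('</w:p>', ' ', $xml));
        $text = html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $counts[$i + 1] = str_word_count($text);
    }
    return $counts;
}

// Reads word/document.xml straight from the .docx without unpacking it to disk.
function countDocxWordsPerPage(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return [];
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? [] : countWordsPerPage($xml);
}
```

Note that str_word_count has its own idea of what a word is (mainly ASCII letters, apostrophes and hyphens), so the numbers will not necessarily match what Word shows.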
What I figured out is that I will need a Windows server, using a COM object.
Please check this link
https://github.com/lettertoamit/MS-Word-PER-PAGE-WORDCOUNT/blob/master/index.php
For those who do not know: PHP builds for Windows, both 32-bit and 64-bit, have a design limitation which means that functions like "filesize", "md5_file" and "sha1_file" cannot read files larger than 2GB; the PHP script will either raise an error or return an incorrect value for the file.
$fname = $_FILES['Filedata']['tmp_name'];
$filesource = sha1_file($fname);
A solution with the Windows command prompt is as follows.
CertUtil -hashfile "C:\Users\C0n\Desktop\2GB-file.MP4" SHA1
How can I use that in my PHP code in order to receive the SHA-1 sum of the large file?
<?php
$result = shell_exec ('CertUtil -hashfile "C:\Users\C0n\Desktop\2GB-file.MP4" SHA1');
var_dump ($result);
My working code is as follows.
//Check OS is Windows
if(substr(PHP_OS, 0, 3) == "WIN") {
//input file
$input = 'CertUtil -hashfile "C:\Users\C0n\Desktop\2GB-file.MP4" SHA1';
//Execute input and put the response into an array
exec($input, $response);
//Remove spaces between the hash output.
$str = str_replace(' ', '', $response[1]);
//Display the hash of the file
echo $str;
}
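Relying on $response[1] works as long as the hash is always on the second line. A slightly more defensive sketch is to scan all output lines for something that looks like a SHA-1 hash once the spaces are removed. The function names below are made up for illustration, and the path is passed through escapeshellarg instead of being hard-coded.

```php
<?php
// Returns the first output line that is a 40-character hex string after
// removing spaces (some CertUtil versions print the hash in byte pairs).
function extractSha1FromCertUtil(array $lines): ?string
{
    foreach ($lines as $line) {
        $candidate = strtolower(str_replace(' ', '', trim($line)));
        if (preg_match('/^[0-9a-f]{40}$/', $candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Runs CertUtil (Windows only) and returns the hash, or null on failure.
function certUtilSha1(string $path): ?string
{
    exec('CertUtil -hashfile ' . escapeshellarg($path) . ' SHA1', $lines, $exitCode);
    return $exitCode === 0 ? extractSha1FromCertUtil($lines) : null;
}
```

The OS check from the code above (PHP_OS starting with "WIN") still applies before calling certUtilSha1().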
I'm working on a command-line PHP project and want to be able to recreate the PHAR file that is my deployment artifact. The challenge is that I can't create two PHARs that have identical sha1sums if they were created more than 1 second apart. I would like to be able to exactly recreate my PHAR file if the input files are the same (i.e. came from the same git commit).
The following code snippet demonstrates the problem:
#!/usr/bin/php
<?php
$hashes = array();
$file_names = array('file1.phar','file2.phar');
foreach ($file_names as $name) {
if (file_exists($name)) {
unlink($name);
}
$phar = new Phar($name);
$phar->addFromString('cli.php', "cli\n");
$hashes[]=sha1_file($name);
// remove the sleep and the PHAR's are identical.
sleep(1);
}
if ($hashes[0]==$hashes[1]) {
echo "match\n";
} else {
echo "do not match\n";
}
As far as I can tell, the "modification time" field for each file in the PHAR manifest is always set to the current time, and there seems to be no way of overriding that. Even touch("phar://file1.phar/cli.php", 1413387555) gives the error:
touch(): Can not call touch() for a non-standard stream
I ran the above code in PHP 5.5.9 on Ubuntu Trusty and PHP 5.3 on RHEL5; both versions behave the same way and fail to create identical PHAR files.
I'm trying to do this in order to follow the advice in the book Continuous Delivery by Jez Humble and David Farley.
Any help is appreciated.
The Phar class currently does not allow users to alter or even access the modification time. I thought of storing your string in a temporary file and using touch to alter the mtime, but that does not seem to have any effect. So you'll have to change the timestamps in the created files manually and then regenerate the archive signature. Here's how to do it with current PHP versions:
<?php
$filename = "file1.phar";
$archive = file_get_contents($filename);
# Search for the start of the archive header
# See http://php.net/manual/de/phar.fileformat.phar.php
# This isn't the only valid way to write a PHAR archive, but it is what the Phar class
# currently does, so you should be fine (The docs say that the end-of-PHP-tag is optional)
$magic = "__HALT_COMPILER(); ?" . ">";
$end_of_code = strpos($archive, $magic) + strlen($magic);
$data_pos = $end_of_code;
# Skip that header
$data = unpack("Vmanifest_length/Vnumber_of_files/vapi_version/Vglobal_flags/Valias_length", substr($archive, $end_of_code, 18));
$data_pos += 18 + $data["alias_length"];
$metadata = unpack("Vlength", substr($archive, $data_pos, 4));
$data_pos += 4 + $metadata["length"];
for($i=0; $i<$data["number_of_files"]; $i++) {
# Now $data_pos points to the first file
# Files are explained here: http://php.net/manual/de/phar.fileformat.manifestfile.php
$filename_data = unpack("Vfilename_length", substr($archive, $data_pos, 4));
$data_pos += 4 + $filename_data["filename_length"];
$file_data = unpack("Vuncompressed_size/Vtimestamp/Vcompressed_size/VCRC32/Vflags/Vmetadata_length", substr($archive, $data_pos, 24));
# Change the timestamp to zeros (You can also use some other time here using pack("V", time()) instead of the zeros)
$archive = substr($archive, 0, $data_pos + 4) . "\0\0\0\0" . substr($archive, $data_pos + 8);
# Skip to the next file (it's _all_ the headers first, then file data)
$data_pos += 24 + $file_data["metadata_length"];
}
# Regenerate the file's signature
$sig_data = unpack("Vsigflags/C4magic", substr($archive, strlen($archive) - 8));
if($sig_data["magic1"] == ord("G") && $sig_data["magic2"] == ord("B") && $sig_data["magic3"] == ord("M") && $sig_data["magic4"] == ord("B")) {
if($sig_data["sigflags"] == 1) {
# MD5
$sig_pos = strlen($archive) - 8 - 16;
$archive = substr($archive, 0, $sig_pos) . pack("H32", md5(substr($archive, 0, $sig_pos))) . substr($archive, $sig_pos + 16);
}
else {
# SHA1
$sig_pos = strlen($archive) - 8 - 20;
$archive = substr($archive, 0, $sig_pos) . pack("H40", sha1(substr($archive, 0, $sig_pos))) . substr($archive, $sig_pos + 20);
}
# Note: The manual talks about SHA256/SHA512 support, but the corresponding flags aren't documented yet. Currently,
# PHAR uses SHA1 by default, so there's nothing to worry about. You still might have to add those sometime.
}
file_put_contents($filename, $archive);
I've written this ad hoc for my local PHP 5.5.9 version and your example above. The script will work for files created similarly to your example code. The documentation hints at some valid deviations from this format. There are comments at the relevant lines in the code; you might have to add something there if you want to support Phar files in general.
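If you would rather keep a meaningful timestamp than zeros, any fixed value works as long as it is the same on every build; time() would bring the problem back. One option is the commit time, which git prints with git log -1 --format=%ct. A small sketch of the patching step, with a made-up helper name:

```php
<?php
// Hypothetical helper: overwrites the 4-byte little-endian field that
// starts at $offset and returns the modified copy of $blob.
function writeUint32LE(string $blob, int $offset, int $value): string
{
    return substr_replace($blob, pack('V', $value), $offset, 4);
}
```

In the loop above, the line that writes "\0\0\0\0" could then become $archive = writeUint32LE($archive, $data_pos + 4, $commitTime); where $commitTime is an integer you obtained from git yourself.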
Is there a maximum file size the XMLReader can handle?
I'm trying to process an XML feed about 3GB large. There are certainly no PHP errors as the script runs fine and successfully loads to the database after it's been run.
The script also runs fine with smaller test feeds - 1GB and below. However, when processing larger feeds the script stops reading the XML File after about 1GB and continues running the rest of the script.
Has anybody experienced a similar problem? If so, how did you work around it?
Thanks in advance.
I had the same kind of problem recently and thought I would share my experience.
It seems that the problem lies in the way PHP was compiled: with support for 64-bit file sizes/offsets or only 32-bit.
With 32 bits you can address at most 4GB of data (2GB when the offset is a signed integer). You can find a slightly confusing but good explanation here: http://blog.mayflower.de/archives/131-Handling-large-files-without-PHP.html
I had to split my files with the Perl utility xml_split, which you can find here: http://search.cpan.org/~mirod/XML-Twig/tools/xml_split/xml_split
I used it to split my huge XML file into manageable chunks. The good thing about the tool is that it splits XML files on whole elements. Unfortunately it's not very fast.
I needed to do this only once and it suited my needs, but I wouldn't recommend it for repeated use. After splitting, I used XMLReader on the smaller files of about 1GB each.
Splitting up the file will definitely help. Other things to try...
adjust the memory_limit variable in php.ini. http://php.net/manual/en/ini.core.php
rewrite your parser using SAX -- http://php.net/manual/en/book.xml.php . This is a stream-oriented parser that doesn't need to parse the whole tree. Much more memory-efficient but slightly harder to program.
Depending on your OS, there might also be a 2GB limit on the chunk of RAM that you can allocate. This is very possible if you're running on a 32-bit OS.
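For the SAX suggestion, a minimal sketch with the xml_* functions could look like this. It only counts start tags and feeds the file to the parser in chunks; the function name and chunk size are arbitrary choices for the example, and whether it gets past the point where XMLReader stops on your build is something you would have to test.

```php
<?php
// Counts element start tags with the event-based parser, reading the
// file piece by piece so the whole document is never held in memory.
function countElementsSax(string $path, int $chunkSize = 1048576): int
{
    $count = 0;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$count) { $count++; }, // start tag
        function ($parser, $name) {}                                   // end tag: nothing to do
    );
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        // The third argument tells the parser whether this is the final piece.
        if (!xml_parse($parser, $chunk, feof($fp))) {
            $msg = xml_error_string(xml_get_error_code($parser));
            fclose($fp);
            throw new RuntimeException("XML error: $msg");
        }
    }
    fclose($fp);
    return $count;
}
```

Real processing would go into the start and end handlers (plus a character data handler) instead of the counter.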
It should be noted that PHP in general has a max file size. PHP does not allow for unsigned integers, or long integers, meaning you're capped at 2^31 (or 2^63 for 64 bit systems) for integers. This is important because PHP uses an integer for the file pointer (your position in the file as you read through), meaning it cannot process a file larger than 2^31 bytes in size.
However, this should be more than 1 gigabyte. I ran into issues with two gigabytes (as expected, since 2^31 is roughly 2 billion).
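To see which limit applies to a given installation, the integer size can be checked directly. This is just a quick check, not a statement about how every file function behaves on that build; the function name is made up.

```php
<?php
// True when this PHP build uses 64-bit integers and can therefore
// represent file offsets beyond 2^31 - 1 bytes.
function hasLargeOffsets(): bool
{
    return PHP_INT_SIZE >= 8;
}

printf("int size: %d bytes, max: %d\n", PHP_INT_SIZE, PHP_INT_MAX);
// Going past PHP_INT_MAX silently turns the value into a float.
var_dump(PHP_INT_MAX + 1);
```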
I've run into a similar issue when parsing large documents. What I wound up doing is breaking the feed into smaller chunks using filesystem functions, then parsing those smaller chunks. So if you have a bunch of <record> tags that you are parsing, pull them out of the stream with string functions, and when you have a full record in the buffer, parse that using the XML functions. It sucks, but it works quite well (and is very memory efficient, since you have at most one record in memory at any one time).
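A sketch of that approach as a generator is below. It assumes the records are not nested, that the closing tag is written exactly as </record>, and that a single record fits comfortably in memory; the function name and default chunk size are made up for the example.

```php
<?php
// Yields one complete <record>...</record> string at a time while
// reading the file in small pieces.
function readRecords(string $path, string $tag = 'record', int $chunkSize = 65536): Generator
{
    // Match "<record" only when followed by whitespace or ">", so that a
    // root element such as <records> is not mistaken for a record.
    $openPattern = '/<' . preg_quote($tag, '/') . '[\s>]/';
    $close = '</' . $tag . '>';
    $buffer = '';
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (!feof($fp)) {
            $buffer .= fread($fp, $chunkSize);
            while (($end = strpos($buffer, $close)) !== false) {
                $end += strlen($close);
                if (preg_match($openPattern, $buffer, $m, PREG_OFFSET_CAPTURE)
                    && $m[0][1] < $end) {
                    yield substr($buffer, $m[0][1], $end - $m[0][1]);
                }
                // Drop everything up to and including the record just handled.
                $buffer = substr($buffer, $end);
            }
        }
    } finally {
        fclose($fp);
    }
}
```

Each yielded string can then be handed to simplexml_load_string() or a similar function, for example in a foreach (readRecords('feed.xml') as $xml) loop.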
Do you get any errors with
libxml_use_internal_errors(true);
libxml_clear_errors();
// your parser stuff here....
$r = new XMLReader(...);
// ....
foreach( libxml_get_errors() as $err ) {
printf(". %d %s\n", $err->code, $err->message);
}
when the parser stops prematurely?
Using Windows XP with NTFS as the filesystem and PHP 5.3.2, there was no problem with this test script:
<?php
define('SOURCEPATH', 'd:/test.xml');
if ( 0 ) {
build();
}
else {
echo 'filesize: ', number_format(filesize(SOURCEPATH)), "\n";
timing('read');
}
function timing($fn) {
$start = new DateTime();
echo 'start: ', $start->format('Y-m-d H:i:s'), "\n";
$fn();
$end = new DateTime();
echo 'end: ', $end->format('Y-m-d H:i:s'), "\n"; // was $start, hence the identical times in the output below
echo 'diff: ', $end->diff($start)->format('%I:%S'), "\n";
}
function read() {
$cnt = 0;
$r = new XMLReader;
$r->open(SOURCEPATH);
while( $r->read() ) {
if ( XMLReader::ELEMENT === $r->nodeType ) {
if ( 0===++$cnt%500000 ) {
echo '.';
}
}
}
echo "\n#elements: ", $cnt, "\n";
}
function build() {
$fp = fopen(SOURCEPATH, 'wb');
$s = '<catalogue>';
//for($i = 0; $i < 500000; $i++) {
for($i = 0; $i < 60000000; $i++) {
$s .= sprintf('<item>%010d</item>', $i);
if ( 0===$i%100000 ) {
fwrite($fp, $s);
$s = '';
echo $i/100000, ' ';
}
}
$s .= '</catalogue>';
fwrite($fp, $s);
fflush($fp);
fclose($fp);
}
output:
filesize: 1,380,000,023
start: 2010-08-07 09:43:31
........................................................................................................................
#elements: 60000001
end: 2010-08-07 09:43:31
diff: 07:31
(As you can see, I screwed up the output of the end time, but I don't want to run this script for another 7+ minutes ;-))
Does this also work on your system?
As a side note: the corresponding C# test application took only 41 seconds instead of 7.5 minutes. And my slow hard drive might have been a limiting factor in this case.
filesize: 1.380.000.023
start: 2010-08-07 09:55:24
........................................................................................................................
#elements: 60000001
end: 2010-08-07 09:56:05
diff: 00:41
and the source:
using System;
using System.IO;
using System.Xml;
namespace ConsoleApplication1
{
class SOTest
{
delegate void Foo();
const string sourcepath = @"d:\test.xml";
static void timing(Foo bar)
{
DateTime dtStart = DateTime.Now;
System.Console.WriteLine("start: " + dtStart.ToString("yyyy-MM-dd HH:mm:ss"));
bar();
DateTime dtEnd = DateTime.Now;
System.Console.WriteLine("end: " + dtEnd.ToString("yyyy-MM-dd HH:mm:ss"));
TimeSpan s = dtEnd.Subtract(dtStart);
System.Console.WriteLine("diff: {0:00}:{1:00}", s.Minutes, s.Seconds);
}
static void readTest()
{
XmlTextReader reader = new XmlTextReader(sourcepath);
int cnt = 0;
while (reader.Read())
{
if (XmlNodeType.Element == reader.NodeType)
{
if (0 == ++cnt % 500000)
{
System.Console.Write('.');
}
}
}
System.Console.WriteLine("\n#elements: " + cnt + "\n");
}
static void Main()
{
FileInfo f = new FileInfo(sourcepath);
System.Console.WriteLine("filesize: {0:N0}", f.Length);
timing(readTest);
return;
}
}
}