Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I use Haxe to generate PHP code. (This means you write your code in the Haxe language and get a bunch of PHP files after compiling.) Today a customer told me that he needs a new feature on an old project made with Haxe. He also told me that he altered some small things in the code for his own needs. Now I first have to port his changes to my Haxe code and then add the new feature, because otherwise his changes will be overwritten the next time I compile the project.
To prevent this from happening again, I am looking for some kind of program that minifies / obfuscates the PHP code. The goal is to make the code as unreadable / uneditable as possible.
The ideal tool would run under Linux and could process whole folders and all the files they contain.
Does anybody have any suggestions?
Why not use the PHP built-in function php_strip_whitespace()?
string php_strip_whitespace ( string $filename )
Returns the PHP source code in filename with PHP comments and whitespace removed. This may be useful for determining the amount of actual code in your scripts compared with the amount of comments. This is similar to using php -w from the commandline.
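Since the question asks about whole folders, here is a minimal sketch of how php_strip_whitespace() could be applied to a directory tree. The helper name stripTree and the idea of writing into a separate output folder are my own assumptions, not part of PHP; non-PHP files are simply copied.

```php
<?php
// Hypothetical helper: copy $srcDir to $dstDir, stripping comments and
// whitespace from every .php file on the way. Returns the number of
// PHP files that were stripped.
function stripTree(string $srcDir, string $dstDir): int
{
    $srcDir = rtrim($srcDir, '/');
    $dstDir = rtrim($dstDir, '/');
    $count  = 0;

    // LEAVES_ONLY (the default mode) yields files, not directories.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($srcDir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        $relative = substr($file->getPathname(), strlen($srcDir) + 1);
        $target   = $dstDir . '/' . $relative;

        // Recreate the sub-directory structure in the output folder.
        if (!is_dir(dirname($target))) {
            mkdir(dirname($target), 0777, true);
        }

        if (strtolower($file->getExtension()) === 'php') {
            file_put_contents($target, php_strip_whitespace($file->getPathname()));
            $count++;
        } else {
            copy($file->getPathname(), $target);
        }
    }

    return $count;
}
```

Something like stripTree('build/php', 'dist/php') would then be run after each compile. Note that this only removes comments and whitespace; identifiers stay readable.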
I agree with the comment, what you are doing is very underhanded, but after 10 years in this biz I can attest to one thing: half the code you get is so convoluted it might as well have been minified, and function/var names are so often completely arbitrary. I've edited minified JS and it wasn't much more of a hassle than some unminified code.
I couldn't find any such script/program, most likely because this is kind of against the PHP spirit and a bit underhanded. Nevertheless:
First: PHP isn't whitespace-sensitive, so step one is to remove all newlines and whitespace outside of strings.
That would make it difficult to mess with for the average tinkerer, an intermediate programmer would just find and replace all ;{} with $1\n or something to that effect.
The next step would be to get_defined_functions and save that array (The 'user' key in the returned array), you'll need to include all of the files to do this.
If it's OO code, you'll need get_declared_classes as well. Save that array.
Essentially, you need to get the variables, methods, and class instances. You'll have to instantiate each class and call get_object_vars on it, and if you poke around you'll see that you can get a lot of other info, like constants, class vars, etc.
Then you take those lists, loop through them, create a unique name for each thing, and then preg_replace, or str_replace that in all of the files.
Make sure you do this on a test copy, and see what errors you get.
Though, just to be clear, there is a special place in hell reserved for people who obfuscate for obfuscation's sake.
Check out: get_defined_functions get_declared_classes and just follow the links around to see what you can do.
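A rough sketch of the "create a unique name for each thing, then preg_replace" step described above. The helper names buildRenameMap and applyRenameMap are made up for illustration, and the regex is deliberately crude: it skips names preceded by $, -> or ::, but it will still rewrite matches inside string literals and comments, which is one more reason to run it on a test copy.

```php
<?php
// Hypothetical helper: give every name a short, meaningless alias
// (f1, f2, ... in hexadecimal).
function buildRenameMap(array $names, string $prefix = 'f'): array
{
    $map = [];
    foreach (array_values($names) as $i => $name) {
        $map[$name] = $prefix . dechex($i + 1);
    }
    return $map;
}

// Hypothetical helper: replace whole-word occurrences of each original
// function name in $source. Matching is case-insensitive because PHP
// function names are case-insensitive.
function applyRenameMap(string $source, array $map): string
{
    foreach ($map as $old => $new) {
        $pattern = '/(?<![$\w>:])' . preg_quote($old, '/') . '(?!\w)/i';
        $source  = preg_replace($pattern, $new, $source);
    }
    return $source;
}
```

The list of names would come from get_defined_functions()['user'] after including all of the project's files. As far as I know that list comes back lowercased, hence the case-insensitive match. The generated aliases are not checked against names that already exist, so a real run needs a collision check as well.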
We use Zend Guard to encode our PHP code with certain clients, but as Parrots said, you need to be sure you own the code. We only encode in certain situations, and only when it's explicit that we retain ownership of the code, otherwise Parrots is right, the client has a right to modify it.
I know of Zend Guard; ExpressionEngine used it to encrypt their trial version's core code. You could always give that a go, although you need to pay for it.
However, while I understand the frustration of having to port his changes, I assume they purchased the code from you? They have the right to modify it. You just have the right to charge them extra to port their changes ;) Imagine if you stopped working for them, how could they ever hire someone else to update the code?
Our PHP Obfuscator does exactly the job of stripping comments and whitespace, and scrambling identifiers.
It operates across a complete set of PHP files to ensure that scrambled symbols are scrambled
consistently across those files, ensuring correct operation even after scrambling.
EDIT 2013: Now encrypts string literals to make them unreadable. Operates under Windows, and on Linux under Wine.
You can try PHP Obfuscator or the bcompiler PHP extension.
I have just found a minify service for PHP. It looks really useful. They say that obfuscation will be available soon. I hope this is true :)
http://customhost.com.ua/php-minify/
I know that \Phar and \PharData exist, but I'm having some trouble with the methods they supply so far. I'm still having to detect the mime-type / file type by whatever means, before determining which Phar*::method() to use in an attempt to extract the archive and do work on the files it contains.
Is there a go-to, "easy-button" class that I could include (maybe some package available via composer) that handles this at a very high level? Or am I failing to use the Phar and friends properly or in need of re-RTM so far?
Basically, I want to do the following (it's a CLI script that I control for now, so security, while important with this type of thing, is on the backburner for now):
Detect that a file might be an archive of some kind.
Validate that it seems to be one of the following: .tar, .gz, .tar.gz, or .zip.
If so, attempt to extract the archive and then parse the content of its actual files.
Is there an "easy-button" for this that I'm unaware of, or do I need to build some logic that guesses as best it can what type of archive it might be, and then try to use the appropriate Phar* method to extract its files and do whatever work I need to on them?
I hope that makes sense the way I wrote it. I'm trying to avoid re-inventing the wheel for a mini-project here if someone has already figured all of this out basically.
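For the "guess what type of archive it might be" part, one option is to look at the file's leading bytes instead of its name. This is only a sketch under my own assumptions (the function name detectArchiveType is hypothetical): zip files normally start with "PK\x03\x04", gzip streams with "\x1f\x8b", and POSIX tar files carry the string "ustar" at offset 257.

```php
<?php
// Hypothetical helper: return 'zip', 'gz', 'tar' or null based on the
// file's magic bytes rather than its extension.
function detectArchiveType(string $path): ?string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    // 265 bytes is enough to cover the tar signature at offset 257.
    $head = (string) fread($handle, 265);
    fclose($handle);

    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    if (strncmp($head, "\x1f\x8b", 2) === 0) {
        return 'gz';
    }
    if (substr($head, 257, 5) === 'ustar') {
        return 'tar';
    }
    return null;
}
```

A 'gz' result does not say whether a tar is inside; that only becomes visible after decompressing. The result could then be used to pick between, say, ZipArchive and PharData.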
So, while continuing to research this I ended up seeing my own (this) SO question in google search results, which annoys me for some reason. So just in case someone stumbles upon this looking for a good solution, I've since found a couple by searching https://packagist.org/search/?q=archive (go figure):
Here's a few of them that seem promising.
wapmorgan/UnifiedArchive:
wapmorgan/UnifiedArchive (packagist)
wapmorgan/UnifiedArchive (github source)
Features (at first glance):
Only has one requirement of pear/archive_tar (which includes a few
more utility classes also from pear).
It attempts to detect the filetype for you, so it could eliminate the need to do that on your own.
alchemy/zippy:
alchemy/zippy (packagist)
alchemy-fr/Zippy (github source)
Features (at first glance):
Code looks to have been very well designed.
Seems to integrate with Laravel and guzzle\guzzle (the popular php http client) in some way so that might be an advantage for some.
zetacomponents/Archive
zetacomponents/Archive (packagist)
zetacomponents/Archive (github source)
Features (at first glance):
It seems to be a pure php implementation? If so that's just awesome.
Last updated 15 days ago, so it's the most active of the three I mentioned.
Seems to be maintained by an organization as opposed to a single person.
It has the most downloads by far on packagist (when searching for "archive"), and though I haven't played with it yet, that's usually a good sign.
Disclaimer: I have only actually tried wapmorgan/UnifiedArchive as of this writing, and so far it's exactly what I was looking for.
Anyway, I hope this helps anyone who might stumble upon this question.
If you don't need pure php and if your code is running on a linux machine, a
exec('uncompress [-cfv] [file...]');
or a
exec('unzip filename.zip -d destination');
will extract the file and make it usable for php.
Of course you need to check the extension (zip, tar, etc.) in order to call the right command.
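A small sketch of that extension check, assuming the usual tar, unzip and gzip command-line tools are installed. The function name buildExtractCommand is made up; the point is mainly that file names should go through escapeshellarg() before they reach exec().

```php
<?php
// Hypothetical helper: pick the shell command for an archive based on its
// extension. Returns null for anything it does not recognise.
function buildExtractCommand(string $archive, string $destination): ?string
{
    $a    = escapeshellarg($archive);
    $d    = escapeshellarg($destination);
    $name = strtolower($archive);

    // Check the longer suffixes first so ".tar.gz" is not treated as ".gz".
    if (substr($name, -7) === '.tar.gz' || substr($name, -4) === '.tgz') {
        return "tar -xzf $a -C $d";
    }
    if (substr($name, -4) === '.tar') {
        return "tar -xf $a -C $d";
    }
    if (substr($name, -4) === '.zip') {
        return "unzip $a -d $d";
    }
    if (substr($name, -3) === '.gz') {
        // A plain .gz holds one file: decompress it into the destination.
        $target = $destination . '/' . substr(basename($archive), 0, -3);
        return "gzip -dc $a > " . escapeshellarg($target);
    }
    return null;
}
```

The returned string can be passed to exec() once the destination folder exists.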
I am looking at optimization options, and after checking SO questions, I don't quite see an answer for what I am trying to do. Hopefully that doesn't indicate that what I am doing is a bad practice!
I have an intranet application that loads page content via ajax calls to php files. A lot of the php files have a mixture of php, JavaScript, even some HTML, specific to the interface functionality that they load into the main interface. I was wondering about minifying or compressing these files. Is there a way to do it, or am I stuck because I have mixed languages?
Update: Concerning accepted answer:
I have accepted wildpeaks answer because I think it most closely answers my original question. However, this is one of those times when I wish I could accept two answers because I think the answer Igor Zinov'yev provided has given me perhaps a more important design decision to think about. For that reason I have given a +1 to his answer, as I imagine others will too. Hope that makes sense and is within the SO rules.
Your PHP script generates the Javascript code, so it can minify the code before outputting it: generate the code in a variable, then pass that variable to the minifier, and only then output to the browser.
Here's a PHP library for that.
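To illustrate the "generate into a variable, minify, then output" flow: the sketch below uses output buffering to capture the generated JavaScript. naiveMinifyJs is only a stand-in I made up so the example runs on its own; it is not the library mentioned above, and it does not understand string literals or // comments.

```php
<?php
// Naive stand-in for a real minifier (an assumption for this sketch):
// drops /* ... */ comments and collapses runs of whitespace.
function naiveMinifyJs(string $js): string
{
    $js = preg_replace('~/\*.*?\*/~s', '', $js);
    $js = preg_replace('/\s+/', ' ', $js);
    return trim($js);
}

// Hypothetical page fragment: the JavaScript is written into a buffer
// instead of being sent straight to the browser.
function renderScript(string $userName): string
{
    ob_start();
    echo "/* greet the current user */\n";
    echo "function greet() {\n";
    echo "    alert(" . json_encode($userName) . ");\n";
    echo "}\n";

    // Only the minified version leaves this function.
    return naiveMinifyJs(ob_get_clean());
}
```

The calling script would then just do echo renderScript($name); with a real minifier swapped in for the naive one.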
You are starting your optimization off the wrong end. Obviously if you have hard-coded JavaScript, HTML and whatever else inside your PHP files, you seriously need to refactor the code. But even if you don't, you shouldn't minify the code in place because it would be even harder to maintain.
Pull it out of there, start with small steps, and you will get there eventually.
UPDATE: I thought of replying with a comment, but instead decided to elaborate on why I answered your question this way here.
I'm talking here about separation of concerns. Your server-side code files are no place for the client-side code. All solutions that do this that I have seen so far sooner or later turn into an unmaintainable mess.
If you want to return a piece of HTML code, put it into a template and supply the template with variables that are specific for this current situation. You can do that with Smarty. This way you get among others the following benefits:
No repeating pieces of markup over and over - there are template loops for that
A possibility to re-use existing templates in several places
Developers working with templates do not need to get into your server-side code
And your server-side code gets cleaner, smells nice too!
Later on when you separate logic from presentation maybe you will find that you don't need to send JavaScript code with HTML snippets. Maybe you will create a single JS engine (that you will minify on build) and will only have to trigger certain events upon load.
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Are PHP short tags acceptable to use?
I'm just learning PHP (I've been learning for about 6 months), and a tutorial I'm going through uses PHP shorthands. When I looked it up on Google, I came to this Stack Overflow question, where one of the popular answers says that shorthands are bad.
I know one of the comments that follow suggests that it's not bad, but I also vaguely remember reading in a PHP book that it's not always good to use them. So I'm a bit confused: are they bad or not?
It is generally a bad idea because of portability. All PHP configurations understand the <?php ?> tags, but not all are configured to use <? ?>.
The same goes for <?= $variable ?> for printing (although since PHP 5.4 the <?= form is always available, whatever short_open_tag is set to).
They are not actually bad, but you could say that using them is a bit lazy for a GOOD TECHNICAL programmer. I say "GOOD TECHNICAL" because someone who knows the technicalities of PHP should also know whether the shorthands will be of any use in the long run, whenever adjustments are made to the fine-tuning of the PHP server.
Still, that's just my view, and maybe it will not match others' answers.
I personally never use them because I find it makes my code ugly, hard to read and harder to debug. I'm talking about shorthands like the one-line if statement.
Not every PHP configuration understands short open and ending tags, and not every programmer knows about shorthand notation so this might be a problem if you want to share code at some point. I wouldn't advise using it.
The "<? ?>" form depends on php.ini: the short_open_tag setting has to be enabled. The "<?php ?>" form runs everywhere, every time, without any configuration.
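If you want to see how a given server is configured, the setting can be read at runtime. A minimal sketch (the function name shortTagsEnabled is just for illustration):

```php
<?php
// Returns true when the short_open_tag ini setting is switched on
// for the currently running PHP.
function shortTagsEnabled(): bool
{
    // ini_get() returns a string such as "1", "" or "On";
    // FILTER_VALIDATE_BOOLEAN turns that into a real boolean.
    return filter_var(ini_get('short_open_tag'), FILTER_VALIDATE_BOOLEAN);
}
```

Calling var_dump(shortTagsEnabled()); on the target machine tells you whether "<? ?>" would be parsed there.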
I recommend using Smarty templates. They're really easy to use, and the code is beautiful.
I am going to start working on a website that has already been built by someone else.
The main script was bought and then adjusted by the lead programmer. The lead has left and I am the only programmer.
Never met the lead and there are no papers, documentation or comments in the code to help me out, also there are many functions with single letter names. There are also parts of the code that are all compressed in one line (like where there should be 200 lines there is one).
There are a few hundred files.
My questions are:
Does anyone have any advice on how to understand this system?
Has anyone had any similar experiences?
Does anyone have a quick way of decompressing the lines?
Please help me out here. This is my first big break and I really want this to work out well.
Thanks
EDIT:
In regard to the question:
- Does anyone have a quick way of decompressing the lines?
I just used notepad++ (extended replace) and netbeans (the format option) to change a file from 1696 lines to 5584!!
This is going to be a loooonnngggg project
For reformatting the source, try this online pretty-printer: http://www.prettyprinter.de/
For understanding the HTML and CSS, use Firebug.
For understanding the PHP code, step through it in a debugger. (I can't personally recommend a PHP debugger, but I've heard good things about Komodo.)
Start by checking the whole thing into source control, if you haven't already, and then as you work out what the various functions and variables do, rename them to something sensible and check in your changes.
If you can cobble together some rough regression tests (eg. with Selenium) before you start then you can be reasonably sure you aren't breaking anything as you go.
Ouch! I feel your pain!
A few things to get started:
If you're not using source control, don't do anything else until you get that set up. As you hack away at the files, you need to be able to revert to previous, presumably-working versions. Which source-control system you use isn't as important as using one. Subversion is easy and widely used.
Get an editor with a good PHP syntax highlighter and code folder. Which one is largely down to platform and personal taste; I like JEdit and Notepad++. These will help you navigate the code within a page. JEdit's folder is the best around. Notepad++ has a cool feature that when you highlight a word it highlights the other occurrences in the same file, so you can easily see e.g. where a tag begins, or where a variable is used.
Unwind those long lines by search-and-replace ';' with ';\n' -- at least you'll get every statement on a line of its own. The pretty-printer mentioned above will do the same plus indent. But I find that going in and indenting the code manually is a nice way to start to get familiar with it.
Analyze the website's major use cases and trace each one. If you're a front-end guy, this might be easier if you start from the front-end and work your way back to the DB; if you're a back-end guy, start with the DB and see what talks to it, and then how that's used to render pages -- either way works. Use FireBug in Firefox to inspect e.g. forms to see what names the fields take and what page they post to. Look at the PHP page to see what happens next. Use some echo() statements to print out the values of variables at various places. Finally, crack open the DB and get familiar with its schema.
Lather, rinse, repeat.
Good luck!
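The search-and-replace on ';' mentioned above will also split semicolons that sit inside string literals. A slightly safer variant, sketched here under my own assumptions (unwindLines is a made-up name), walks the PHP tokens and only breaks after ';', '{' and '}' when they are real code tokens. It does no indenting, and it will still split for (;;) headers and "{$var}" interpolation, so a pretty-printer remains the better tool.

```php
<?php
// Hypothetical helper: put every statement on its own line by adding a
// newline after ';', '{' and '}' code tokens. Needs the tokenizer extension.
function unwindLines(string $code): string
{
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Named tokens (strings, variables, whitespace, ...) are
            // copied through unchanged: [id, text, line].
            $out .= $token[1];
        } else {
            // Single-character tokens arrive as plain strings.
            $out .= $token;
            if ($token === ';' || $token === '{' || $token === '}') {
                $out .= "\n";
            }
        }
    }
    return $out;
}
```

Running file_put_contents($file, unwindLines(file_get_contents($file))) over a copy of the project gives a first readable version to commit before any manual cleanup.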
Could you get a copy of the original script version which was bought? It might be that that is documented. You could then use a comparison tool like Beyond Compare in order to extract any modifications that have been made.
If the functions names are only one letter it could be that the code is encoded with some kind of tool (I think Zend had a tool like that - Zend Encoder?) so that people cannot copy it. You should try to find an unencoded version, if there is one because that would save a lot of time.
Closed 9 years ago.
Hi I know about several PDF Generators for php (fpdf, dompdf, etc.)
What I want to know is about a parser.
For reasons beyond my control, certain information I need is only in a table inside a pdf
and I need to extract that table and convert it to an array.
Any suggestions?
I've written one before (for similar needs), and I can say this: Have fun. It's quite a complex task. The PDF specification is large and unwieldy. There are several methods of storing text inside of it. And the kicker is that each PDF generator is different in how it works. So while something like TFPDF or DOMPDF creates REALLY easy to read PDFs (from a machine standpoint), Acrobat makes some really hellish documents.
The reason is how it writes the text. Most DOM based renderers --that I've used-- write the entire line as one string, and position it once (which is really easy to read). Acrobat tries to be more efficient (and it is) by writing only one or maybe a few characters at a time, and positioning them independently. While this REALLY simplifies rendering, it makes reading MUCH more difficult.
The up side here, is that the PDF format in itself is really simple. You have "objects" that follow a regular syntax. Then you can link them together to generate the content. The specification does a good job at describing the file format. But real world reading is going to take a bit of brain power...
Some helpful pieces of advice that I had to learn the hard way if you're going to write it yourself:
Adobe likes to re-map fonts. So character 65 will likely not be A... You need to find a map object and deduce what it's doing based upon what characters are in there. And it is efficient since if a character doesn't appear in the document for that font, it doesn't include it (which makes life difficult if you try to programmatically edit a PDF)...
Write it as abstractly as possible. Write classes for each object type and each native type (strings, numbers, etc.). Let those classes parse for you. There will be a fair bit of repetition in there, but you'll save yourself in the end when you realize that you need to tweak something for only one specific type...
Write for a specific version or two of the PDF spec, and enforce it. Check the version number, and if it's higher than you expect, bail... And don't try to "make it work". If you want to support newer versions, break out the specification and upgrade the parser from there. Don't try to trial and error your way up (it's not fun)...
Good luck with compressed streams. I've found that typically you can't trust the length arguments to verify what you are uncompressing. Sometimes (for some generators) it works well... Others it's off by one or more bytes. I just attempt to deflate it if the filter matches, and then force the length...
When testing lengths, don't use strlen. Use mb_strlen($string, '8bit') since it will compensate for different character sets (and allow potentially invalid characters in other charsets).
Otherwise, best of luck...
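The last two tips can be sketched roughly like this. It assumes the stream uses the Flate filter, whose data is in zlib format and can be handed to gzuncompress(); the helper names are made up, and the zlib and mbstring extensions are required.

```php
<?php
// Hypothetical helper: try to inflate a Flate-encoded stream without
// relying on the declared length. Returns null if the data is not
// valid zlib data.
function inflateStream(string $data): ?string
{
    $inflated = @gzuncompress($data);
    return $inflated === false ? null : $inflated;
}

// Byte length of a string, independent of any multibyte character set.
function byteLength(string $data): int
{
    return mb_strlen($data, '8bit');
}
```

For a string such as "H\xC3\xA9llo", byteLength() reports 6 bytes while mb_strlen($s, 'UTF-8') reports 5 characters; for checking PDF offsets and lengths it is the byte count that matters.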
I use PDFBox for that (http://pdfbox.apache.org/). This software is Java-based and platform-independent. It works fast and reliably. You can use it via exec or shell execute, or via a PHP/Java bridge (http://php-java-bridge.sourceforge.net/).
Have you already looked at xPDF ? There is a program in there called pdftotext that will do the conversion. You can call it from PHP and then read in the text version of the PDF. You will need to have the ability to run exec() or system() from php, so this may not work on all hosted solutions though.
Also, there are some examples on the PHP site that will convert PDF to text, although it's pretty rough. You may want to try some of those examples as well. On that PHP page, search for luc at phpt dot org.
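A sketch of the pdftotext route, with some assumptions of mine: the -layout option is used so that table columns stay aligned in the text output, "-" as the output name sends the text to stdout, and columns are taken to be separated by two or more spaces. That last heuristic will not hold for every PDF, and both function names are made up.

```php
<?php
// Hypothetical helper: build the pdftotext call for one PDF file.
function pdfToTextCommand(string $pdfPath): string
{
    return 'pdftotext -layout ' . escapeshellarg($pdfPath) . ' -';
}

// Hypothetical helper: turn layout-preserved text into an array of rows,
// splitting each non-empty line on runs of two or more whitespace characters.
function layoutTextToRows(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $rows[] = preg_split('/\s{2,}/', $line);
    }
    return $rows;
}
```

Usage would be along the lines of layoutTextToRows((string) shell_exec(pdfToTextCommand('report.pdf'))), followed by picking out the lines that belong to the table.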
Zend_Pdf is part of the Zend Framework. Their manual states:
The Zend_Pdf component is a PDF (Portable Document Format) manipulation engine. It can load, create, modify and save documents. Thus it can help any PHP application dynamically create PDF documents by modifying existing documents or generating new ones from scratch.
Have a look at GhostScript or ITextSharp; there are various cross-platform versions of both.
It may not actually be a table inside the PDF as the PDF loses that sort of information...
This is PHP PDF parser, which exists in two flavours:
Free version can parse PDFs up to format PDF 1.5
Commercial add-on can parse any PDF format (up to current 1.9)