How to force line-breaks on &nbsp;? - php

Sometimes text on my pages looks very strange. A real example:
trained professionals and paraprofessionals coming together
...while the parent div is quite narrow, so the text just sticks out of it.
It looks strange because &nbsp; is displayed exactly like a space.
So I wonder: is it possible to make the browser treat these characters as ordinary spaces and break the line where necessary, without actually replacing them?
EDIT
Why is blind replacement a problem?
Because &nbsp; is sometimes genuinely needed.
Consider the following example, where &nbsp; is used to indent the ranks:
Ranks:<br>
Marshal<br>
Lieutenant<br>
Sergeant
If I just preg_replace them all, the result would look different.
(I would also welcome suggestions for replacing them smartly on the PHP platform, if you can think of an algorithm that wouldn't affect formatting.)

By definition, &nbsp; is a non-breaking space. Its very meaning is that it must not be broken across line endings. If this is not what you intend, I suggest fixing the HTML instead of trying to force the browser into non-standard behaviour.
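If you do decide to fix the HTML, the question's "smart replacing" idea can be approximated with a heuristic. The sketch below is JavaScript for illustration; the same pattern should carry over to PHP's preg_replace, since PCRE supports this fixed-width lookbehind. It assumes that a lone &nbsp; between two visible characters is an accidental word gap, while a leading &nbsp; (at the start, after whitespace, or right after a tag such as <br>) or a run of several is deliberate layout. relaxNbsp is a hypothetical helper name, and literal U+00A0 characters are not handled.

```javascript
// Heuristic: replace a single &nbsp; that sits between two visible characters
// with an ordinary (breakable) space. An &nbsp; preceded by whitespace, ">"
// (end of a tag like <br>) or ";" (end of another entity), or followed by
// whitespace, "<" or "&", is left alone, so indentation runs survive.
function relaxNbsp(html) {
  return html.replace(/(?<=[^\s;>])&nbsp;(?=[^\s&<])/g, " ");
}

console.log(relaxNbsp("trained&nbsp;professionals&nbsp;and"));
// trained professionals and
console.log(relaxNbsp("Ranks:<br>&nbsp;&nbsp;&nbsp;&nbsp;Marshal<br>"));
// unchanged: every &nbsp; follows ">" or ";"
```

Deliberate single gaps such as 10&nbsp;km will also become breakable, and an &nbsp; right after punctuation ending in ";" is skipped, so check the output on real content before running it over a whole site.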


Bypass PHP's str_replace() when replacing a single character

So, I am studying some PHP security using DVWA (http://www.dvwa.co.uk/). Right now I'm on an exercise where the author tries to teach us to execute commands on vulnerable applications. In this level, it adds a very simple blacklist which removes important characters:
$substitutions = array(
'&&' => '',
';' => '',
);
I know I can still get commands executed with other characters (like |, ||, &, etc.), but I wanted to know how I'd evade the substitution of the single character ";". I've seen examples that fool this kind of substitution with input like "<scr<script>ipt>", and I've tried things like ";;;" as well as hex and base64 encodings, but none of it worked.
Is there a way to evade str_replace() when it is looking for a single character? This is PHP 5.5.3.
I found this page useful when I was doing this. It turns out there are operators other than ';' that can be used to plug in your own command!
The "hard" setting on this is currently causing me some trouble. I think there may be a workaround using URL-encoded characters or something of the sort, but it remains to be seen.
I'm not sure why the author is showing how to use a blacklist; it's too easily subverted. Perhaps this idea is shredded further on in the tutorial. http://en.wikipedia.org/wiki/Secure_input_and_output_handling
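To see why a lone ";" resists the nesting trick while the blacklist as a whole is still easy to subvert, here is a small JavaScript emulation of how str_replace() applies an array of substitutions: one pass per search string, in order, with no rescanning. strReplaceAll is a hypothetical helper written for this illustration, not a PHP function.

```javascript
// Emulates PHP's str_replace() with array arguments: each search string is
// applied in order, replacing every non-overlapping occurrence left to right,
// and an earlier search string is never re-applied after a later pass.
function strReplaceAll(substitutions, subject) {
  let result = subject;
  for (const [search, replace] of Object.entries(substitutions)) {
    result = result.split(search).join(replace);
  }
  return result;
}

// The same blacklist as the exercise.
const substitutions = { "&&": "", ";": "" };

// A lone ";" cannot survive: all occurrences go in one pass, and deleting
// single characters can never assemble a new ";".
console.log(JSON.stringify(strReplaceAll(substitutions, ";;;"))); // ""
// The ";" pass runs after the "&&" pass, so deleting ";" can assemble
// a fresh "&&" that the earlier pass never saw.
console.log(strReplaceAll(substitutions, "a &;& b")); // a && b
```

Nesting works for multi-character tokens like <script> because deleting the inner copy joins the outer halves; a one-character search string has no halves to join, but the ordering of the passes opens a different hole.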
Although the example you link to is the 'medium' level, even the 'harder' level does not use PHP's FILTER_VALIDATE_IP filter.
Even a regex would do a better job. See halfway down the page at: http://www.regular-expressions.info/examples.html
If you are trying to protect against XSS attacks (you mention a mangled script tag) then white-listing is the way to go. Validate against what you expect to get, or abort.
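For the ping exercise, a whitelist can be as small as "accept a dotted-quad IPv4 address, reject everything else". In PHP, filter_var($input, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) gives you that check; the sketch below shows the same idea as a JavaScript regex in the style of the regular-expressions.info example, assuming IPv4 only (no IPv6, no hostnames). isValidIPv4 and requireIPv4 are hypothetical names.

```javascript
// One octet: 250-255, 200-249, 100-199, or 0-99 (no leading zeros).
const OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
// Exactly four octets separated by dots, anchored at both ends.
const IPV4 = new RegExp(`^${OCTET}(?:\\.${OCTET}){3}$`);

function isValidIPv4(input) {
  return typeof input === "string" && IPV4.test(input);
}

// Abort on anything unexpected instead of trying to "clean" it.
function requireIPv4(input) {
  if (!isValidIPv4(input)) {
    throw new Error("Rejected: expected an IPv4 address");
  }
  return input;
}
```

Because the pattern anchors the whole string, input such as "127.0.0.1; ls" or "127.0.0.1|whoami" is rejected outright rather than partially cleaned.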
EDIT
Hmm... now I see the site is called Damn Vulnerable Web Application, so perhaps the idea is to teach you all the poor examples...

How to modify a specific character in an existing XFA PDF?

I'm stuck on a crazy project that has me looking for a strange solution. I've got an XFA PDF document generated by an outside party. There are several checkmark characters '✓' in the PDFs that I need to simply change to 'X'. The reason for this is beyond my control. I'm just looking for a way to change the ✓'s into X's. Can anyone point me in the right direction? Is it possible?
Currently we use PHP and TCPDF for creating "our" server PDFs, but this particular PDF is generated outside of my control by a third party that doesn't want to alter their way of doing things. To make things worse, I don't know how many checkmarks there are or where they may be. It's just one very specific character that needs changing. Does anyone know a way of hacking the document to change the character?
The character is U+2713 (CHECK MARK):
http://www.fileformat.info/info/unicode/char/2713/index.htm
Yes, I think you can. To my (rather limited) knowledge of the PDF format, you can only reliably search and replace strings one character long, since text is laid out by placing strings of variable length at specific coordinates, in an arbitrary order. The string 'hello' could therefore be one string of five letters, five strings of one letter each, or some combination thereof, all placed in the correct position (and in whatever order the print driver decided upon).
I'm afraid I don't know of any libraries that will do this, but I'd be surprised if they don't exist. You'll need to read PDF objects in, do the replacement, and write them out to a new file. I'd start off researching around the answers to this question.
Edit: this looks like it might be useful.

Is the TAB character bad in source code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm fairly familiar with both the Zend and PEAR PHP coding standards, and at my previous two employers no TAB characters were allowed in the code base; the claim was that they might be misinterpreted by the build script or something (I honestly can't remember the exact reason). We all set up our IDEs to expand TABs to 4 spaces.
I'm very used to this, but my newest employer insists on using TABs rather than spaces for indentation. I suppose I shouldn't really care, since I can just tell PhpStorm to insert a TAB character when I hit the Tab key, but I do. I want spaces, and I'd like a valid argument for why spaces are better than TABs.
So, personal preferences aside, my question is, is there a legitimate reason to avoid using TABs in our code base?
Keep in mind this coding standard applies to PHP and JavaScript.
Tabs are better
Clearer to the reader what level each piece of code is on; spaces can be ambiguous, especially when it's unclear whether you're using 2-space tabs or 4-space tabs
Conceptually makes more sense. When you indent, you expect a tab and not spaces. The document should represent what you did on your keyboard, and not in-document settings.
Tabs can't be confused with spaces in extended lines of code. Especially when word wrap is enabled, there could be a space in a wrapped series of words. This could obviously confuse readers. It would slow down pair programming.
Tabs are a special character. When viewing of special characters is enabled on your IDE, levels can be more easily identified.
Note to all coders: in JetBrains editors (e.g. PhpStorm, RubyMine, IntelliJ IDEA) you can easily switch between tabs and spaces by changing the indentation setting and reformatting the code with Ctrl + Alt + L on Windows or Linux (Cmd + Option + L on macOS).
is there a legitimate reason to avoid using TABs in our code base?
I have consulted at many a company and not once have I run into a codebase that didn't have some sort of mixture of tabs and spaces among various source files and not once has it been a problem.
Preferred? Sure.
Legitimate, as in accordance with established rules, principles, or standards? No.
Edit
All I really want to know is if there's nothing wrong with TABs, why would both Zend and PEAR specifically say they are not allowed?
Because it's their preference. A convention they wish to be followed to keep uniformity (along with things like naming and brace style). Nothing more.
Spaces are better than tabs because different editors and viewers, or different editor settings, might cause tabs to be displayed differently. That's the only legitimate reason for avoiding tabs if your programming language treats tabs and spaces the same. If some tool chokes on tabs, then that tool is broken from the language point of view.
Even when everybody in your team sets their editor to treat tabs as four spaces, you'll get a different display when you have to open up your source code in some tool that doesn't.
The most important thing to worry about is being consistent about always using the same indentation scheme - having a confused mix of tabs and spaces is living hell, and is worse than either pure tabs or pure spaces. Therefore, if the rest of the project is using tabs, you should use them too.
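Consistency is also the easiest part to check mechanically, for example in a build or lint step. A rough JavaScript sketch, assuming a file counts as consistent when every indented line uses only tabs or only spaces; the function names are hypothetical, and a team that uses tabs for indentation plus spaces for alignment would see those lines counted as mixed and need a looser rule.

```javascript
// Count how each indented line is indented: only tabs, only spaces, or both.
function indentationReport(source) {
  const report = { tabs: 0, spaces: 0, mixed: 0 };
  for (const line of source.split(/\r\n|\r|\n/)) {
    const indent = line.match(/^[ \t]*/)[0];
    // Skip unindented lines and lines that are nothing but whitespace.
    if (indent === "" || indent === line) continue;
    const hasTab = indent.includes("\t");
    const hasSpace = indent.includes(" ");
    if (hasTab && hasSpace) report.mixed++;
    else if (hasTab) report.tabs++;
    else report.spaces++;
  }
  return report;
}

// Consistent: no line mixes both, and the file does not use both styles.
function isConsistent(report) {
  return report.mixed === 0 && (report.tabs === 0 || report.spaces === 0);
}
```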
Anyway, there isn't a clear winner on tabs vs. spaces. Space supporters say that using only spaces for everything is a simpler rule to enforce, while tab supporters say that using tabs for indentation and spaces for alignment lets each developer display the tab width they find most comfortable.
In the end, tabs vs. spaces should not be a big deal. The only time I have seen people argue that one of the alternatives is strictly better than the other is in indentation-sensitive languages, like Python or Haskell. In these, mixing tabs and spaces can change the program's semantics in hard-to-see ways, instead of only making the source code look weird.
Ever since my first CS class, tabs have always been taboo. The reason is that tabs are basically like variables: different IDEs can define a TAB as a different number of spaces. Speaking from a Visual Studio/NetBeans/Dev-C++ perspective, all of them can change the 'definition' of a TAB to the desired number of spaces. So if you have 4 spaces defined, there is no way you can know whether my IDE says 3 spaces or 5. And if anyone happens to use a space-based indentation style while someone else uses TABs, the formatting can get all jacked up.
As a counter-point, however, if the 'standard' is to always use tabs, then it really wouldn't matter since the formatting will all appear the same - regardless of the number of defined spaces. But all it takes is one person to use a space and the formatting can look horrid and get really confusing. This can't happen when using spaces. Also, what happens if you don't want to use the same spacing between functions/methods, etc? What if you like using 4 spaces in some cases and only 2 in other cases?
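The "different IDEs, different widths" problem comes from how a tab is rendered: it advances to the next tab stop, a multiple of the configured width. This JavaScript sketch (expandTabs is a hypothetical helper; it assumes plain ASCII, one column per character) shows that leading tabs only change the indent depth, while a tab used for alignment in the middle of a line produces a different gap for each width.

```javascript
// Expand tabs the way an editor renders them: each tab advances to the next
// multiple of tabWidth. Assumes one column per character (plain ASCII).
function expandTabs(line, tabWidth) {
  let out = "";
  for (const ch of line) {
    if (ch === "\t") {
      out += " ".repeat(tabWidth - (out.length % tabWidth));
    } else {
      out += ch;
    }
  }
  return out;
}

// Leading tab: only the indent depth changes with the setting.
console.log(JSON.stringify(expandTabs("\tfoo();", 4))); // "    foo();"
console.log(JSON.stringify(expandTabs("\tfoo();", 8))); // "        foo();"
// Tab used for alignment mid-line: the gap itself changes.
console.log(JSON.stringify(expandTabs("ab\tc", 4))); // "ab  c"
console.log(JSON.stringify(expandTabs("ab\tc", 8))); // "ab      c"
```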
I have seen build scripts that parse source code and generate documentation or even other code. These kinds of scripts usually depend on the code being in an expected format, and frequently that means consistently using spaces (or sometimes tabs). Perhaps these scripts could be made more robust by accepting either tabs or spaces, but frequently you are stuck with what you've got. In that kind of environment, consistent formatting becomes more important.

Is it possible to write a regex which checks if a string (javascript & php code) is minified?

Is it possible to write a regular expression which checks if a string (some code) is minified?
Many PHP/JS obfuscators remove white space chars (among other things).
So, the final minified code sometimes looks like this:
PHP:
$a=array();if(is_array($a)){echo'ok';}
JS:
a=[];if(typeof(a)=='object'&&(a instanceof Array)){alert('ok')}
In both cases there are no spaces before or after "{", "}", ";", etc. There are also some other patterns that can help. I am not expecting a high-accuracy regex; I just need one that checks whether at least 100 characters of a string look like minified code.
Thanks in advance.
Purpose: a web malware scanner.
I think a minifier will strip all newline characters, although there might possibly be one at the end of the file still if the minified code was pasted back in a text editor. Something like this will probably be fairly accurate:
/^[^\n\r]+(\r\n?|\n)?$/
That just tests that there are no newline characters in the whole thing except for possibly one at the end. So no guarantees, but I think it will work well on any longish block of code.
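In JavaScript, this answer's test plus the question's "at least 100 chars" requirement might look like the sketch below; isSingleLineBlob is a hypothetical name, and the 100-character default comes from the question rather than from any tuning.

```javascript
// No line break anywhere, except possibly one at the very end.
const SINGLE_LINE = /^[^\n\r]+(\r\n?|\n)?$/;

function isSingleLineBlob(code, minLength = 100) {
  // Short snippets are too easy to write on one line by hand to judge.
  return code.length >= minLength && SINGLE_LINE.test(code);
}
```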
The short answer is "no", regex cannot do this.
Your best bet will probably be to do a statistical analysis of the source files, and compare against some known heuristics. For instance, by comparing the variable names against those often found in minimized code. A minimized file probably has a lot of one-character variable names, for instance... and won't have two-character variable names until all the one-character variable names are exhausted... etc.
Another option would be simply to run the source file through a minimizer, and see if the output is sufficiently different from the input. If not, it was probably already minimized.
But I have to agree with sg3s's final sentence: If you can explain why you need this, we can probably provide more useful answers to your actual needs.
No. The syntax of the code and its intent don't change, and some people who are very familiar with PHP and/or JS will write simple functions on one line without any whitespace at all (me :s).
What you could do is count all the whitespace characters in a string, though this would also be unreliable, since some constructs simply need whitespace, like x instanceof y. Also, not all code is minified and crammed into a single line (see jQuery UI), so you can't really count on that either...
Maybe you can explain why you need to know this and we can try and find an alternative?
You can't tell whether it was minified or just written like that by hand (this probably only applies to smaller scripts). But you can check whether it contains any unnecessary whitespace.
Take a look at an open-source obfuscator/minifier and see what rules it uses to remove whitespace. Validating whether those rules were applied should work; if the regex gets too complex, a simple parser might be needed.
Just make sure that string literals like a="if ( b )" are excluded.
Run it through a parser for that particular language (even a prettifier might work fine) and modify it to count the number of unused characters. Use the percentage of unused chars vs. number of chars in documents as a test for minification. I don't think you can do this accurately with regex, although counting whitespace vs. document content might be okay.
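Combining the whitespace-counting idea with the warning about string literals gives a cheap score. This JavaScript sketch is a heuristic only: it blanks out simple '...' and "..." literals (not template literals, regex literals or comments), and the 5% threshold is an arbitrary assumption to tune against real samples; whitespaceRatio and looksMinified are hypothetical names.

```javascript
// Share of whitespace in the code, after blanking out simple string literals
// so that text such as a="if ( b )" is not mistaken for formatting.
function whitespaceRatio(code) {
  // Handles '...' and "..." with backslash escapes; template literals,
  // regex literals and comments are NOT recognised.
  const stripped = code.replace(/"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/g, '""');
  if (stripped.length === 0) return 0;
  const whitespace = (stripped.match(/\s/g) || []).length;
  return whitespace / stripped.length;
}

// threshold is an assumed cut-off, not a measured one; tune it on real files.
function looksMinified(code, threshold = 0.05) {
  // Only judge samples of at least 100 characters, as the question suggests.
  return code.length >= 100 && whitespaceRatio(code) < threshold;
}
```

Formatted code tends to land well above such a cut-off (indentation alone adds several characters per line), while minified code sits near zero; anything in between is where the parser-based approach above earns its keep.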

How might I truncate HTML with JS (preferred) or PHP?

I am trying to use JS (preferred) or PHP to access APIs like StackOverflow, Tumblr & Forrst to get my latest posts to display in my blog. So I will need a way to truncate the HTML returned, so that it fits into a "widget" sized space.
How might I do it with JS or PHP? It should:
not produce invalid HTML when truncating
not truncate words (leaving half a word, for example)
I am also considering stripping out code blocks or images that otherwise may not fit well, but this is secondary.
Well, when you truncate a piece of code, you should be careful not to break its workings [in the case of HTML, make sure all opening and closing tags remain intact], assuming you want to keep those code blocks at all. This will require a good piece of code heavily loaded with regex, and I doubt it would be a good idea to do it in JavaScript; PHP would be a much faster and safer way...
On the other hand, if you are considering getting rid of all code blocks, first use PHP's strip_tags() function [you can pass "<img>" as its second parameter to keep IMG tags], like:
$clean = strip_tags($incoming, "<img>");
Then truncate your text, making sure you are not damaging the closing ">" characters of tags. Again, regex will do the job: use regex conditionals, lookaheads and lookbehinds to achieve that goal.
Once you're done with tags, make sure you are not damaging your multi-byte characters: truncating without care can corrupt multi-byte characters by splitting their bytes apart. To avoid this, use PHP's mb_substr() function. While truncating, you might also want the remaining HTML tags not to count as characters: using regex, you can temporarily replace them with placeholders and, once truncation is done, put the original values back.
So, "simply" put: It requires good command of PHP and some coding, which is hard to post here, I am afraid.
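Since the question prefers JavaScript, here is a rough sketch of a different technique from the regex approach above: split the HTML into tags and text runs, keep a stack of open tags, cut the text only at whitespace, then close whatever is still open. It assumes well-formed input (a literal "<" in text is written as &lt;), counts entities at their raw length, does not handle comments containing ">", and does not notice a word split across tags; truncateHtml is a hypothetical name.

```javascript
// Elements that never take a closing tag, so they are not pushed on the stack.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Keep at most maxChars of visible text, cut only at whitespace,
// and close any tags still open at the cut point.
function truncateHtml(html, maxChars, ellipsis = "\u2026") {
  const tokens = html.match(/<[^>]*>|[^<]+/g) || []; // tags and text runs
  const openTags = [];
  let out = "";
  let count = 0; // visible characters emitted so far

  for (const token of tokens) {
    if (token.startsWith("<")) {
      const m = token.match(/^<\s*(\/)?\s*([a-zA-Z][a-zA-Z0-9-]*)/);
      if (m) {
        const name = m[2].toLowerCase();
        if (m[1]) {
          // Closing tag: drop the most recent matching open tag.
          const idx = openTags.lastIndexOf(name);
          if (idx !== -1) openTags.splice(idx, 1);
        } else if (!VOID_ELEMENTS.has(name) && !token.endsWith("/>")) {
          openTags.push(name);
        }
      }
      out += token; // comments, doctypes etc. pass through untouched
      continue;
    }

    if (count + token.length <= maxChars) {
      out += token;
      count += token.length;
      continue;
    }

    // This text run overflows: keep only whole words that fit.
    const room = maxChars - count;
    let slice = token.slice(0, room);
    if (!/\s/.test(token.charAt(room))) {
      // The cut lands inside a word: back up to the last whitespace, if any.
      const upToSpace = slice.match(/^([\s\S]*)\s/);
      slice = upToSpace ? upToSpace[1] : "";
    }
    out += slice.trimEnd() + ellipsis;
    break;
  }

  // Close whatever is still open, innermost first.
  for (let i = openTags.length - 1; i >= 0; i--) {
    out += `</${openTags[i]}>`;
  }
  return out;
}
```

For example, truncateHtml("<p>Hello <b>brave new</b> world</p>", 12) keeps "Hello brave" and closes both tags, giving "<p>Hello <b>brave…</b></p>".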
Depending on your needs, you may not actually need to do any truncating at all. Instead, you might be able to style the container that you put the HTML in and set overflow: hidden; to prevent it taking up more space than you want.
This way, you know that you won't be cutting a word in half (as the browsers will "wrap" it nicely) and you know that you won't be accidentally breaking the HTML code, as it will all still be there.
As I said, depending on your specific needs, and the specific HTML that you are getting back, this may or may not be an option. But I think it's worth at least considering.
