So I've been trying to build a regex for the past couple of hours, and I'm starting to go crazy wondering whether this is even possible or worthwhile.
I have a script that scans PHP files, checking MD5 sums against known malicious files and searching for certain strings. Most recently I've come across files where, instead of calling base64_decode directly in the PHP file, they build the name in a variable by concatenation so the scanner doesn't pick it up.
As an example here's the latest one I found:
$a='bas'.'e6'.'4_d'.'ecode';eval($a
Because the scanner searches for the literal string base64_decode, this file wasn't picked up: they use PHP to concatenate base64_decode into a variable and then call the variable.
Forgive me, I've just started with regex, but is it even possible to search for something like this using a regex? I was able to write a regex that matches that exact one, but what if they used this instead:
$a='b'.'ase'.'64_d'.'ecode';eval($a
It wouldn't be picked up, because the regex was looking for ' then b then a, and so on.
I've already added
(eval)\(\$[a-z]
to send me an email notice to check the file. I'll have to let it run for a couple of days and see how many false positives show up, but my main concern is base64_decode.
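For reference, here is a small sketch of how that eval pattern behaves when applied as-is with preg_match(). As written it is case-sensitive, allows no whitespace before the parenthesis, and [a-z] does not match the underscore in superglobals such as $_POST:

```php
<?php
// The question's pattern, used unchanged (no flags).
$evalPattern = '/(eval)\(\$[a-z]/';

$samples = ['eval($a', 'eval ($a', 'EVAL($a', 'eval($_POST[0])'];

foreach ($samples as $code) {
    // preg_match() returns 1 on a match, 0 otherwise
    printf("%-16s %s\n", $code, preg_match($evalPattern, $code) ? 'flagged' : 'missed');
}
```

Only the first sample is flagged. Adding the i flag, \s* before the parenthesis and _ to the character class would close those gaps, probably at the cost of more false positives to review.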
If someone could please shed some light on this for me and maybe point me in the right direction, I would greatly appreciate it.
Thanks!!
You can use this regexp:
b\W*a\W*s\W*e\W*6\W*4\W*_\W*d\W*e\W*c\W*o\W*d\W*e
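It searches for base64_decode with any run of non-word characters (anything other than letters, digits and underscore) interspersed between the letters, so quotes, dots and whitespace are skipped.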
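A quick sketch of that pattern applied with preg_match() to both examples from the question. The i flag is an addition (an assumption, not part of the answer): PHP function names are case-insensitive, so BaSe64_DeCoDe would also run.

```php
<?php
// The answer's pattern; the trailing /i is an added assumption.
$pattern = '/b\W*a\W*s\W*e\W*6\W*4\W*_\W*d\W*e\W*c\W*o\W*d\W*e/i';

$samples = [
    "\$a='bas'.'e6'.'4_d'.'ecode';eval(\$a",
    "\$a='b'.'ase'.'64_d'.'ecode';eval(\$a",
    "echo base64_encode(\$data);", // should NOT be flagged
];

foreach ($samples as $code) {
    $hit = preg_match($pattern, $code) === 1;
    echo ($hit ? 'FLAG ' : 'ok   '), $code, "\n";
}
```

Both obfuscated samples are flagged and base64_encode is not. It still won't catch variants that never spell the letters in order, e.g. str_rot13('onfr64_qrpbqr').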
Is it possible to display the strings that match a regular expression?
Example:
Take the expression /^AD\d{3}/
and display AD999
What I'm doing is validating a fairly simple string: it contains mostly digits, maybe a few letters, and maybe a '-'. I'm validating a postal code on form submit against a database of all countries that use postal codes.
I could do it in JavaScript or PHP, if that makes any difference.
No. That sort of feature is not available.
You can try to implement it yourself, but I don't think that's the solution for you. Simply write the messages normally. Not everything must always be dynamic.
I like your way of thinking though.
It is possible. The developers of PEX figured it out.
Don't get your hopes up, though: I don't know of any JavaScript implementation.
There is one for JavaScript now: http://fent.github.io/randexp.js/.
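randexp.js handles this properly in JavaScript. As a rough PHP illustration of the idea only, here is a deliberately tiny generator (exampleFromPattern is a hypothetical name) that understands just literal characters, the ^ and $ anchors, and \d with an optional {n} count, which is enough for the /^AD\d{3}/ example:

```php
<?php
// Hypothetical, minimal "regex -> example string" generator.
// Supported syntax (assumption): ^, $, literal characters, \d and \d{n}.
function exampleFromPattern(string $pattern): string
{
    $out = '';
    $len = strlen($pattern);
    for ($i = 0; $i < $len; $i++) {
        $ch = $pattern[$i];
        if ($ch === '^' || $ch === '$') {
            continue; // anchors produce no characters
        }
        if ($ch === '\\' && $i + 1 < $len && $pattern[$i + 1] === 'd') {
            $i++; // consume the "d"
            $count = 1;
            // Optional {n} quantifier directly after \d (\G anchors at the offset)
            if (preg_match('/\G\{(\d+)\}/', $pattern, $m, 0, $i + 1)) {
                $count = (int) $m[1];
                $i += strlen($m[0]);
            }
            $out .= str_repeat('9', $count); // show each digit as 9
            continue;
        }
        $out .= $ch; // anything else is treated as a literal
    }
    return $out;
}

echo exampleFromPattern('^AD\d{3}'), "\n"; // AD999
```

Anything beyond this (classes, alternation, optional parts) quickly needs a real parser, which is what randexp.js provides.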
I have understood your problem a little better from your additional comments.
Since your data is only postal codes, I suggest working in the other direction: store a picture (a format mask) for each country in the database and automatically generate a regex from that.
For instance, UK postcodes look like AA?99? 9AA | AA?9A 9AA which is easily converted to a regex (using a regex!).
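A minimal sketch of that conversion, assuming a picture syntax where A stands for an uppercase letter, 9 for a digit, ? makes the preceding element optional and | separates alternative pictures (pictureToRegex is a hypothetical name). The picture itself can also be shown to users as the expected format.

```php
<?php
// Hypothetical helper: turns a postcode picture into an anchored PCRE pattern.
function pictureToRegex(string $picture): string
{
    $alternatives = [];
    foreach (explode('|', $picture) as $alt) {
        $regex = '';
        foreach (str_split(trim($alt)) as $ch) {
            switch ($ch) {
                case 'A': $regex .= '[A-Z]'; break; // one letter
                case '9': $regex .= '\d'; break;    // one digit
                case '?': $regex .= '?'; break;     // previous element optional
                default:  $regex .= preg_quote($ch, '/'); // literal (e.g. space)
            }
        }
        $alternatives[] = $regex;
    }
    return '/^(?:' . implode('|', $alternatives) . ')$/';
}

$ukPattern = pictureToRegex('AA?99? 9AA | AA?9A 9AA');
echo $ukPattern, "\n";

foreach (['SW1A 1AA', 'M1 1AE', 'CR2 6XH', '12345'] as $code) {
    printf("%-9s %s\n", $code, preg_match($ukPattern, $code) ? 'valid' : 'invalid');
}
```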
I'm stuck on a crazy project that has me looking for a strange solution. I've got an XFA PDF document generated by an outside party. There are several checkmark characters '✓' in the PDFs that I simply need to change to 'X'. The reason for this is beyond my control. I'm just looking for a way to change the ✓s into Xs. Can anyone point me in the right direction? Is it possible?
Currently we use PHP and TCPDF for creating "our" server PDFs, but this particular PDF is generated outside of my control by a third party that doesn't want to alter their way of doing things. To make things worse, I don't know how many checkmarks there are or where they may be. It's just one very specific character that needs changing. Does anyone know a way of hacking the document to change the character?
The character is U+2713 (CHECK MARK):
http://www.fileformat.info/info/unicode/char/2713/index.htm
Yes, I think you can. To my (rather limited) knowledge of the PDF format, you can only reliably search and replace strings of one character in length, since they are created by placing strings of variable length at specific co-ordinates, in an arbitrary order. The string 'hello' could therefore be one string of five letters, or five strings of one letter each or some combination thereof, all placed in the correct position (and in whatever order the print driver decided upon).
I'm afraid I don't know of any libraries that will do this, but I'd be surprised if they don't exist. You'll need to read PDF objects in, do the replacement, and write them out to a new file. I'd start off researching around the answers to this question.
Edit: this looks like it might be useful.
I'm trying to write a check in PHP, using regular expressions, that detects whether a filename makes sense or is completely random.
A valid filename = sample-image-25.jpg
A random filename = 46347sdga467234626.jpg
I want to check if the filename makes sense or not, if not, I want to alert the user to fix the filename before continuing.
Any help?
I'm not really sure that's possible because I'm not sure it's possible to define "random" in a way the computer will understand sufficiently well.
"umiarkowany" looks random, but it's a perfectly valid word I pulled off the Polish Wikipedia page for South Korea.
My advice is to think more deeply about why this design detail is important, and look for a more feasible solution to the underlying problem.
That needs far too much work. You would have to build a huge array of the most-used words (like a dictionary) and check whether most of the words inside the filename (maybe separated by - or _) are in it, and it would still have huge bugs.
Basically you will need:
explode()
implode()
array_search() or in_array()
Take the string and look for a glue character such as "_" or "-" with preg_match(); if there is one, explode the string into an array and compare that array with the dictionary array.
Or, since most words roughly alternate vowels and consonants, you could write a large script that checks whether most of the words inside the filename look "not random". But the problem stays the same: why do you need this? Look for a more flexible solution.
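A minimal sketch of that dictionary approach. looksMeaningful, the 0.5 threshold and the four-word dictionary are all hypothetical stand-ins; preg_split() is swapped in for explode() so both glue characters are handled in one call.

```php
<?php
// Hypothetical helper: is a filename made mostly of known words?
function looksMeaningful(string $filename, array $dictionary, float $threshold = 0.5): bool
{
    // Drop the extension, lowercase, then split on the glue characters - and _
    $base = strtolower(pathinfo($filename, PATHINFO_FILENAME));
    $parts = preg_split('/[-_]+/', $base, -1, PREG_SPLIT_NO_EMPTY);

    $words = 0;
    $known = 0;
    foreach ($parts as $part) {
        if (preg_match('/^\d+$/', $part)) {
            continue; // ignore pure numbers such as "25"
        }
        $words++;
        if (in_array($part, $dictionary, true)) {
            $known++;
        }
    }
    // Meaningful only if enough of the non-numeric parts are dictionary words
    return $words > 0 && $known / $words >= $threshold;
}

$dictionary = ['sample', 'image', 'photo', 'holiday'];
var_dump(looksMeaningful('sample-image-25.jpg', $dictionary));    // bool(true)
var_dump(looksMeaningful('46347sdga467234626.jpg', $dictionary)); // bool(false)
```

As the other answer points out, a real word in another language would be rejected unless the dictionary covers it.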
Notice:
Consider that even a simple-and-friendly-file.png could be the result of a string generator.
Good luck with that.
I wouldn't call myself a regex master; I pretty much just know the basics. I've been playing around with it, but I can't seem to get the desired result, so if someone could help me, I would really appreciate it!
I'm trying to check whether unwanted words exist in a string. I'm working on a math project and I'm going to use eval() to calculate the string, so I need to make sure it's safe.
The string may contain the following words (just examples for now; I'll add more functions later). Read the comments:
floor() // spaces or numbers are allowed between the () chars. If possible, I'd also like to allow other math functions inside, so it could look like: floor( floor(8)*1 ).
It may contain any digit, any math sign (+ - * /) and dots/commas (,.) anywhere in the string
Just to be clear, here's another example. If a string like this is passed, I do not want it to pass:
9*9 + include('somefile') / floor(2) // Just a random example on something that's not allowed
Now that I think about it, it looks kind of complicated. I hope you can at least give me some hints.
Thanks in advance,
-Anthony
Edit: This is a bit off-topic, but if you know a better way of calculating math expressions, please suggest it. I've been looking for a safe math class/function that calculates an input string, but I haven't found one yet.
Please do not use eval() for this.
My standard answer to this question whenever it crops up:
Don't use eval (especially if the formula contains user input) or reinvent the wheel by writing your own formula parser.
Take a look at the evalMath class on PHPClasses. It should do everything that you want in a nice safe sandbox.
To rephrase your problem, you want to allow only a specific set of characters, plus certain predefined words. The alternation operator (pipe symbol) is your friend in this case:
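^([0-9\+\-\*\/\.\,\(\) ]|floor|ceil|other|functions)*$
The ^ and $ anchors matter: without them the * also matches an empty substring, so preg_match() would report a match for any input. (Note that the PHP function is ceil, not ceiling.)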
Of course, using eval is inherently dangerous, and it is difficult to guarantee that this regex will offer full protection in a language with syntax as expansive as PHP.
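A short sketch of the anchored whitelist check run against the examples from the question, with floor and ceil as the only allowed words (extend the alternation as needed). It only filters characters and words; it does not check that the expression is well-formed, e.g. unbalanced parentheses still pass.

```php
<?php
// Whole string must consist of allowed characters or allowed function names.
$allowed = '/^([0-9\+\-\*\/\.\,\(\) ]|floor|ceil)*$/';

$inputs = [
    'floor( floor(8)*1 )',
    "9*9 + include('somefile') / floor(2)",
    '3.5 * (2 + 4)',
];

foreach ($inputs as $expr) {
    $ok = preg_match($allowed, $expr) === 1;
    echo ($ok ? 'allowed  ' : 'rejected '), $expr, "\n";
}
```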
I am trying to write a class that can parse an iCalendar file and am hitting some brick walls. Each line can be in the format:
PARAMETER[;PARAM_PROPERTY..]:VALUE[,VALUE2..]
It's pretty easy to parse with either a bunch of splits or regexes, until you find out that values can contain backslash-escaped commas, and they can also be double-quoted, which makes life hard. For example:
PARAMETER:"my , cool, value",value\,2,value3
In this example you are meant to pull out the three values:
my , cool, value
value,2
value3
Which makes it a little more difficult.
Suggestions?
Go through the file char by char and split the values manually. Whenever you hit a quotation mark, you enter "quotation mode", where you don't split at commas, and when the closing quotation mark comes you leave it.
For the backslash-escaped commas: if you read a backslash, also read the next character and decide what to do with it then.
Of course that's not extremely efficient, but you can't sensibly use regular expressions for this. I mean, you can, but since I believe there can also be escaped quotation marks, it is going to get really messy.
If you want to give it a try though:
Let's start by matching a quotation mark, then any characters that are not quotation marks, then the closing quotation mark: "[^"]*"
To avoid treating escaped quotation marks as delimiters, you can use negative lookbehinds: (?<!\\)"[^"]*(?<!\\)"
Now it will still break if escaped quotation marks appear inside the value. A lookbehind can't be placed inside a character class, so instead consume each backslash escape as a unit: "(?:[^"\\]|\\.)*"
As you can see, it gets messy very fast, so I would suggest reading it character by character.
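A minimal sketch of the character-by-character approach (splitICalValues is a hypothetical name). Assumptions: a backslash always escapes the next character, surrounding quotes are not part of the value, and real iCalendar escapes such as \n for a newline are not translated here.

```php
<?php
// Split a value list on commas, except inside quotes or after a backslash.
function splitICalValues(string $input): array
{
    $values = [];
    $current = '';
    $inQuotes = false;
    $len = strlen($input);

    for ($i = 0; $i < $len; $i++) {
        $ch = $input[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            $current .= $input[++$i]; // keep the escaped character, e.g. "\," -> ","
        } elseif ($ch === '"') {
            $inQuotes = !$inQuotes;   // enter or leave "quotation mode"
        } elseif ($ch === ',' && !$inQuotes) {
            $values[] = $current;     // an unquoted comma ends the value
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    $values[] = $current;
    return $values;
}

$line = 'PARAMETER:"my , cool, value",value\,2,value3';
// Split off the property name at the first colon, then split the values.
$rawValues = explode(':', $line, 2)[1];
print_r(splitICalValues($rawValues));
```

This yields the three values from the question: my , cool, value / value,2 / value3.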
I had the same problems. I found it a bit hard to turn 'any' iCalendar file into a usable PHP object/array structure, so instead I've been trying to convert iCalendar to xCal.
This is my implementation:
http://code.google.com/p/sabredav/source/browse/branches/caldav/lib/Sabre/CalDAV/ICalendarToXML.php
I must say that this script is not fully tested, but it might be enough to get you started.
Have you tried pulling something out of http://phpicalendar.net/ ?
Is this the project you're thinking of? I'm the author :) The first usable version (v0.1.0) should be ready in about a month. It can handle about 85% of the iCalendar spec right now, but recurring events are really tough, and I'm working on them now. Once those are complete, the library will be able to do anything in the spec.
qCal Google Code Homepage
Enjoy!