I am building a check in PHP to detect whether filenames make sense or are completely random. I'm using regular expressions.
A valid filename = sample-image-25.jpg
A random filename = 46347sdga467234626.jpg
I want to check whether the filename makes sense. If it doesn't, I want to alert the user to fix the filename before continuing.
Any help?
I'm not really sure that's possible because I'm not sure it's possible to define "random" in a way the computer will understand sufficiently well.
"umiarkowany" looks random, but it's a perfectly valid word I pulled off the Polish Wikipedia page for South Korea.
My advice is to think more deeply about why this design detail is important, and look for a more feasible solution to the underlying problem.
That would take far too much work. You would have to build a huge array of the most-used words (like a dictionary) and check whether most of the words in the filename (maybe separated by - or _) are in it, and it would still have huge bugs.
Basically you will need:
explode()
implode()
array_search() or in_array()
Take the string and look for a glue character like "_" or "-" with preg_match(); if there are any, explode the string into an array and compare that array with the dictionary array.
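A rough sketch of that dictionary check, for what it's worth. The function name knownWordRatio() and the tiny $dictionary array are made up for illustration; a real word list would be far larger, and purely numeric parts like "25" are simply skipped here, which is my own assumption about what should count.

```php
<?php
// Hypothetical helper: fraction of the non-numeric parts of a filename
// that appear in a word list.
function knownWordRatio(string $filename, array $dictionary): float
{
    // Drop the extension, then split on the "glue" characters - and _.
    $base = pathinfo($filename, PATHINFO_FILENAME);
    $parts = preg_split('/[-_]+/', strtolower($base), -1, PREG_SPLIT_NO_EMPTY);

    // Assumption: purely numeric parts such as "25" are ignored, not counted as misses.
    $words = array_filter($parts, fn($p) => !ctype_digit($p));
    if (count($words) === 0) {
        return 0.0;
    }

    $known = array_filter($words, fn($w) => in_array($w, $dictionary, true));
    return count($known) / count($words);
}

// Tiny stand-in dictionary; a real one would be loaded from a word list file.
$dictionary = ['sample', 'image', 'photo', 'holiday'];

var_dump(knownWordRatio('sample-image-25.jpg', $dictionary));
var_dump(knownWordRatio('46347sdga467234626.jpg', $dictionary));
```

You would then pick some threshold (say, more than half the parts known) before nagging the user, and accept that it will misjudge names in other languages.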
Or, since almost every word alternates vowels and consonants, you could write a huge script that checks whether most of the words in the filename look "non-random". But the problem stays the same: why do you need that? Look for a more flexible solution.
Notice:
Consider that even a simple-and-friendly-file.png could be the result of a string generator.
Good luck with that.
Related
I'm attempting what I thought would be a simple exercise, but unless I’m missing a trick, it seems anything but simple.
I'm attempting to clean up user input into a form before saving it. The particular problem I have is with hyphenated town names. For example, take Bourton-on-the-Water. Assume the user has Caps Lock on, puts spaces next to the hyphens, or makes any other slip-up that might come to mind. How do I, within reason, turn it into what it’s meant to be?
You can use trim() to remove whitespace (or other characters) from the beginning and end of a string. You can also use explode() to break strings into parts by a specified character and then recreate your string as you like.
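Something along these lines might do for the Caps Lock and stray-space cases. It is only a sketch: normalizeTownName() and the $smallWords list are my own invention, and it will not help if the user leaves the hyphens out altogether.

```php
<?php
// Hypothetical helper: tidy a hyphenated place name typed with stray spaces or caps.
function normalizeTownName(string $input): string
{
    // Assumed list of joining words that stay lowercase inside a name.
    $smallWords = ['on', 'the', 'in', 'upon', 'under', 'le', 'by', 'of'];

    // Split on hyphens and trim the whitespace around each part.
    $parts = array_map('trim', explode('-', strtolower(trim($input))));
    $parts = array_values(array_filter($parts, fn($p) => $p !== ''));

    foreach ($parts as $i => $part) {
        // Collapse repeated inner spaces.
        $part = preg_replace('/\s+/', ' ', $part);
        // The first part is always capitalised; joining words elsewhere stay lowercase.
        if ($i === 0 || !in_array($part, $smallWords, true)) {
            $part = ucwords($part);
        }
        $parts[$i] = $part;
    }
    return implode('-', $parts);
}

echo normalizeTownName('  BOURTON - on-THE -water '), "\n"; // Bourton-on-the-Water
```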
I think the only way you can really accomplish this is by improving the way the user inputs their data.
For example use a postcode lookup system that enters an address based on what they type.
Or have an autocomplete from a predefined list of towns (similar to how Facebook shows towns).
To consider every possible permutation of Bourton On The Water / Bourton-On-The-Water etc... is pretty much impossible.
Is there any way we can check if a PHP file has been obfuscated, using PHP? I was thinking regex maybe (for instance, ionCube's encoded file contains a very long alphabet string, etc.).
One idea is to check for whitespace. The first thing that an obfuscator will do is to remove extra whitespace. Another thing you can look for is the number of characters per line, as obfuscators will put all the code into few (one?) lines.
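As a crude illustration of the whitespace and line-length idea: the helper names and the thresholds (500 characters, 5% whitespace) are assumptions picked for the example, not values taken from any particular obfuscator.

```php
<?php
// Hypothetical helper: crude layout statistics for a PHP source string.
function layoutStats(string $source): array
{
    $lines = preg_split('/\R/', $source);
    $lengths = array_map('strlen', $lines);
    $whitespace = preg_match_all('/\s/', $source);

    return [
        'lines'            => count($lines),
        'max_line_length'  => max($lengths),
        'whitespace_ratio' => strlen($source) > 0 ? $whitespace / strlen($source) : 0.0,
    ];
}

// Assumed thresholds, chosen only for illustration.
function looksMinified(string $source): bool
{
    $stats = layoutStats($source);
    return $stats['max_line_length'] > 500 || $stats['whitespace_ratio'] < 0.05;
}
```

Bear in mind that legitimately minified or generated code trips the same checks, so treat the result as a hint rather than proof.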
Often, obfuscators initialize very large arrays to translate variables into less meaningful names (e.g. see obfuscator article).
One technique may be to search for these super-large arrays close to the top of the class/file. You may be able to hook Xdebug up to look for these. The whole thing of course depends on the obfuscation technique used. Check the source code; there may be patterns they've used that you can search on.
I think you can use token_get_all() to parse the file, then compute some statistics. For example, check the number of function calls (in case the obfuscator uses some eval() string and nothing else) and calculate the average function name length: for obfuscators it will usually be about 3-5 chars, for normal PHP code it should be much bigger. You can also use a dictionary lookup for function/variable names, check for comments, etc. I think if you know all the obfuscator formats that you want to detect, it will be easy.
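Here is a small sketch of one such statistic, the average length of variable and identifier names. averageNameLength() is a made-up name, and what counts as "too short" is left to you; the source passed in must include the opening <?php tag or nothing gets tokenised as code.

```php
<?php
// Hypothetical helper: average length of variable and identifier names in PHP source.
function averageNameLength(string $source): float
{
    $lengths = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ";" are plain strings, not arrays.
        if (!is_array($token)) {
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_VARIABLE) {
            $lengths[] = strlen($text) - 1; // ignore the leading "$"
        } elseif ($id === T_STRING) {
            $lengths[] = strlen($text);     // function, class and constant names
        }
    }
    return $lengths ? array_sum($lengths) / count($lengths) : 0.0;
}

$readable = '<?php function addNumbers($first, $second) { return $first + $second; }';
$mangled  = '<?php function a($b, $c) { return $b + $c; }';

var_dump(averageNameLength($readable) > averageNameLength($mangled)); // bool(true)
```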
I'm trying to parse a plain text document in PHP but have no idea how to do it correctly.
I want to separate each word, assign them an ID and save the result in JSON format.
Sample text:
"Hello, how are you (today)"
This is what I'm doing at the moment:
$document_array = explode(' ', $document_text);
json_encode($document_array);
The resulting JSON is
[["Hello,"],["how"],["are"],["you"],["(today)"]]
How do I ensure that spaces are kept in-place and that symbols are not included along with the words...
[["Hello"],[", "],["how"],[" "],["are"],[" "],["you"],[" ("],["today"],[")"]]
I’m sure some sort of regex is required... but have no idea what kind of pattern to apply to deal with all cases... Any suggestions guys?
This is actually a really complex problem, and one that's subject to a fair amount of academic research. It sounds so simple (just split on whitespace! with maybe a few rules for punctuation...) but you quickly run into issues. Is "didn't" one word or two? What about hyphenated words? Some might be one word, some might be two. What about multiple successive punctuation characters? Possessives versus quotes? And so on. Even determining the end of a sentence is non-trivial. (It's just a full stop, right?!)
This problem is one of tokenisation and a topic that search engines take very seriously. To be honest you should really look at finding a tokeniser in your language of choice.
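Maybe this?
array_values(array_filter(preg_split('/\b/', $document_text), 'strlen'))
array_filter() removes the empty values at the first and/or last index of the resulting array, which appear if your string starts or ends with a word boundary (\b, see: http://php.net/manual/en/regexp.reference.escape.php). The 'strlen' callback keeps a token such as "0", which the default filter would drop, and array_values() reindexes the result so that json_encode() emits a list rather than an object.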
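A minimal sketch of the same word-boundary split on the sample text, using the PREG_SPLIT_NO_EMPTY flag rather than array_filter(), so the keys stay sequential and can double as IDs. The ['id' => ..., 'text' => ...] shape is just my guess at what "assign them an ID" might look like.

```php
<?php
$document_text = 'Hello, how are you (today)';

// Split at word boundaries; PREG_SPLIT_NO_EMPTY drops the empty leading/trailing pieces.
$tokens = preg_split('/\b/', $document_text, -1, PREG_SPLIT_NO_EMPTY);

// Pair each token with a sequential ID (here simply its position in the text).
$result = [];
foreach ($tokens as $id => $token) {
    $result[] = ['id' => $id, 'text' => $token];
}

echo json_encode($tokens), "\n";
echo json_encode($result), "\n";
```

Joining the tokens back together with implode('', $tokens) reproduces the original text, spaces and symbols included. It still treats "didn't" as three tokens, so the caveats about real tokenisation apply.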
I wouldn't call myself a master regarding regex; I pretty much just know the basics. I've been playing around with it, but I can't seem to get the desired result. So if someone would help me, I would really appreciate it!
I'm trying to check whether unwanted words exist in a string. I'm working on a math project, and I'm going to be using eval() to calculate the string, so I need to make sure it's safe.
The string may contain (just for example now, I'll add more functions later) the following words: (read the comments)
floor() // spaces or numbers are allowed between the () chars. If possible, I'd also like to allow other math functions inside, so it'd look like: floor( floor(8)*1 ).
It may contain any digit, any math sign (+ - * /) and dots/commas (,.) anywhere in the string
Just to be clear, here's another example. If a string like this is passed, I do not want it to pass:
9*9 + include('somefile') / floor(2) // Just a random example on something that's not allowed
Now that I think about it, it looks kind of complicated. I hope you can at least give me some hints.
Thanks in advance,
-Anthony
Edit: This is a bit off-topic, but if you know a better way of calculating math functions, please suggest it. I've been looking for a safe math class/function that calculates an input string, but I haven't found one yet.
Please do not use eval() for this.
My standard answer to this question whenever it crops up:
Don't use eval (especially if the formula contains user input) or reinvent the wheel by writing your own formula parser.
Take a look at the evalMath class on PHPClasses. It should do everything that you want in a nice safe sandbox.
To rephrase your problem, you want to allow only a specific set of characters, plus certain predefined words. The alternation operator (pipe symbol) is your friend in this case:
([0-9\+\-\*\/\.\,\(\) ]|floor|ceiling|other|functions)*
Of course, using eval is inherently dangerous, and it is difficult to guarantee that this regex will offer full protection in a language with syntax as expansive as PHP.
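For illustration, here is how that pattern might be wrapped in a check. Note the ^ and $ anchors: without them preg_match() succeeds on any input, because the starred group can match the empty string. The function name isAllowedExpression() and the list of allowed names (floor, ceil, round) are assumptions; this only filters characters and does not make eval() safe by itself.

```php
<?php
// Hypothetical validator: accept only digits, operators, punctuation and whitelisted names.
function isAllowedExpression(string $expr): bool
{
    // Assumed whitelist of function names; extend as needed.
    $functions = ['floor', 'ceil', 'round'];
    $pattern = '/^(?:[0-9+\-*\/.,() ]|' . implode('|', $functions) . ')*$/D';
    return preg_match($pattern, $expr) === 1;
}

var_dump(isAllowedExpression('floor( floor(8)*1 )'));                   // bool(true)
var_dump(isAllowedExpression("9*9 + include('somefile') / floor(2)")); // bool(false)
```

An empty string also passes, and so does syntactically broken input such as "((1+", so you would still need to handle evaluation errors.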
I have two strings and I would like to mix the characters from each string into one bigger string, how can I do this in PHP? I can swap chars over but I want something more complicated since it could be guessed.
And please don't say md5() is enough and irreversible. :)
$string1 = '9cb5jplgvsiedji9mi9o6a8qq1';//session_id()
$string2 = '5d41402abc4b2a76b9719d911017c592';//md5()
Thank you for any help.
EDIT: Ah sorry Rob. It would be great if there is a solution where it was just a function I could pass two strings to, and it returned a string.
The returned string must contain both of the previous strings. Not just a concatenation, but the characters of each string are mingled into one bigger one.
If you want to make a tamper-proof string which is human readable, add a secure hash to it. MD5 is indeed falling out of favour, so try sha1. For example
$salt="secret";
$hash=sha1($string1.$string2.$salt);
$separator="_";
$str=$string1.$separator.$string2.$separator.$hash;
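If you want a string which cannot be read by humans, encrypt it - check out the mcrypt extension, which offers a variety of options (note that mcrypt was removed from PHP core in 7.2, so on current versions look at the openssl or sodium extensions instead).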
Use one of the SHA variants of the hash() function. A SHA-2 variant such as sha256 should be sufficient and certainly much better than anything you could come up with.
Unless I am missing something, if you're wanting to combine those values into a unique value, why not do sha1($string1 . $string2);
I'm guessing you want something reversible, so you can get these values back out. A quick-and-dirty technique for obscuring these two strings further would be to base64-encode them:
base64_encode($string1 . $string2);
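If you need to get both values back out, plain concatenation loses the boundary between them. A small sketch of one way around that, with made-up helper names packStrings() and unpackStrings(): encode each part separately and join them with a character that never appears in base64 output. Keep in mind this is encoding, not encryption; anyone can decode it.

```php
<?php
// Hypothetical pair of helpers: pack two strings into one, reversibly.
function packStrings(string $a, string $b): string
{
    // Encode each part separately so the "." separator can never appear inside a part.
    return base64_encode($a) . '.' . base64_encode($b);
}

function unpackStrings(string $packed): array
{
    [$a, $b] = explode('.', $packed, 2);
    return [base64_decode($a), base64_decode($b)];
}

$string1 = '9cb5jplgvsiedji9mi9o6a8qq1';
$string2 = '5d41402abc4b2a76b9719d911017c592';

$packed = packStrings($string1, $string2);
[$first, $second] = unpackStrings($packed);
```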
Thank you everyone. I completely forgot about the SHA1 - got too into solving a problem that I forgot what else was out there. :)
Well, if not md5(), then sha1(). :)
Anyway, the possibilities to mangle are endless; pick your poison.
What I would do, if I really wanted something like that (which can be useful occasionally), is add another element, chosen at random, shuffle the md5 string by it, and write the random element into the result too.
For example, let us add to each md5 character a random 2 digit number, which we then split by digits and add 1st digit to resulting string, and 2nd digit - prepend to it.
I stumbled upon someplace where something of that kind was done today. I was trying to find some reference to a particular phone number - whether it appears anywhere on the country-local inet or not.
I visited a popular classified ads site, which gives phone numbers of advertisers and you have the option, when you are looking at a particular ad, to find all ads with the same phone number. Now, what they did, however, was that they encoded search string, so you are not searching for ?phone=123123, but something like ?phone==FFYx23=.
If they hadn't done that, I would be able to find out for my own purposes, rather than checking on ads, IF user with phone 123123 has posted any ads on the site.
If you are looking to verify message integrity and authenticity with hashing - you might want to look at HMAC - there are plenty of implementations in PHP using both SHA1 and MD5:
http://en.wikipedia.org/wiki/HMAC
EDIT: In fact, PHP now has a function for this:
http://us3.php.net/manual/en/function.hash-hmac.php
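A short sketch of how hash_hmac() could be used with the two strings from the question. The key 'secret', the "_" separator and the verifySigned() helper are assumptions for the example; hash_equals() is used for the comparison so that it runs in constant time.

```php
<?php
// Assumed secret key; in practice this would come from configuration, not source code.
$key = 'secret';
$string1 = '9cb5jplgvsiedji9mi9o6a8qq1';
$string2 = '5d41402abc4b2a76b9719d911017c592';

// Sign the two values together and append the MAC.
$message = $string1 . '_' . $string2;
$mac = hash_hmac('sha256', $message, $key);
$signed = $message . '_' . $mac;

// Hypothetical verifier: recompute the MAC and compare in constant time.
function verifySigned(string $signed, string $key): bool
{
    $pos = strrpos($signed, '_');
    if ($pos === false) {
        return false;
    }
    $message = substr($signed, 0, $pos);
    $mac = substr($signed, $pos + 1);
    return hash_equals(hash_hmac('sha256', $message, $key), $mac);
}

var_dump(verifySigned($signed, $key)); // bool(true)
```

The values stay readable, but any change to either string (or to the MAC) makes verification fail unless the attacker knows the key.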