First, is it possible for a SHA-1 hash to be all numbers or all letters?
Second, is there any need for the start and end delimiters (the ^ and $ anchors) when using a regex to check for a SHA-1 hash, i.e.,
/^[0-9a-f]{40}$/i
in place of
/[0-9a-f]{40}/i
Is there any need to use the delimiters?
I ask because I'm wondering whether I should also check that the pattern has at least one number and at least one letter, or does this not matter?
A SHA-1 hash is a 160-bit value that can be anything from all 0s to all 1s. This means that yes, in theory it can be all numbers or all letters (more specifically, its hex representation can be).
As for the beginning and ending markers, they are required unless you check the string in other ways. The two patterns you posted are not equivalent:
/^[0-9a-f]{40}$/i
A string that consists of exactly 40 characters, each in 0-9 or a-f, and nothing else.
/[0-9a-f]{40}/i
A string that contains 40 characters in 0-9 or a-f in a row.
In other words, the first pattern would consider this invalid whereas the second would not:
|0000000000000000000000000000000000000000|
The second pattern would match the 40 valid characters in the middle and not care about the rest of it.
You could effectively turn the second pattern into the first if you also used strlen to verify that the string is exactly 40 characters. This would be a bit redundant though, as you'd essentially then have a pattern of:
A string that: (contains 40 characters in 0-9 or a-f in a row) and (is exactly 40 characters).
The first version expresses it more compactly, though the second is a bit more obvious.
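To make the difference concrete, here is a minimal sketch. The helper name is_sha1_hex is made up for illustration; the patterns are the two from the question.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the question):
// true only if the whole string is 40 hex characters.
function is_sha1_hex(string $candidate): bool
{
    return preg_match('/^[0-9a-f]{40}$/i', $candidate) === 1;
}

// 40 zeros wrapped in pipes, as in the example above.
$padded = '|' . str_repeat('0', 40) . '|';

var_dump(is_sha1_hex(str_repeat('0', 40)));              // bool(true): all digits is still a valid hash
var_dump(is_sha1_hex($padded));                          // bool(false): the pipes are rejected
var_dump(preg_match('/[0-9a-f]{40}/i', $padded) === 1);  // bool(true): unanchored pattern matches the middle
```

Since an all-digit or all-letter string is a legitimate hash, there is no reason to add a check for "at least one number and at least one letter".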
Related
I'm looking for a way to convert an alphanumeric string, e.g. "aBcd3f", into a purely numeric representation, and get the shortest possible output string. The valid characters in the input string are a-z, A-Z, 0-9, and the resultant string would consist only of digits 0-9.
Since there are 62 valid values for each character in the input string, I can assign values 00-61 to each input character, and convert the 6 input characters into a 12 character numeric string.
But I would like to get something more compact, if possible - e.g. 8-10 digits. Is it possible, and if so, are there any algorithms or functions for doing this in PHP?
Note that this has to be a 2-way function. I also need to be able to go back from the numeric string to the alphanumeric.
I haven't found this question asked on this site. My question is the opposite of this question, as I'm trying to go in the opposite direction.
A decimal digit encodes log2(10) = 3.32 bits of information on average. Alphanumeric data has 62 possible "digits", so each one encodes log2(62) = 5.95 bits of information on average.
This means that converting from alphanumeric to decimal digits only will require approximately 5.95 / 3.32 = 1.79 times as many characters in the output as there are in the input. If your output is constrained to 10 characters maximum you can expect it to encode at most 10 / 1.79 = 5.58 characters of alphanumeric input, which for practical purposes means just 5. There is no room for maneuvering here; this is cold math.
The manner of converting from one representation to the other is fairly straightforward, because in essence you are simply converting a number from base 62 to base 10 and back. You can tweak the code from this answer of mine only slightly to achieve the aim.
See it in action.
Note that with the (arbitrary) order of digits I picked the "largest" possible input with 5 characters is "ZZZZZ", which encodes to 9 decimal digits. If you expand the input to 6 characters the largest input would be "ZZZZZZ" which would need 11 decimal digits to encode -- more than the limit we imposed, as predicted.
Also note that this analysis assumes every possible input string is as likely to occur as any other, i.e. the input is perfectly random. If this is not the case then the actual information content of the input would be lower than the theoretical maximum and consequently you could take advantage of this with some kind of compression scheme.
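The linked code is not reproduced here, so the following is only a rough sketch of the base 62 to base 10 idea and not the original implementation. The digit order (0-9, then a-z, then A-Z), the function names, and the $length parameter used to restore leading "0" characters are all assumptions. It uses plain integers, which should be enough for inputs of up to 10 characters on 64-bit PHP.

```php
<?php
// Assumed digit order; the original answer's alphabet is not shown.
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Read $input as a base-62 number and return it as a decimal string.
function alnum_to_decimal(string $input): string
{
    $value = 0;
    foreach (str_split($input) as $char) {
        $digit = strpos(ALPHABET, $char);
        if ($digit === false) {
            throw new InvalidArgumentException("Invalid character: $char");
        }
        $value = $value * 62 + $digit;
    }
    return (string) $value;
}

// Reverse the conversion. $length is the original input length, needed
// because leading "0" characters vanish in the numeric form.
function decimal_to_alnum(string $decimal, int $length): string
{
    $value = (int) $decimal;
    $output = '';
    while ($value > 0) {
        $output = substr(ALPHABET, $value % 62, 1) . $output;
        $value = intdiv($value, 62);
    }
    return str_pad($output, $length, '0', STR_PAD_LEFT);
}

echo alnum_to_decimal('ZZZZZ'), "\n";         // 916132831 (9 digits)
echo decimal_to_alnum('916132831', 5), "\n";  // ZZZZZ
```

For longer inputs the intermediate value no longer fits in an integer, so an arbitrary-precision approach would be needed instead.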
I am using some data which gives paths for Google Maps either as a path or a set of two latitudes and longitudes. I have stored both values as a BLOB in a MySQL database, but I need to detect the values which are not paths when they come out in the result. In an attempt to do this, I have saved them in the BLOB in the following format:
array(lat,lng+lat,lng)
I am using preg_match to find these results, but I haven't managed to get any to work. Here are the regex codes I have tried:
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)\+(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)[\)]{1}^
Regex confuses me sometimes (as it is doing now). Can anyone help me out?
Edit:
The lat can be 2 digits followed by a decimal point and 8 more digits, and the lng can be 3 digits followed by a decimal point and 8 more digits. Both can be positive or negative.
Here are some example lat lngs:
51.51160000,-0.12766000
-53.36442000,132.27519000
51.50628000,0.12699000
-51.50628000,-0.12699000
So a full match would look like:
array(51.51160000,-0.12766000+-53.36442000,132.27519000)
Further Edit
I am using the preg_match() php function to match the regex.
Here are some pointers for writing regex:
If you have a single possibility for a character, for example, the a in array, you can indeed write it as [a]; however, you can also write it as just a.
If you are looking to match exactly one of something, you can indeed write it as a{1}, however, you can also write it as just a.
Applying these two rules repeatedly, your example of ^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^ reduces to ^array\([1-9\.\,\+]{1*}\)^ - that's certainly an improvement!
Next, numbers may also include 0's, as well as 1-9. In fact, \d - any digit - is usually used instead of 1-9.
You are using ^ as the delimiter - usually that is /; I didn't recognize it at first. I'm not sure what you can use for the delimiter, so, just in case, I'll change it to the usual /. This makes the above regex /array\([\d\.\,\+]{1*}\)/.
To match one or more of a character or character set, use +, rather than {1*}. This makes your query /array\([\d\.\,\+]+\)/
Then, to collect the resulting numbers (assuming you want only the part between the brackets), put it in (non-escaped) brackets, thus: /array\(([\d\.\,\+]+)\)/ - you would then need to split them, first by +, then by ,. Alternatively, if there are exactly two lat,lng pairs, you might want: /array\(([\d\.]+),([\d\.]+)\+([\d\.]+),([\d\.]+)\)/ - this will return 4 values, one for each number; the additional stuff (+, ,) will already be removed, because it is not in (unescaped) brackets ().
Edit: If you want negative lats and longs (and why wouldn't you?) you will need \-? (a "literal -", rather than part of a range) in the appropriate places; the ? makes it optional (i.e. 0 or 1 dashes). For example, /array\((\-?[\d\.]+),(\-?[\d\.]+)\+(\-?[\d\.]+),(\-?[\d\.]+)\)/
You might also want to check out http://regexpal.com - you can put in a regex and a set of strings, and it will highlight what matches/doesn't match. You will need to exclude the delimiter / or ^.
Note that this is a little fast and loose; it would also match array(5,0+0,1...........). You can nail it down a little more, for example, by using (\-?\d*\.\d+)\) instead of (\-?[\d\.]+)\) for the numbers; that will match (0 or 1 literal -) followed by (0 or more digits) followed by (exactly one literal dot) followed by (1 or more digits).
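Putting the pieces together with preg_match(), a sketch could look like the following. The helper name parse_lat_lng_pair is made up, and the ^ and $ anchors are my own addition, on the assumption that the stored value should contain nothing else.

```php
<?php
// Tightened number from above: optional minus, 0+ digits, one dot, 1+ digits.
$number  = '(-?\d*\.\d+)';
// ^ and $ are an assumption: the whole value must be a single array(...).
$pattern = '/^array\(' . $number . ',' . $number . '\+' . $number . ',' . $number . '\)$/';

// Hypothetical helper: returns [lat1, lng1, lat2, lng2] as floats,
// or null when the value is something else (e.g. a path).
function parse_lat_lng_pair(string $value, string $pattern): ?array
{
    if (preg_match($pattern, $value, $matches) !== 1) {
        return null;
    }
    // $matches[0] is the full match; groups 1-4 hold the four numbers.
    return array_map('floatval', array_slice($matches, 1));
}

print_r(parse_lat_lng_pair('array(51.51160000,-0.12766000+-53.36442000,132.27519000)', $pattern));
var_dump(parse_lat_lng_pair('array(5,0+0,1...........)', $pattern)); // NULL
```

A null result is then the signal that the stored value should be treated as a path.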
This is the regex I made:
array\((-*\d+\.\d+),(-*\d+\.\d+)\+(-*\d+\.\d+),(-*\d+\.\d+)\)
This also breaks the four numbers into groups so you can get the individual numbers.
You will note the repeated pattern of
(-*\d+\.\d+)
Explanation:
-* means 0 or more - signs (so the - sign is optional; note that it also allows several in a row, whereas -? would allow at most one)
\d+ means 1 or more digits
\. means a literal period (decimal point)
\d+ means 1 or more digits
The whole thing is wrapped in brackets to make it a captured group.
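One detail worth checking is the difference between -* and -?. The sketch below compares the regex above with a variant that uses -? instead; the variable names and the test strings are just for illustration.

```php
<?php
// The regex from this answer, with PHP delimiters added.
$loose  = '/array\((-*\d+\.\d+),(-*\d+\.\d+)\+(-*\d+\.\d+),(-*\d+\.\d+)\)/';
// Same regex with -? so that at most one minus sign is accepted.
$strict = '/array\((-?\d+\.\d+),(-?\d+\.\d+)\+(-?\d+\.\d+),(-?\d+\.\d+)\)/';

$valid   = 'array(51.51160000,-0.12766000+-53.36442000,132.27519000)';
$doubled = 'array(--51.51160000,-0.12766000+-53.36442000,132.27519000)';

var_dump(preg_match($loose, $valid));    // int(1)
var_dump(preg_match($loose, $doubled));  // int(1): "--" slips through
var_dump(preg_match($strict, $valid));   // int(1)
var_dump(preg_match($strict, $doubled)); // int(0)
```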
Updating someone else's old PHP project and I'm unfamiliar with regular expressions.
Question one is: What does this do?
preg_match('/^[0-9]+[.]?[0-9]*$/', $variable)
Question two is: Is this a safe filter for insertion into a mysql DB without mysql_real_escape_string()? I know the answer is prob no, but it is set up to use mysql_real_escape_string() only if this regex doesn't pass.
Thanks.
^ // start of string
[0-9]+ // one or more numbers (could also be \d+)
[.]? // zero or one period (could also be \.?)
[0-9]* // zero or more numbers (could also be \d*)
$ //end of string
So, it makes sure the input is a number, such as 12 or 3.6 (52. will also match). It will not match .35 or 12a6.
It seems safe enough for DB insertion, because it only allows numbers.
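A quick way to see what the pattern accepts is to run it over a few inputs. This is only a sketch; the sample inputs are mine. One caveat: in PCRE, $ also matches just before a trailing newline unless the D modifier is used, so "12\n" passes the original pattern.

```php
<?php
$pattern = '/^[0-9]+[.]?[0-9]*$/';

// Sample inputs; json_encode() is used only to make the newline visible.
foreach (['12', '3.6', '52.', '.35', '12a6', "12\n"] as $input) {
    $result = preg_match($pattern, $input) === 1 ? 'match' : 'no match';
    printf("%-8s %s\n", json_encode($input), $result);
}

// With the D modifier, $ matches only at the very end of the string.
var_dump(preg_match('/^[0-9]+[.]?[0-9]*$/D', "12\n")); // int(0)
```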
It matches strings that:
start with at least one digit from 0-9
then have an optional decimal point (0 or 1 times)
then have 0 or more further digits
It does not sanitise the string for the database.
It checks if $variable matches this pattern...
starts with one or more digits (^[0-9]+)
followed by optional . ([.]?)
followed by as many or as few digits as you like ([0-9]*)
followed by the end of the string ($)
It's attempting to match a decimal number (albeit poorly). It doesn't modify $variable anyway, so you would need to escape it properly before passing to MySQL.
That will match a number that has at least one digit before the decimal point (if there is a decimal point). If the value matches this regex, I don't see how it could be unsafe to insert it into the database.
It checks whether the string is an exact match.
It matches
234234232432343.231313132321
and
2232233223
and
322332.
and not
.32232
and not
Is this a safe filter for insertion into a mysql DB without mysql_real_escape_string()?
Assuming the possible use of this variable, I'd say that mysql_real_escape_string() would be quite useless for it.
Need the query assembling code to be certain though.
My regex at the minute is like this
'/[a-z0-9]{40}/i'
Which will match any string with no spaces that contains letters and/or numbers.
How can I change it so that it must include at least one number and at least one alphabetic character, so that if the string was all numbers or all letters it would not be matched?
Thanks
/([[:alpha:]].*[[:digit:]]|[[:digit:]].*[[:alpha:]])/
This requires a number to follow a letter, or a letter to follow a number.
From your original regex, it appears that you want to enforce a requirement for 40 characters total. For that, try:
/^(?=.*[[:alpha:]])(?=.*[[:digit:]]).{40}$/
Note the two lookaheads. They assert that there is at least one alpha and at least one digit somewhere in the string without consuming anything; .{40} then requires exactly 40 characters, which can otherwise be anything.
If you want to avoid matching whitespace, replace .{40} with [^[:space:]]{40}.
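One way to combine the "at least one letter and one digit" rule with the original [a-z0-9]{40} is a pair of lookaheads. This is a sketch rather than the only way to do it, and the test strings are made up.

```php
<?php
// (?=.*[a-z]) needs a letter somewhere, (?=.*[0-9]) needs a digit somewhere;
// [a-z0-9]{40} keeps the original length and character restrictions.
$pattern = '/^(?=.*[a-z])(?=.*[0-9])[a-z0-9]{40}$/i';

var_dump(preg_match($pattern, str_repeat('a1', 20))); // int(1): 40 chars, mixed
var_dump(preg_match($pattern, str_repeat('a', 40)));  // int(0): letters only
var_dump(preg_match($pattern, str_repeat('1', 40)));  // int(0): digits only
var_dump(preg_match($pattern, str_repeat('a1', 21))); // int(0): 42 characters
```

Keep in mind that if these strings are SHA-1 hashes, an all-digit or all-letter value is still valid, so this stricter check would reject legitimate input.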
I need to validate a username in PHP. It can be:
Letters (upper and lower case)
Numbers
Any of these symbols :.,?!#
Up to 15 characters OR 16 if the last character is one of the following #$^ (it can also be 15 or less with one of these 3 characters at the end only)
How do I do this?
Start with this:
/^[a-zA-Z0-9:.,?!#]{3,15}[#$^]?$/
then refine it to your needs. Check whether any of the special characters need escaping in your context, but you should get the idea.
This means: from a to z, from A to Z, from 0 to 9 and :.,?!# repeated from 3 to 15 times, optionally followed by one among #$^
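Wrapped in a small function, it could be used like this. The function name is_valid_username is hypothetical, and the minimum of 3 characters comes from the pattern above rather than from the question.

```php
<?php
// Hypothetical helper around the pattern from this answer.
function is_valid_username(string $name): bool
{
    return preg_match('/^[a-zA-Z0-9:.,?!#]{3,15}[#$^]?$/', $name) === 1;
}

var_dump(is_valid_username('john.doe'));          // bool(true)
var_dump(is_valid_username('abcdefghijklmno$'));  // bool(true): 15 characters plus a trailing $
var_dump(is_valid_username('abcdefghijklmnop'));  // bool(false): 16 characters, last one not in #$^
var_dump(is_valid_username('jo$hn'));             // bool(false): $ is only allowed at the end
```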