Is there any algorithm or library (e.g. in PHP) that provides a "phone number exchange prevention"?
Basically e.g. a phone number like
0123 45 67 89
can easily be removed by regular expression.
But a number like
0
1
2
3
4
5
is harder to detect. And then, hardest of all:
my number is: zero one two three four five six seven
How would you remove something like this via PHP regex? Is there a library?
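A minimal sketch of one way to cover all three cases at once, written in Python for illustration (the same patterns should port to PHP's preg_replace_callback). It treats each digit or spelled-out digit word as a token and masks any run of tokens separated only by whitespace or punctuation. The names mask_numbers, min_digits and placeholder, and the threshold of 6 tokens, are assumptions made for this example, not an established library API.

```python
import re

# English digit words only; extend for other languages or spellings such as "oh".
DIGIT_WORDS = "zero|one|two|three|four|five|six|seven|eight|nine"
TOKEN = r"(?:\d|\b(?:" + DIGIT_WORDS + r")\b)"
# A run is two or more tokens separated by nothing but whitespace/punctuation.
RUN_RE = re.compile(TOKEN + r"(?:[\s().,\-/]*" + TOKEN + r")+", re.IGNORECASE)
TOKEN_RE = re.compile(TOKEN, re.IGNORECASE)


def mask_numbers(text, min_digits=6, placeholder="[removed]"):
    """Replace runs of at least min_digits digit tokens with a placeholder."""
    def replace(match):
        count = len(TOKEN_RE.findall(match.group(0)))
        return placeholder if count >= min_digits else match.group(0)
    return RUN_RE.sub(replace, text)


print(mask_numbers("call 0123 45 67 89 now"))
print(mask_numbers("0\n1\n2\n3\n4\n5"))
print(mask_numbers("my number is: zero one two three four five six seven"))
```

A rule like this is easy to evade (letters between digits, leetspeak, other languages) and will also mask legitimate long numbers, so it is only a first filter.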
Related
I need to identify a fraction from a form field in a recipe database using a Regex.
Ingredients will be entered in a two part form fields. Field one is the amount, Field two is the ingredient. I then need to break field one into its fractional components to input into the database.
Possible entries include:
1, 1/2, 1 1/2, and any of the previous with words attached such as 1 cup, or 1/2 tbsp.
the hardest I foresee would be: [2 28 oz. cans] where 2 is the number, and 28 oz. cans would be the word.
I have found:
(\b[0-9]{1,3}(?:,?[0-9]{3})*(?:.[0-9]{2})?\b)
which sort of works. I am completely new to Regex, so I am working on guess and check only, and I am having a hard time making it work for me.
Problem #1: I need to identify the word part as well. The word part can be multiple words, such as 2 large cans, where large cans would be the word part. The above Regex identifies the numbers very well, but I can't figure out a way to grab the rest of the form field. For example, 1 1/2 tbsp gives me 1,1,2 but that is all, and I need tbsp as well. I tried to use this Regex and use len to cut the original down, subtracting the fraction off the front, but had problems since 1 / 2 and 1/2 are both allowed, so I can't figure out how many characters to subtract (1 / 2 should subtract 6 from the front of the string, 1/2 should subtract 4, and just looking at the regex results of 1,2 I can't tell how many to subtract).
Problem #2: This isn't so important, but any ideas on how to identify the [2 28 oz cans] problem? The above Regex pulls 2,28 out, which is not correct; it should only pull 2 out, and then the rest (28 oz cans) would be the other part that the solution to problem 1 will hopefully find.
Here's a regex that will match mixed numbers, whole numbers, and the rest of the entry (the ingredient, hopefully with any extraneous numbers):
^((\d+( \d+/\d+)?)|(\d+/\d+))( (.+))?$
So for example if had 2 28 ounce cans it would match:
group 1: 2
group 2: 2
group 3:
group 4:
group 5: 28 ounce cans (including the leading space)
group 6: 28 ounce cans
The groups you care about are 1 & 6. Group 1 will always contain the amount (as a number, fraction, or number with a fraction) and group 6 will always have the remaining text (the ingredient).
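As a quick check of the group numbering, here is a small Python sketch that applies the regex above and turns group 1 into an exact fraction. The function name split_amount is made up for this example, and the regex is used unchanged, so entries written with spaces around the slash (1 / 2) are not matched.

```python
import re
from fractions import Fraction

# The regex from the answer above, unchanged.
AMOUNT_RE = re.compile(r"^((\d+( \d+/\d+)?)|(\d+/\d+))( (.+))?$")


def split_amount(entry):
    """Return (amount, rest) for a field value, or None if it does not match."""
    match = AMOUNT_RE.match(entry.strip())
    if match is None:
        return None
    # Group 1 is "2", "1/2" or "1 1/2"; add the whole and fractional parts.
    amount = sum(Fraction(part) for part in match.group(1).split())
    # Group 6 is the remaining text without the leading space (None if absent).
    return amount, match.group(6) or ""


for sample in ["1", "1/2", "1 1/2 tbsp", "2 28 oz. cans"]:
    print(sample, "->", split_amount(sample))
```

Storing the amount as numerator and denominator (amount.numerator, amount.denominator) avoids rounding issues with values like 1/3.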
Here is my regex to validate a phone number.
((^\(?(?:(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?\(?(?:0\)?[\s-]?\(?)?|0)(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}|\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4}|\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3})|\d{5}\)?[\s-]?\d{4,5}|8(?:00[\s-]?11[\s-]?11|45[\s-]?46[\s-]?4\d))(?:(?:[\s-]?(?:x|ext\.?\s?|\#)\d+)?)$)|(\(?[2-9][0-8][0-9]\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}))|(?:\((\+?\d+)?\)|(\+\d{0,3}))? ?\d{2,3}([-\.]?\d{2,3} ?){3,4}
Here is the link for regex check http://regex101.com/r/xO4aU4
It validates UK and US numbers. The lower bound on the length of the number is 7 and the upper bound is not restricted.
Can I restrict it so that a number with fewer than 7 or more than 14 digits is not matched at all?
(\+44)?\s?\(?0?\d{1,5}\)?\s\d{1,7}\s{0,1}\d{0,6}(?:\s-\s|\s)\s{0,2}\d{0,6}|(\+44)?\s?\(?\d{1,5}\)?\s\d{1,7}\s{0,1}\d{0,4}\s{0,1}\d{0,4}|(\+44)?\s?\(\d{1,5}\)\s?\d{3,7}\s?\d{0,4}\s?\d{0,4}|\d{4,5}\s*\d{3,5}\s\d{3,4}
That is a regex I use for UK phone numbers (landlines). It is used in screen scraping sites, so it is probably a little more robust and matches edge cases (such as people who write +44(0)1772 99 33 66). It is used coupled with string length checks and doesn't account for extension numbers, but you should put extension numbers in a separate field anyway.
I have no idea about US numbers so sorry can't help there!
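The string length check mentioned above can be kept out of the regex entirely: match loosely, then count the digits in each candidate. A minimal Python sketch, assuming a deliberately simple candidate pattern (CANDIDATE_RE is a stand-in invented for this example, not the regex above) and the 7 to 14 digit limits from the question:

```python
import re

# Hypothetical loose pattern: a digit (optionally preceded by +), then digits,
# spaces, brackets, dots or dashes, ending in a digit.
CANDIDATE_RE = re.compile(r"\+?\d[\d\s().\-]{5,}\d")


def find_phone_numbers(text, min_digits=7, max_digits=14):
    """Return candidate matches whose digit count lies within the limits."""
    results = []
    for match in CANDIDATE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group(0))
        if min_digits <= len(digits) <= max_digits:
            results.append(match.group(0))
    return results


print(find_phone_numbers("Call +44(0)1772 99 33 66 today"))
print(find_phone_numbers("Order 12345678901234567890 shipped"))
```

Counting digits after the match is usually easier to maintain than encoding the length limits into an already large alternation.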
I need to generate serial numbers using PHP in the following format "ASDK3-JDAL9-24SFT-J5D8R-D4AL9". One requirement is the fact that I need to encode somehow a timestamp and an email address inside this serial number and then retrieve them when needed. Is there any easy way to do this?
EDIT
To be more specific the allowed numbers and letters in serial number need to be 0-9 and A-Z. To make it more generic I need to have 2 short strings for example encoded in that serial number. For example a date "04/03/2013" and one number "324" or email address if possible. The string don't need to be human readable in the serial number but I need to be able to retrieve them when needed.
Let's do some simple math using base32 encode. You have 36 characters but we'll assume 32 because we're doing a rough estimate.
Base 32 adds 60% overhead. If you want to store a date of 8 characters and a number of 3 characters you'll need at least: ( 8 + 3 ) * 1.6 = 18 characters for this data. Your key is 25 characters long so you'll have 7 / 1.6 = 4 characters left for some randomness. If your random keys have 64 characters you'll have 64^4 = 16 million possibilities.
PHP doesn't have a native base 32 function available but you can write one yourself; the outline is the same as base64 except you take 5 bits at a time instead of 6.
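To make the arithmetic concrete, here is a rough Python sketch (Python ships a Base32 codec in its base64 module; a PHP version would need a hand-written encoder). The layout is an assumption chosen for this example: 8 date digits plus a 3-digit number plus 4 random bytes gives 15 bytes, which Base32 turns into exactly 24 characters, and a 25th random character fills the last group. The function names make_serial and read_serial are hypothetical.

```python
import base64
import secrets

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # standard Base32 alphabet


def make_serial(date_digits, number):
    """Pack an 8-digit date and a 3-digit number into XXXXX-XXXXX-... form."""
    payload = (date_digits + number).encode("ascii")
    if len(payload) != 11:
        raise ValueError("expected 8 date digits and a 3-digit number")
    raw = payload + secrets.token_bytes(4)        # 15 bytes = 120 bits
    body = base64.b32encode(raw).decode("ascii")  # 24 characters, no padding
    body += secrets.choice(ALPHABET)              # filler to reach 25
    return "-".join(body[i:i + 5] for i in range(0, 25, 5))


def read_serial(serial):
    """Recover (date_digits, number) from a serial made by make_serial."""
    body = serial.replace("-", "")[:24]
    payload = base64.b32decode(body)[:11].decode("ascii")
    return payload[:8], payload[8:]


serial = make_serial("04032013", "324")
print(serial, read_serial(serial))
```

Note that this is encoding, not encryption: anyone who knows the scheme can read or forge the fields, so a real key would also need a keyed checksum or encryption step.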
I have a web application, written in PHP that incorporates Javascript and JQuery, that will be used as my company's Inventory Management System (IMS). What I would like to be able to create is a Regex expression based upon user input of a value.
The idea behind this is that most manufacturers' serial numbers schema, length of characters and mixture of alpha to numeric values, is unique to a certain part. So when a part is added to the IMS and the first serial number is scanned into the system I would like a Regex statement to be built and saved to a database table corresponding to that part type. Any future times that a serial number is scanned the part should be auto-selected as the part type as it matches the serial number schema for that manufacturer. I understand this methodology may not always hold true to a single part so I could even return a list of parts that match the schema instead of the user needing to look it up in the catalog.
The basis of my question is: what is the best starting point for writing a function that can analyze a value given by a user and create a Regex expression from it? I'm not requesting a full function but a starting point for how to look at my situation and goal so I can understand where to begin. I've scratched my head long enough and started writing functions numerous times just to delete the entire block knowing I was headed for disaster.
Anything in code is possible - is this feasible?
EDIT - ADDED SAMPLE VALUES
DVD-RW (Optical Drives)
1613518L121
1613509L121
1613519L121
VGA Output Cards
0324311071068
0324311071134
COM Expansion Cards
608131234
608131237
Hard Drives
WMAYUJ753738
WMAYUJ072099
WMAYUJ683739
WMAYUJ844900
As you can see some values are going to be numeric only of a certain length of characters. Others will have alpha characters at the beginning followed by a series of numbers. Others may have alpha/numeric characters interspersed with each other. In most every single case a simple length of alpha/numeric rule is going to fit for identifying a singular part type in our list of goods. However, in those cases that more than one expression matches a value, I can simply have the application show a list of two or more products that match the regex and prompt the user to select the proper part. This, overall, will save time and mistakes in selecting a product type in the WMS database.
Thanks for the comments. I understand I'm not asking a question that has one answer to it. I'm looking for a starting point on how to best step through the string and spit out a corresponding Regex statement that would match the value.
As @Pete says, I think you have set yourself too ambitious a goal. Some thoughts, perhaps overly generalized from your specific needs.
I take it that you want to scan a serial number like 1-56592-487-8 and infer that the regular expression /\d-\d{5}-\d{3}-\d/ matches parts of this type from a given manufacturer. (This happens to be the ISBN-10 for my copy of "Java in a Nutshell." ISBNs are not serial numbers, but work with me.) But you can't infer from a handful of examples what pattern the manufacturer uses. Maybe the first character position is a hex digit (0-F). Maybe the last character is a checksum that can be a digit or X (like ISBNs). Maybe there is a suffix, not always present, that denotes the plant. So you will find yourself building up many patterns for the same manufacturer/part type as new instances of the part come in.
You will also have the reverse problem. A maker of widgets uses the regex /[A-Z]{3}\d{7}/, and a maker of sonic screwdrivers uses the same pattern.
That said, about the best you can do is something like this:
for each character in the scanned serial number
    if it is a capital letter
        add [A-Z] to the regular expression
    else if it is a digit
        add \d to the regular expression
    else
        add the character itself to the regular expression, escaped as necessary
end for
collapse multiple occurrences with the {m,n} interval quantifier
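The pseudocode above could be sketched in Python roughly as follows (the same steps translate directly to PHP with preg_quote). The function names are invented for this example, and anchoring the result with ^ and $ is an added assumption so that a pattern only matches whole serial numbers.

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one character to the regex fragment that generalizes it."""
    if "A" <= ch <= "Z":
        return "[A-Z]"
    if "0" <= ch <= "9":
        return r"\d"
    return re.escape(ch)  # literal character, escaped as necessary


def infer_pattern(serial):
    """Build an anchored regex from a single example serial number."""
    parts = []
    # groupby collapses consecutive identical fragments into one run.
    for fragment, run in groupby(char_class(ch) for ch in serial):
        count = len(list(run))
        parts.append(fragment if count == 1 else "%s{%d}" % (fragment, count))
    return "^" + "".join(parts) + "$"


print(infer_pattern("WMAYUJ753738"))
print(infer_pattern("1613518L121"))
```

With only one example the counts come out as exact {n} repetitions; widening them to ranges would require several examples of the same part type.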
The rules for Vehicle Identification Numbers may also be inspiring. Think about how you would infer the rules for VINs, given a handful of examples.
EDIT: sorry, my sample code is buggy. As a first step on the parts that you will guess, you need this kind of algorithm: longest substring or this
You will need to add iterations and some masking as explained above and by David. Also, in the sample below, the "L121" for DVD-RW is not guessed (as I have required that both values start with 'common'). So you will need to find all the common consecutive subsequences and decide which ones are relevant (probably with some kind of gain-maximization function).
Using long_substr from the second link:
>>> for x in d:
        for y in d:
            if x == y: continue
            common = long_substr([x, y])
            length = len(common)
            if x.startswith(common) and y.startswith(common):
                print("\t".join((x, y, str(length), common)))
which produces:
0324311071068 0324311071134 10 0324311071
0324311071134 0324311071068 10 0324311071
1613519L121 1613518L121 6 161351
1613519L121 1613509L121 5 16135
WMAYUJ844900 WMAYUJ753738 6 WMAYUJ
WMAYUJ844900 WMAYUJ072099 6 WMAYUJ
WMAYUJ844900 WMAYUJ683739 6 WMAYUJ
WMAYUJ753738 WMAYUJ844900 6 WMAYUJ
WMAYUJ753738 WMAYUJ072099 6 WMAYUJ
WMAYUJ753738 WMAYUJ683739 6 WMAYUJ
1613518L121 1613519L121 6 161351
1613518L121 1613509L121 5 16135
WMAYUJ072099 WMAYUJ844900 6 WMAYUJ
WMAYUJ072099 WMAYUJ753738 6 WMAYUJ
WMAYUJ072099 WMAYUJ683739 6 WMAYUJ
WMAYUJ683739 WMAYUJ844900 6 WMAYUJ
WMAYUJ683739 WMAYUJ753738 6 WMAYUJ
WMAYUJ683739 WMAYUJ072099 6 WMAYUJ
608131237 608131234 8 60813123
1613509L121 1613519L121 5 16135
1613509L121 1613518L121 5 16135
608131234 608131237 8 60813123
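The loop above relies on a long_substr helper from the linked recipe, which is not reproduced here. A minimal stand-in, assuming the helper returns the longest contiguous substring shared by every string in the list (a brute-force version, fine for short serial numbers):

```python
def long_substr(strings):
    """Return the longest contiguous substring common to all strings."""
    if not strings:
        return ""
    shortest = min(strings, key=len)
    # Try candidate lengths from longest to shortest, leftmost first.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""


print(long_substr(["0324311071068", "0324311071134"]))
print(long_substr(["WMAYUJ844900", "WMAYUJ753738"]))
```

Passing all serial numbers of one part type at once, rather than pairs, gives the prefix shared by the whole group.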
--- first buggy reply starts here
Below is the first part of my reply, which may only help you understand where I was wrong and maybe give you some ideas:
A sample using the Longest Common Subsequence problem solver (LCS) for your particular need, which I thought could be a first step in a process of guessing what will be common.
It is in Python, but for the demo it should be easily readable (or can be cut and pasted into IDLE, the Python editor), assuming that you use the ActiveState Code Recipes from the first link above.
This has to do with bioinformatics (think of gene alignment).
You will need something to decide what is the most interesting common sequence (maybe one having a minimal length?) and then proceed with masking as already proposed by David or in my comment.
(At first I did not see that the LCS solver was not a consecutive LCS solver, while you need it to be! So my first usage of the LCS solver is buggy :( as it is not contiguous: I get WMAYUJ8 or WMAYUJ7 and not WMAYUJ, which is shorter, because the solver finds the longest common characters without expecting them to be consecutive. Again, sorry for that.)
>>> raw = """1613518L121
1613509L121
1613519L121
0324311071068
0324311071134
608131234
608131237
WMAYUJ753738
WMAYUJ072099
WMAYUJ683739
WMAYUJ844900"""
>>> d = dict()
>>> for line in raw.split("\n"):
        if not line.strip(): continue
        value = line.strip()
        d[value] = 1
>>> for x in d:
        for y in d:
            if x == y: continue
            length = LCSLength(x, y)
            common = LCS(x, y)
            if length >= 3 and x.startswith(common):
                print("\t".join((x, y, str(length), common)))
which produces:
0324311071068 0324311071134 10 0324311071
0324311071068 608131234 4 0324
0324311071134 0324311071068 10 0324311071
WMAYUJ844900 WMAYUJ753738 7 WMAYUJ8
WMAYUJ753738 WMAYUJ072099 7 WMAYUJ7
608131237 608131234 8 60813123
608131234 608131237 8 60813123
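To see why the subsequence solver returns WMAYUJ8 rather than WMAYUJ, here is a small dynamic-programming LCS written from scratch. It is a stand-in for the recipe's LCS function, which is not shown above, and its tie-breaking may differ from the recipe's.

```python
def lcs(a, b):
    """Longest common subsequence: characters in order, not necessarily adjacent."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back through the table to recover one longest subsequence.
    out = []
    i, j = len(a), len(b)
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("WMAYUJ844900", "WMAYUJ753738"))
```

The trailing 8 comes from the last digit of the second serial number, which is why a contiguous (substring) search is the better fit for guessing a shared prefix.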
Run spam-detecting algorithms (statistical ones such as Bayes or similar "learning" ones). This may or may not help you, but if it doesn't, I honestly doubt you will ever come up with a useful rule-based algorithm here.