Regular expression for street address

I am trying to match street addresses containing the street and number.
I need the expression to match words for the street name, followed by the number.
For example I want to match "somestreet 25", "some other street 23","a-third street 190", but not "a_fourth street 67".
I have been trying for an hour, but I am not very good with regexes.
So far all I've got is /^[a-zA-Z]+([\s][a-zA-Z]+)([\s][0-9]){1,4}$/ but needless to say, it is not working.
--- EDIT ---
I understand that there is no standard, global way of writing a street address, and that a regular expression can't really cover the problem on a global scale. But the site is for a local restaurant, and all I want is for the address to look like it could be an address (even then, without map and telephone verification it could still be a fake one).
There will, however, be human verification at all times before anything is sent, and also it is a rather small neighborhood, so both the delivery person and the restaurant owner know if the order is fake or not.
All I want is to keep them from getting spammed with silly !@#$ characters in the address, and to give them a decently readable address format to work with.

This should work on your examples:
/^[a-zA-Z]([a-zA-Z-]+\s)+\d{1,4}$/
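A quick way to check the pattern above against the examples from the question. The test strings are the asker's; the loop around them is only an illustration.

```php
<?php
// Pattern from the answer: one letter, then one or more "word + whitespace"
// groups (letters and hyphens only), then 1 to 4 digits at the end.
$pattern = '/^[a-zA-Z]([a-zA-Z-]+\s)+\d{1,4}$/';

$samples = [
    'somestreet 25',
    'some other street 23',
    'a-third street 190',
    'a_fourth street 67', // underscore is not in the class, so this fails
];

foreach ($samples as $address) {
    // preg_match() returns 1 on a match, 0 on no match, false on error
    $ok = preg_match($pattern, $address) === 1;
    echo str_pad($address, 22), $ok ? 'match' : 'no match', PHP_EOL;
}
```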

You've overcomplicated it a little bit. This is a case-insensitive expression that looks for letters, hyphens and spaces, followed by whitespace and then digits, matching your stated criteria.
/^([a-z- ]+)\s+([0-9]+)$/i
But what about me? I live on 30th Ave.
By the way, I used [0-9]+ for one or more numbers at the end, instead of your {1,4} range. If you must not have more than 4, then switch it back to your range {1,4}.
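Because the expression has two capture groups, it can also split the input into street name and house number. A minimal sketch; the function name `splitStreetAndNumber` is hypothetical.

```php
<?php
// Case-insensitive: group 1 = street name (letters, hyphens, spaces),
// group 2 = house number (one or more digits).
function splitStreetAndNumber(string $address): ?array
{
    if (preg_match('/^([a-z- ]+)\s+([0-9]+)$/i', $address, $m) !== 1) {
        return null; // not in "street name + number" form
    }
    // Group 1 keeps extra spaces when several precede the number, so trim it.
    return ['street' => trim($m[1]), 'number' => $m[2]];
}

var_dump(splitStreetAndNumber('some other street 23'));
var_dump(splitStreetAndNumber('30th Ave 5')); // digit in the street name: NULL
```

As the "30th Ave" comment points out, any digit in the street name makes the whole match fail.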

This will do
/^([A-Z][-A-Z ]+)\s+(\d+)$/i
I think street names have no regular structure, so a regular expression is not really applicable here.

Related

PHP regex street address

I need to write a regex statement for street addresses. It's for a class so it doesn't have to be too fancy. Basically, I want it to accept a) a group of numbers, b) a space, c) a street name, either starting with a letter or number, and d) anything after that.
So far, this is what I have:
^\d+\s[0-9a-zA-Z]*
I'm using the example 123 Sesame Street. It accepts 123 Sesame, but doesn't match Street, or, in other words, d) anything after that.
Thanks!

You forgot to put a space in the street-name character class:
^\d+\s[0-9a-zA-Z ]* // or \s instead of space
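Comparing what the two versions actually capture from the asker's example string:

```php
<?php
$before = '/^\d+\s[0-9a-zA-Z]*/';  // original: the class stops at the next space
$after  = '/^\d+\s[0-9a-zA-Z ]*/'; // space added to the class

preg_match($before, '123 Sesame Street', $m1);
preg_match($after, '123 Sesame Street', $m2);

echo $m1[0], PHP_EOL; // 123 Sesame
echo $m2[0], PHP_EOL; // 123 Sesame Street
```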

Agreed, you won't learn anything by having someone give you the answer. This guide is really fantastic:
https://www.princeton.edu/~mlovett/reference/Regular-Expressions.pdf
That said, in its simplest form, this accomplishes your goal:
^[0-9]+\s.*$
But what if you lived at 147B N. Henderson Way? This wouldn’t match your criteria.
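A small sketch of that objection. The second pattern is a hypothetical variant, not from the thread, that allows one optional letter directly after the house number.

```php
<?php
$simple = '/^[0-9]+\s.*$/';
// Hypothetical variant: one optional letter right after the number (147B).
$withSuffix = '/^[0-9]+[A-Za-z]?\s.*$/';

$address = '147B N. Henderson Way';
var_dump(preg_match($simple, $address));     // int(0): "B" is not whitespace
var_dump(preg_match($withSuffix, $address)); // int(1)
```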

extracting phone numbers

I'm trying to extract phone numbers from a set of data. It has to be able to extract both international and local numbers for each country.
The rules I've laid out for it are:
1. Look for the international symbol (+), indicating it's an international dialing number with a valid country code (from +1 to +999).
2. If the plus symbol is present, make sure the character that follows it is a number.
3. If there is none, check that the length is between 7 and 10 digits.
4. If the number is divided (correctly, according to international standards) by either a hyphen (-) or a space, make sure the groups between them are 3 or 4 digits long.
What I've got so far is:
\+(?=[1-999])(\d{4}[0-9][-\s]\d{3}[0-9][-\s]\d{4}[0-9])|(\d{7,11}[0-9])
That's for international numbers, and the local search is \d{7,10}.
The problem is that it doesn't pick up numbers with spaces or hyphens in them.
Can anybody give me some advice on it?

\d already means "digit", so you shouldn't put another [0-9] after it (which means the same).
In the same vein, [1-999] doesn't mean what you think it does. It in fact matches one (1) digit between 1 and 9. You probably want \d{1,3} although that would also match 0.
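A quick demonstration of the difference. The last pattern is a hypothetical tweak that rules out a leading zero.

```php
<?php
// [1-999] is a character class: the range 1-9 plus "9" and "9" again,
// so it matches exactly ONE character from 1 to 9.
var_dump(preg_match('/^[1-999]$/', '7'));      // int(1)
var_dump(preg_match('/^[1-999]$/', '44'));     // int(0): two characters
// \d{1,3} matches one to three digits, including 0.
var_dump(preg_match('/^\d{1,3}$/', '44'));     // int(1)
var_dump(preg_match('/^\d{1,3}$/', '0'));      // int(1)
// Hypothetical: 1 to 999 without a leading zero.
var_dump(preg_match('/^[1-9]\d{0,2}$/', '0')); // int(0)
```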
Then, you're only allowing one variation of dividing blocks (4-3-4) - why? This is not going to match many, many valid phone numbers.
I would suggest the following:
Search your string using the regex \+?(?=\d)[\d\s-]{7,13}\b to grab anything that remotely looks like a phone number. Perhaps you also want to include parentheses and slashes in the allowed character list: \+?(?=\d)[\d\s/()-]{7,14}\b
Then process and validate those strings separately, best after removing all punctuation/whitespace (except the +).
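The two-step approach can be sketched as follows. The function name and the 7 to 15 digit window are assumptions for illustration (15 digits is the E.164 maximum); this is a rough filter, not real validation. Note that the `{7,14}` limit can cut off longer formatted numbers.

```php
<?php
// Step 1: grab anything that remotely looks like a phone number, using the
// answer's second pattern ("/" escaped because it is also the delimiter).
// Step 2: keep only digits (plus a leading "+") and apply a length check.
function findPhoneCandidates(string $text): array
{
    preg_match_all('/\+?(?=\d)[\d\s\/()-]{7,14}\b/', $text, $m);

    $phones = [];
    foreach ($m[0] as $raw) {
        $raw = trim($raw);
        $digits = preg_replace('/\D/', '', $raw); // drop all non-digits
        $plus = ($raw[0] === '+') ? '+' : '';      // keep a leading + only
        $count = strlen($digits);
        if ($count >= 7 && $count <= 15) {
            $phones[] = $plus . $digits;
        }
    }
    return $phones;
}

print_r(findPhoneCandidates('Call +1 555 123 4567 or 555-1234 before 5.'));
```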

I'm not sure it will be possible to create a regex to match every country - some countries have conflicting rules.
It's entirely possible to have, e.g., two valid local numbers contained within one valid international number.
You might want to start by looking at some of the answers to this question:
A comprehensive regex for phone number validation
If you're looking to create something definitive for every country, good luck, and you'll probably need to spend a while with some technical standards...
For example, both 177 and 186-0039-011-81-90-1177-1177 are valid phone numbers in the same country.

how to detect telephone numbers in a text (and replace them)?

I know it can be done for bad words (checking an array of preset words) but how to detect telephone numbers in a long text?
I'm building a website in PHP for a client who needs to stop people from using the description field to post their mobile phone numbers (see Craigslist, etc.).
He's going to need some moderation anyway, but I was wondering if there is a way to block at least the obvious formats like nnn-nnn-nnnn. I'm not asking to block other weird ways of writing them, like HeiGHT*/four*/nine, etc.

Welcome to the world of regular expressions. You're basically going to want to use preg_replace to look for (some pattern) and replace with a string.
Here's something to start you off:
$text = preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);
this looks for:
a plus symbol (optional), followed by a digit, followed by 4 to 20 digits, brackets, dashes, whitespace characters or plus signs, followed by a digit
and replaces the match with the string [blocked].
This catches all the obvious combinations I can think of:
012345 123123
+44 1234 123123
+44(0)123 123123
0123456789
Placename 123456 (although this one will leave 'Placename')
However, it will also strip out any run of 6 or more digits, which might not be desirable!
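Wrapping the answer's `preg_replace()` call in a small function makes both the intended behaviour and the false positive easy to see. The function name `blockPhoneNumbers` and the test sentences are made up for illustration.

```php
<?php
// Pattern from the answer: optional +, a digit, 4-20 digits/brackets/
// dashes/whitespace/plus signs, then a final digit.
function blockPhoneNumbers(string $text): string
{
    return preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);
}

echo blockPhoneNumbers('Call me on +44 1234 123123 today'), PHP_EOL;
// Call me on [blocked] today
echo blockPhoneNumbers('Order #123456 shipped'), PHP_EOL;
// Order #[blocked] shipped  (any run of 6+ digits is caught)
```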

As you may know, you need regular expressions for this.
I found this pattern that could be useful for your project:
<?php
preg_match("/(^(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}$)/", $yourText, $matches);
//matches variable will contain the array of matched strings
?>
More information about this pattern can be found here http://gskinner.com/RegExr/?2rirv where you can even test it online. It's a great tool to test regular expressions.
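One thing to keep in mind for this use case: the pattern is anchored with `^` and `$`, so it only matches when the whole subject is a phone number, not when a number is buried inside a longer description. The unanchored version below is a hypothetical variant, not the one from RegExr.

```php
<?php
$anchored = '/(^(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}$)/';
// Hypothetical: same pattern with ^ and $ removed, so it searches inside text.
$unanchored = '/(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}/';

var_dump(preg_match($anchored, '+1 (555) 123-4567'));                 // int(1)
var_dump(preg_match($anchored, 'Call me at 555-123-4567 tonight'));   // int(0)
var_dump(preg_match($unanchored, 'Call me at 555-123-4567 tonight')); // int(1)
```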

preg_match($pattern, $subject) will return 1 (true) if pattern is found in subject, and 0 (false) otherwise.
A pattern to match the example you give might be '/\d{3}-\d{3}-\d{4}/'
However whatever you choose for your pattern will suffer from both false positives and false negatives.
You might also consider looking for words like mob, cell or tel next to the number.
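Both ideas from this answer in one short sketch. The keyword pattern, including the three-character gap it allows between the word and the digits, is a hypothetical example.

```php
<?php
// nnn-nnn-nnnn, with a hyphen between every group.
$strict = '/\d{3}-\d{3}-\d{4}/';
// Hypothetical: the whole word "mob", "cell" or "tel", then up to
// three non-word characters, then a digit.
$keyword = '/\b(?:mob|cell|tel)\b\W{0,3}\d/i';

var_dump(preg_match($strict, 'call 555-123-4567'));  // int(1)
var_dump(preg_match($strict, 'call 555-1234567'));   // int(0)
var_dump(preg_match($keyword, 'Tel: 0123 456789'));  // int(1)
var_dump(preg_match($keyword, 'Hotel 5 stars'));     // int(0): no word boundary before "tel"
```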
The full details of PHP pattern matching can be found at http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
Ian
p.s. It can't be done for bad words, as the people in Scunthorpe will tell you.

I think using a regular expression that is too tight would cause you to miss a large number of detections.
You should check for windows of 10 consecutive characters containing more than 5 digits.
Because of the computational weight, you will probably want an analysis routine queued to run after each message is inserted.
Once the runs of 6 or more digits have been isolated, replace them as you prefer, including any neighbouring digits.
In any case, it's better to preserve the original data, so you can keep training your detection algorithm until it works as well as possible.
Then you can also study your users' data to build more complex heuristics, such as numbers written as letters (case-insensitive), mixed, dot-separated, etc.
It's not about writing the most perfect regex; it's about approaching the problem statistically and dynamically.
And remember: once you take action, users will change their habits in response, so the statistics will change and you will need to keep learning and updating your heuristics.
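The sliding-window heuristic from this answer could look roughly like this. The function name, the defaults (10-character window, 6 or more digits) and the choice of masking digits with `X` are assumptions for illustration. The sketch works on bytes, which is safe here because ASCII digits never appear inside multibyte UTF-8 sequences.

```php
<?php
// Flag every window of $window characters containing at least $minDigits
// digits, then mask each digit that falls inside a flagged window.
function maskDigitHeavyWindows(string $text, int $window = 10, int $minDigits = 6): string
{
    $len = strlen($text);
    if ($len === 0) {
        return $text;
    }
    $flagged = [];
    // Texts shorter than the window are checked as one single window.
    $last = max(0, $len - $window);
    for ($i = 0; $i <= $last; $i++) {
        $chunk = substr($text, $i, $window);
        if (preg_match_all('/\d/', $chunk) >= $minDigits) {
            $end = min($i + $window, $len);
            for ($j = $i; $j < $end; $j++) {
                if (ctype_digit($text[$j])) {
                    $flagged[$j] = true;
                }
            }
        }
    }
    foreach (array_keys($flagged) as $pos) {
        $text[$pos] = 'X';
    }
    return $text;
}

echo maskDigitHeavyWindows('call 555 123 4567 now'), PHP_EOL; // call XXX XXX XXXX now
echo maskDigitHeavyWindows('Born in 1984, aged 40'), PHP_EOL; // unchanged
```

Keeping the original text and storing only the masked copy for display fits the answer's advice to preserve data for later tuning.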

PHP and Regular Expressions question?

I was wondering if the code below is the correct way to check a street address, email address, password, city and URL with preg_match and regular expressions?
And if not how should I fix the preg_match code?
preg_match ('/^[A-Z0-9 \'.-]{1,255}$/i', $trimmed['address']) //street address
preg_match ('/^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$/', $trimmed['email']) //email address
preg_match ('/^\w{4,20}$/', $trimmed['password']) //password
preg_match ('/^[A-Z \'.-]{1,255}$/i', $trimmed['city']) //city
preg_match("/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i", $trimmed['url']) //url

Your street address regex: ^[A-Z0-9 \'.-]{1,255}$
The regex itself does not need the single quote escaped; the \' is only there because the pattern sits inside a single-quoted PHP string.
The dot inside the character class is a literal dot, not a wildcard, so the class allows only letters, digits, spaces, apostrophes, dots and hyphens.
You are allowing a minimum length of 1 and a maximum length of 255. I would suggest increasing the minimum length to something more than 1.
Your email regex: ^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$
Here too, the . inside the character classes is a literal dot, so both parts may contain word characters, dots and hyphens.
Your password regex: ^\w{4,20}$
This allows a password of 4 to 20 characters containing only letters (upper and lower case), digits and underscores. I would suggest allowing special characters too, to make passwords stronger.
Your city regex: ^[A-Z \'.-]{1,255}$
The . in the character class is again a literal dot.
It allows a minimum length of 1 (if you want to allow one-character city names, this is fine).
EDIT:
Since you are very new to regex, spend some time on Regular-Expressions.info
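A quick check of how `.` behaves inside and outside a character class, using the street-address pattern from the question (the test strings are made up):

```php
<?php
// Inside [...], "." is a literal dot, not "any character".
// The \' is required by PHP's single-quoted string syntax; PCRE sees a plain '.
$street = '/^[A-Z0-9 \'.-]{1,255}$/i';

var_dump(preg_match($street, "12 St. John's Rd")); // int(1)
var_dump(preg_match($street, 'Main St!!!'));       // int(0): "!" is not in the class
// Outside a class, "." is the wildcard, so this accepts "!" too.
var_dump(preg_match('/^.{1,255}$/', 'Main St!!!')); // int(1)
```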

This seems overly complicated to me. In particular I can see a few things that won't work:
Your regex will fail for cities with non-ASCII letters in their names, such as "Malmö" or 서울, etc.
Your password validator doesn't allow spaces in the password (which are useful for entering pass-phrases), and it doesn't allow punctuation either, which many people like to include in their passwords for added security.
Your address validator won't allow for people who live in apartments (12/345 Foo St), because / is not in the character class.
And so on. In general, I think over-reliance on regular expressions for validation is not a good thing. You're probably better off allowing anything for those fields and just validating them some other way.
For example, with email addresses: just because an address is valid according to the RFC standard doesn't mean you'll actually be able to send email to it (or that it's the correct email address for the person). The only reliable way to validate an email address is to actually send an email to it and get the person to click on a link or something.
Same thing with URLs: just because it's valid according to the standard doesn't actually mean there's a web page there. You can validate the URL by trying to do an actual request to fetch the page.
But my personal preference would be to just do the absolute minimum verification possible, and leave it at that. Let people edit their profile (or whatever it is you're verifying) in case they make a mistake.

There's not really a 'correct' way to check for any of those things. It depends on what exactly your requirements are.
For e-mail addresses and URLs, I'd recommend using filter_var instead of regexps - just pass it FILTER_VALIDATE_EMAIL or FILTER_VALIDATE_URL.
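A minimal sketch of the `filter_var()` approach. The wrapper function names are hypothetical; `filter_var()` returns the filtered value on success and `false` on failure.

```php
<?php
function isValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

function isValidUrl(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(isValidEmail('user@example.com'));       // bool(true)
var_dump(isValidEmail('user#example.com'));       // bool(false)
var_dump(isValidUrl('https://example.com/menu')); // bool(true)
var_dump(isValidUrl('example.com/menu'));         // bool(false): no scheme
```

As other answers point out, passing this check still doesn't prove the mailbox or the page exists.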
With the other regexps, note that . inside a character class is already a literal dot (it doesn't allow everything), and you might want to consider that the City/Street ones would allow rubbish such as ''''', or just whitespace.

Please don't assume that you know how an address is made up. There are thousands of cities, towns and villages with characters like & and those from other alphabets.
Just DON'T try to validate an address unless you do it through a country-specific API (USPS for the US, for example).
And why would you want to limit the characters in a users password? Don't have ANY requirements on the password except for it existing.
Your site will be unusable if you use those regex.

PHP: How to validate a phone number if well formed?

Using PHP, how can I verify if a phone # is well formed?
It seems easiest to simply strip all non-numeric data, leaving only the numbers. Then to check if 10 digits exist.
Is this the best and easiest way?

The best? No. Issues I see with this approach:
Some area codes - like 000-###-#### - are not valid. See http://en.wikipedia.org/wiki/List_of_NANP_area_codes
Some exchanges - like ###-555-#### - are not valid. See http://en.wikipedia.org/wiki/555_%28telephone_number%29
Some people will enter a 1 before their number, i.e. 1-###-###-####.
Some people are only reachable at an extension, like ###-###-#### x####.
Some companies tack on extra digits, like 1-800-GO-FLOWERS. The additional digits are simply ignored by the phone system, but a user might expect to be able to enter the whole thing.
International phone numbers are not necessarily 10 digits, even if you discount the country codes.
Good enough? Quite possibly, but that's up to you and your app.
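A sketch that handles two of the points above (a leading 1, and invalid area codes or exchanges) while still stripping non-digits. It assumes US/Canada numbers only and applies the NANP rule that the area code and exchange each start with 2-9; the function name is hypothetical, and extensions, vanity letters and international numbers are deliberately not handled.

```php
<?php
function normalizeNanpNumber(string $input): ?string
{
    $digits = preg_replace('/\D/', '', $input);
    // Drop an optional leading country code 1 (1-###-###-####).
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }
    // NXX-NXX-XXXX: area code and exchange must start with 2-9.
    if (preg_match('/^[2-9]\d{2}[2-9]\d{6}$/', $digits) !== 1) {
        return null;
    }
    return $digits;
}

var_dump(normalizeNanpNumber('1-212-555-0123')); // string(10) "2125550123"
var_dump(normalizeNanpNumber('000-555-0123'));   // NULL
```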

You can use a regex for it:
$pattern_phone = "|^[0-9\+][0-9\s+\-]*$|i";
if(!preg_match($pattern_phone,$phone)){
// Something's wrong
}
Haven't tested the regex, so it may not be 100% correct.
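Running the regex above against a few sample inputs shows what it accepts. It also accepts a lone `+`, and the `i` flag has no effect because the pattern contains no letters.

```php
<?php
$pattern_phone = "|^[0-9\+][0-9\s+\-]*$|i";

foreach (['+44 20 7946 0000', '555-1234', 'call me', '+'] as $phone) {
    $ok = preg_match($pattern_phone, $phone) === 1;
    echo "'$phone' => ", $ok ? 'accepted' : 'rejected', PHP_EOL;
}
```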

Checking for 10 digits after stripping will check the syntax but won't check the validity. For that you'd need to determine what valid numbers are available in the region/country and probably write a regex to match the patterns.

The problem with validating/filtering data like this usually comes down to the answer to this question: "How strict do I want to be?" which then devolves into a series of "feature" questions:
Are you going to accept international numbers?
Are you going to accept extensions?
Are you going to allow various formats, e.g., (111) 222-3333 vs 111.222.3333?
Depending on your business rules, the answers to these questions can vary. But to be the most flexible, I recommend using 3 fields for a phone number:
Country Code (optional)
Phone Number
Extension (optional)
All 3 fields can be programmatically limited/filtered to numeric values only. You can then combine them into some parseable format before storing, or store each value individually.
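A minimal sketch of combining the three fields. The helper name and the `+CC NUMBER xEXT` storage format are assumptions; storing the parts separately works just as well.

```php
<?php
// Reduce each field to digits only, then join them, e.g. "+44 2079460000 x123".
function combinePhoneFields(string $countryCode, string $number, string $extension = ''): ?string
{
    $cc  = preg_replace('/\D/', '', $countryCode);
    $num = preg_replace('/\D/', '', $number);
    $ext = preg_replace('/\D/', '', $extension);

    if ($num === '') {
        return null; // the number itself is required
    }
    $result = ($cc !== '' ? "+$cc " : '') . $num;
    if ($ext !== '') {
        $result .= " x$ext";
    }
    return $result;
}

var_dump(combinePhoneFields('+44', '20 7946 0000', '123')); // string(20) "+44 2079460000 x123"
```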

Answering whether something is "the best" thing to do is nearly impossible (unless you're the one answering your own question).
The way you propose it, stripping all non-digits and then checking whether there are 10 digits might result in unwanted behaviour for a string like:
George Washington (February 22, 1732 – December 14, '99) was the commander of the Continental Army in the American Revolutionary War and served as the first President of the United States of America.
Stripping all non-digits results in the string 2217321499, which is 10 digits long, but I highly doubt that the entire string should be considered a valid phone number.
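Reproducing that case, and a hypothetical stricter check that rejects input containing anything besides digits, whitespace and common phone punctuation before stripping:

```php
<?php
$text = "George Washington (February 22, 1732 – December 14, '99) was the commander";
$digits = preg_replace('/\D/', '', $text);
var_dump($digits); // string(10) "2217321499"

// Hypothetical: only digits, whitespace, ( ) . + - are allowed at all,
// and exactly 10 digits must remain after stripping.
function looksLikePhoneInput(string $input): bool
{
    if (preg_match('/^[\d\s().+-]+$/', $input) !== 1) {
        return false;
    }
    return strlen(preg_replace('/\D/', '', $input)) === 10;
}

var_dump(looksLikePhoneInput('(555) 123-4567')); // bool(true)
var_dump(looksLikePhoneInput($text));            // bool(false)
```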

What format do you need? You can use regular expressions for this.
