EDIT: I'm doing this because the data I've been provided has hundreds of newline-separated entries in this format, and I need to incorporate microformats into the address data. Thus if the string provided is as below, I need to output:
<p><span itemprop="telephone">+1 800 123 456</span> (toll free) from overseas</p>
--
I need a regex to extract a phone number from the format below:
+1 800 123 456 (toll free) from overseas
The data has been entered consistently in this format, so effectively I need a regex that captures everything from the "+" (inclusive) up to the first character that is neither a digit nor a space.
If you want to use a regex you can use something like this:
\+[\d\s]+
$re = '/(\+[\d\s]+)/';
$str = "+1 800 123 456 (toll free) from overseas\n";
preg_match_all($re, $str, $matches);
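On the other hand, you can do what castis suggested: use preg_replace to replace the characters you don't want with an empty string and keep the rest. Note that \D already covers "+", so the pattern below strips everything except digits, including the leading "+":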
preg_replace('/[\D\+]/', '', $phone_number);
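For the markup shown in the edit at the top of the question, one option is to wrap the number in place with preg_replace instead of extracting it first. This is only a minimal sketch, assuming each entry sits on its own line and starts with the "+"; the function name wrapTelephone is hypothetical, and the rest of the line is not HTML-escaped here.

```php
<?php
// Hypothetical helper (not from the original answer): wraps a leading
// "+<digits separated by whitespace>" run in a microformat span and the
// whole entry in a paragraph.
function wrapTelephone(string $line): string
{
    // \+\d+(?:\s\d+)* ends on the last digit, so the space before
    // "(toll free)" stays outside the span.
    $marked = preg_replace(
        '/^(\+\d+(?:\s\d+)*)/',
        '<span itemprop="telephone">$1</span>',
        trim($line)
    );
    return '<p>' . $marked . '</p>';
}

echo wrapTelephone("+1 800 123 456 (toll free) from overseas"), "\n";
```

Compared with \+[\d\s]+, this pattern does not swallow the trailing space after the last digit group, which matters when the match is wrapped in a tag.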
I'm trying to parse all numbers from a text:
text 2030 text 2,5 text 2.000.000 2,000,000 -200 +31600000000. 200. 2.5 200? 1:200
Based on this regex:
(?<!\S)(\-?|\+?)(\d*\.?\,?\d+|\d{1,3}(,?.?\d{3})*(\.\,\d+)?)(?!\S)
But numbers followed directly by ., ?, ! or , are not matched. I only want full matches with preg_match_all.
I guess the problem is in the last part of my regex, (?!\S). I tried different things but I can't figure out how to solve this.
If we don't need to validate the numbers, we could start with a simple expression, something similar to:
(?:^|\s)([+-]?[\d:.,]*\d)\b
Test
$re = '/(?:^|\s)([+-]?[\d:.,]*\d)\b/s';
$str = 'text 2030 text 2,5 text 2.000.000 2,000,000 -200 +31600000000. 200. 2.5 200? 1:200
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
In the right panel of this demo, the expression is further explained, if you might be interested.
EDITS:
Another expression would be:
(?:^|\s)\K[+-]?(?:\d+:\d+|\d+(?:[.,]\d{1,3})+|\d+)\b
which still would not validate our numbers; it would just collect those listed, including some invalid ones.
DEMO 2
DEMO 3
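Since the question suspects the trailing (?!\S), another possible approach is to keep the whitespace check but let the lookahead tolerate one punctuation mark first. The sketch below assumes that only one of . , ? ! may follow a number before the next whitespace or the end of the text; like the expressions above, it does not validate the numbers.

```php
<?php
// Assumption: at most one of . , ? ! may sit between the number and the
// following whitespace (or the end of the text).
$re = '/(?<!\S)[+-]?\d+(?:[.,:]\d+)*(?=[.,?!]?(?:\s|$))/';
$str = 'text 2030 text 2,5 text 2.000.000 2,000,000 -200 +31600000000. 200. 2.5 200? 1:200';

preg_match_all($re, $str, $matches);

// The lookahead is zero-width, so the trailing punctuation is checked
// but never becomes part of the match.
print_r($matches[0]);
```

Because the punctuation is only inspected by the lookahead, "200." and "200?" are both returned as 200.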
I am using PHP and want to extract phone/mobile numbers from a string. The string contains phone numbers in multiple formats, like
$str = '(123) 456-7890 or (123)456-7890 and 1234567890 test "123.456.7890" another test "123 456 7890"';
I wrote this regex:
$phoneMatches = '';
$str = '(123) 456-7890 or (123)456-7890 or 1234567890 or "123.456.7890" or "123 456 7890"';
$phonePattern = '/\b[0-9]{3}\s*[-]?\s*[0-9]{3}\s*[-]?\s*[0-9]{4}\b/';
preg_match_all($phonePattern, $str, $phoneMatches);
echo "<pre>";
print_r($phoneMatches);
exit;
but it gives me output like this:
Array
(
    [0] => Array
        (
            [0] => 1234567890
            [1] => 123 456 7890
        )

)
That is only two matches, but I want all the possible phone and mobile number formats matched in the text using only ONE regular expression.
Thanks
I know I'm late, and I'm not sure if this is what you wanted, but I came up with this solution:
[+()\d].*?\d{4}(?!\d)
Demonstration: regex101.com
Explanation:
[+()\d] - We start by matching anything that might represent the start of a phone number.
.*?\d{4} - Then we match anything (using a lazy quantifier) until we reach four ending digits. A small note: I treated the four trailing digits as a rule, but it might not always apply, in which case you'd need to modify the regex to cover the other cases.
(?!\d) - This is a negative lookahead and it means that we don't want any matches followed by a digit character. I used this to avoid some half-matches.
Another observation is that this regex doesn't validate any phone number. You could have anything in between the start and the final four digits, mainly because of this part: .*?\d{4}. Whether that is acceptable depends on the situation you intend to use it in.
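If the lazy .*? turns out to be too permissive, a stricter single expression is possible. The following is a sketch under the assumption that every number uses the 3-3-4 digit grouping shown in the question, with optional parentheses around the first group and an optional space, dot or hyphen between groups; other layouts would need more alternatives.

```php
<?php
$str = '(123) 456-7890 or (123)456-7890 or 1234567890 or "123.456.7890" or "123 456 7890"';

// (?<!\d) and (?!\d) keep the match from starting or ending inside a
// longer run of digits; \(? and \)? allow the "(123)" form.
$phonePattern = '/(?<!\d)\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}(?!\d)/';

preg_match_all($phonePattern, $str, $phoneMatches);
print_r($phoneMatches[0]);
```

The original pattern from the question missed the parenthesised and dotted forms because it only allowed an optional hyphen and whitespace between the groups.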
I am trying to grab a number that can be in the format $5,000.23 as well as say, $22.43 or $3,000
Here's my regular expression, this is in PHP.
preg_match('/\$([0-9]+)([\.,]*)?([0-9]*)?([\.])?([0-9]*)?/', $blah, $blah2);
It seems to match numbers in the format $5,500.23 perfectly fine, however it doesn't seem to match any other numbers well, like $0.
How do I make everything optional? Shouldn't grouping () and using a question mark do that?
This should do the trick:
\$[\d,.]*[\d]
Debuggex Demo
Specific PHP Example:
$re = "/\\$[\\d,.]*[\\d]/";
$str = "\$1 klsjdfgsjdfg \$100 kjdfhglsjdfg \$1,000 jljsdfg \$1,000.00 ldfjhsdf";
preg_match_all($re, $str, $matches);
Regex 101 Demo
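Once the amounts are matched, they usually need to become numbers. A minimal sketch of that step, assuming the comma is always a thousands separator and the dot always the decimal point (the variable name $amounts is only illustrative):

```php
<?php
$str = '$1 klsjdfgsjdfg $100 kjdfhglsjdfg $1,000 jljsdfg $1,000.00 ldfjhsdf';

// Same expression as above, written in single quotes so that the
// backslashes and "$" need no extra escaping in PHP.
preg_match_all('/\$[\d,.]*\d/', $str, $matches);

// Drop the "$" and the thousands separators, then cast to float.
$amounts = array_map(
    function (string $match): float {
        return (float) str_replace(['$', ','], '', $match);
    },
    $matches[0]
);

print_r($amounts);
```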
I have some alerts set up that are emailed to me regularly, and in those emails I get content that looks like this:
2002 Volkswagen Eurovan Clean title - $2000
That is the general consistent format. Those are also links that are clickable.
I already have a script set up that extracts the links from the body string properly, but what I am looking for is the year and the price from the titles that come in. More than one may be listed within an email.
So my question is: how can I use preg_match_all to grab all the matches so that I can then explode them to get the first piece of data (year) and the last piece of data (price)? Should I match on digits, since the format is presumed to be generally the same?
You can try matching 4 digits starting with 19 or 20 and naming that capture year, naming the digits after $ price, and using the anchors ^ and $ if these values are always at the beginning and end of the string:
^(?'year'\b(?:19|20)\d{2}\b)|(?'price'\$\d+)$
See demo
Sample IDEONE code:
$re = "/^(?'year'\\b(?:19|20)\\d{2}\\b)|(?'price'\\$\\d+)$/";
$str = "2002 Volkswagen Eurovan Clean title - \$2100";
preg_match_all($re, $str, $matches);
print_r(array_filter($matches["year"]));
print_r(array_filter($matches["price"]));
Output:
Array
(
    [0] => 2002
)
Array
(
    [1] => $2100
)
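Since the question mentions that an email may list more than one title, it may help to pair each year with its price per line. This is a sketch only, assuming one listing per line with "\n" line endings; the second listing in $body is made-up sample data, and the m modifier makes ^ and $ match at every line.

```php
<?php
// Assumption: one listing per line. The second line is invented sample
// data used only to show the multi-line case.
$body = "2002 Volkswagen Eurovan Clean title - \$2000\n1998 Honda Civic - \$1500";

// One match per line: year at the start, price at the end.
$re = '/^(?<year>(?:19|20)\d{2})\b.*\$(?<price>\d+)$/m';
preg_match_all($re, $body, $matches, PREG_SET_ORDER);

$listings = [];
foreach ($matches as $m) {
    $listings[] = ['year' => (int) $m['year'], 'price' => (int) $m['price']];
}

print_r($listings);
```

With PREG_SET_ORDER each element of $matches is one listing, so the year and price stay together and no array_filter or explode step is needed.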
Currently I am developing a web application that fetches the Twitter stream, and I am trying to build natural language processing on my own.
Since my data comes from Twitter (limited to 140 characters), many words are shortened or, in this case, have their spaces omitted.
For example:
"Hi, my name is Bob. I m 19yo and 170cm tall"
Should be tokenized to:
- hi
- my
- name
- bob
- i
- 19
- yo
- 170
- cm
- tall
Notice that 19 and yo in 19yo have no space between them. I use it mostly for extracting numbers with their units.
Simply put, what I need is a way to 'explode' each token that has a number in it into chunks of digits or letters, without a delimiter.
'123abc' will be ['123', 'abc']
'abc123' will be ['abc', '123']
'abc123xyz' will be ['abc', '123', 'xyz']
and so on.
What is the best way to achieve it in PHP?
I found something close to it, but it's C# and specifically for day/month splitting: How do I split a string in C# based on letters and numbers
You can use preg_split
$string = "Hi, my name is Bob. I m 19yo and 170cm tall";
$parts = preg_split("/(,?\s+)|((?<=[a-z])(?=\d))|((?<=\d)(?=[a-z]))/i", $string);
var_dump ($parts);
When matching against the digit-letter boundary, the regular expression match must be zero-width. The characters themselves must not be included in the match. For this the zero-width lookarounds are useful.
http://codepad.org/i4Y6r6VS
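The zero-width lookarounds can also be applied to a single token, which matches the '123abc' examples in the question. A minimal sketch, assuming tokens contain only ASCII letters and digits; the function name splitAlnum is hypothetical.

```php
<?php
// Hypothetical helper: splits a token at every letter/digit boundary.
// Both alternatives are zero-width, so no character is consumed.
function splitAlnum(string $token): array
{
    return preg_split('/(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])/i', $token);
}

print_r(splitAlnum('123abc'));
print_r(splitAlnum('abc123'));
print_r(splitAlnum('abc123xyz'));
```

A token without any letter/digit boundary, such as 'tall', comes back as a one-element array.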
How about this:
Extract the numbers from the string using regexps and store them in an array, then replace the numbers in the string with some kind of special marker that 'holds' their position. After parsing the string, which now contains only your markers and normal characters, feed the numbers from the array back into their reserved places.
Just an idea, but IMHO it might work for you.
EDIT:
Try running this short code; hopefully you will see my point in the output. (This code doesn't work on codepad, I don't know why.)
<?php
$str = "Hi, my name is Bob. I m 19yo and 170cm tall";
preg_match_all("#\d+#", $str, $matches);
$str = preg_replace("!\d+!", "#SPEC#", $str);
print_r($matches[0]);
print $str;
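The code above covers the extraction and masking; the last step of the idea, putting the numbers back into their reserved places, could look roughly like this. It is a sketch that assumes the #SPEC# marker never occurs in the original text and that the markers are restored in the same order the numbers were extracted.

```php
<?php
$str = "Hi, my name is Bob. I m 19yo and 170cm tall";

// Step 1: remember the numbers and mask them with a placeholder.
preg_match_all('/\d+/', $str, $matches);
$numbers = $matches[0];
$masked = preg_replace('/\d+/', '#SPEC#', $str);

// ... any parsing of $masked would happen here ...

// Step 2: feed the numbers back, one per placeholder, in order.
$i = 0;
$restored = preg_replace_callback(
    '/#SPEC#/',
    function (array $m) use ($numbers, &$i): string {
        return $numbers[$i++];
    },
    $masked
);

print $masked . "\n" . $restored . "\n";
```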