In PHP I'm searching for phone numbers in a certain text.
I use explode() to split the text into parts, using the area code of the city I'm searching for as the delimiter. The problem is that phone numbers that contain the same digits as the area code are not returned correctly.
For example:
"foofoo 010-1234567 barbar" splits into "foofoo " and "-1234567 barbar"
but
"foofoo 010-1230107 barbar" splits into "foofoo ", "-123" and "7 barbar" !
I can use the first result to reconstruct the phone number with the area code, but the second one obviously goes wrong...
I guess I need a regular expression to split the text instead of explode(), with some mechanism to avoid splitting on short strings, but I don't know how to do it.
Any ideas, or a better way to search for phone numbers in a text?
UPDATE:
The format is NOT consistent, so looking for the hyphen is not a solution. Some phone numbers have a space between the area code and the number, some have parentheses, some have nothing, etc. Dutch phone numbers have an area code of 2, 3 or 4 digits and are usually 10 digits in total.
To find phone numbers like:
010-1234010
010 1234010
010 123 4010
0101234010
010-010-0100
Try this:
$text = 'foofoo 010-1234010 barbar 010 1234010 foofoo ';
$text .= ' 010 123 4010 barbar 0101234010 foofoo 010-010-0100';
$matches = array();
// preg_match_all() stores every full match in $matches[0]
preg_match_all('/[0-9]{3}-[0-9]{7}|[0-9]{3}\s[0-9]{7}|[0-9]{3}\s[0-9]{3}\s[0-9]{4}|[0-9]{10}|[0-9]{3}-[0-9]{3}-[0-9]{4}/', $text, $matches);
$matches = $matches[0];
var_dump($matches);
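Because the format varies, an alternative to listing every layout is to match a loose candidate and then validate it by counting digits. This is a minimal sketch, assuming a Dutch number starts with 0 (optionally with the area code in parentheses), uses at most one space or hyphen between digits, and has exactly 10 digits; the function name findDutchNumbers is hypothetical.

```php
<?php
// Sketch: assumes numbers start with 0 (optionally inside parentheses),
// use single spaces or hyphens as separators, and have 10 digits in total.
function findDutchNumbers(string $text): array
{
    // (?<!\d) stops a match from starting in the middle of another number;
    // (?!\d) stops it from ending in the middle of one.
    $pattern = '/(?<!\d)\(?0\d{1,3}\)?(?:[\s\-]?\d){5,8}(?!\d)/';
    preg_match_all($pattern, $text, $matches);

    $numbers = [];
    foreach ($matches[0] as $candidate) {
        // Normalise by dropping everything that is not a digit.
        $digits = preg_replace('/\D/', '', $candidate);
        if (strlen($digits) === 10) {
            $numbers[] = $digits;
        }
    }
    return $numbers;
}

$text = 'foofoo 010-1234010 barbar 010 1234010 foofoo 010 123 4010 '
      . 'barbar 0101234010 foofoo 010-010-0100 and (010) 1230107';
print_r(findDutchNumbers($text));
```

Returning digits only makes the results easy to compare or reformat, at the cost of losing the original layout.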
You could use a regular expression to match the phone numbers. There are many, many ways to skin this particular cat (and likely many identical questions here on SO); a super-basic example might look like the following.
$subject = "foofoo 010-1230107 barbar 010-1234567";
preg_match_all('/\b010-\d+/', $subject, $matches);
$numbers = $matches[0];
print_r($numbers);
The above would output the contents of the $numbers array.
Array
(
[0] => 010-1230107
[1] => 010-1234567
)
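If the area code changes from search to search, the same pattern can be built from a variable. A sketch, assuming the area code may be followed by an optional space or hyphen; the helper name numbersWithAreaCode is hypothetical. preg_quote() escapes any regex metacharacters in the input.

```php
<?php
// Hypothetical helper: find numbers that begin with the given area code,
// allowing an optional space or hyphen right after it.
function numbersWithAreaCode(string $subject, string $areaCode): array
{
    // preg_quote() escapes regex metacharacters in the area code.
    $pattern = '/\b' . preg_quote($areaCode, '/') . '[\s\-]?\d+/';
    preg_match_all($pattern, $subject, $matches);
    return $matches[0];
}

print_r(numbersWithAreaCode('foofoo 010-1230107 barbar 010 1234567', '010'));
```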
If you delete all of the non-numeric characters, you will be left with only the phone number (provided the text contains no other digits). You can then take that string and format it as ###-###-#### if you wish.
$phone = preg_replace('/\D/', '', 'Some test with 123-456-7890 that phone number');
//$phone is now 1234567890
echo substr($phone, 0, 3);//123
echo substr($phone, 3, 3);//456
echo substr($phone, 6);//7890
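Once the digits are isolated, a single preg_replace() with capture groups can insert the hyphens instead of three substr() calls. A sketch, assuming exactly 10 digits; the helper name formatPhone is hypothetical and returns null for any other length.

```php
<?php
// Hypothetical helper: strip non-digits, then format 10 digits as ###-###-####.
function formatPhone(string $text): ?string
{
    $digits = preg_replace('/\D/', '', $text);
    if (strlen($digits) !== 10) {
        return null; // not exactly 10 digits, so don't guess a format
    }
    return preg_replace('/^(\d{3})(\d{3})(\d{4})$/', '$1-$2-$3', $digits);
}

echo formatPhone('Some test with 123-456-7890 that phone number'), "\n";
```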
Not sure if that is what you are looking for or not.
Related
I have a problem with a regex. My output looks like this:
Number of rooms
2
Price
120000
Square in meter
60
I’m trying to remove all text except “Number of rooms 2”. The value “2” changes. So far I have an expression like this:
<?php
$str = get_field('all');
preg_match('/ Number of rooms \s*(\d+)/' , $str, $matches);
echo $matches[1];
?>
Remove the space before the word Number:
preg_match('/Number of rooms \s*(\d+)/' , $str, $matches);
Remove the spaces before Number and after rooms in your regex:
$str = 'Number of rooms
2
Price
120000
Square in meter
60';
preg_match('/Number of rooms\s*(\d+)/' , $str, $matches);
print_r($matches);
output:
Array
(
[0] => Number of rooms
2
[1] => 2
)
As the others have said, it's the space. You can solve it by removing the space or by making it optional with *.
I would also advise using the regex modifiers i and m: i makes the match case-insensitive, and m makes ^ and $ match at every line break (not strictly needed here, since \s already matches newlines).
preg_match('/number of rooms\s*(\d+)/im', $str, $m);
var_dump($m);
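Generalising the fixed regex, a small helper can pull the number that follows any label. A sketch; the function name valueAfterLabel is hypothetical. It uses preg_quote() so the label is matched literally, and \s* so any mix of spaces and line breaks is allowed between label and value.

```php
<?php
// Hypothetical helper: return the integer that follows $label, or null.
function valueAfterLabel(string $text, string $label): ?int
{
    // preg_quote() escapes the label; \s* allows spaces or line breaks.
    $pattern = '/' . preg_quote($label, '/') . '\s*(\d+)/i';
    if (preg_match($pattern, $text, $m)) {
        return (int) $m[1];
    }
    return null;
}

$str = "Number of rooms\n2\nPrice\n120000\nSquare in meter\n60";
echo valueAfterLabel($str, 'Number of rooms'), "\n";
```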
Alternatively, you can do it without a regex, assuming the label is always on the first line and the value on the second:
$str = get_field('all');
$str_array = explode("\n",$str);
$new_str=$str_array[0]." ".$str_array[1];
echo $new_str;
So I have this string:
Best location using 168 cars + cleaning
The '168' is the part I'd like to extract from this string.
I have approximately 80 occurrences of this string, varying between 'xx cars' and 'xxx cars' (so 2 or 3 digits). In each string, 'cars' comes right after the number I'd like to return.
What would be the best way using PHP to achieve this?
The best way is to do a simple preg_match() on the text.
See the manual: http://php.net/manual/en/function.preg-match.php
<?php
$string = 'Best location using 168 cars + cleaning';
$pattern = '/\b(\d{2,3}) cars/';
preg_match($pattern, $string, $match);
echo $match[1];
This regex captures the first number of 2 to 3 digits that comes right before the word cars; the \b word boundary stops it from picking "234" out of "1234 cars".
You can change the length as needed; \d matches any digit.
The easiest way is probably preg_match(). Look for a space, one or more digits, a space, then the word cars. Use parentheses to capture the digits. That gives you a pattern like this:
' (\d+) cars'
Then just pass that to preg_match() with a third argument to capture the parenthesized substring:
if (preg_match('/ (\d+) cars/', $str, $match)) {
echo "your num is: " . $match[1] . "\n";
}
Note this will also capture 1 cars and 1234 cars. If that's a problem, and you want to ensure that you only get the values with two or three digits, you can tweak the pattern to explicitly require that:
' (\d{2,3}) cars'
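Since there are about 80 such strings, the match can simply run in a loop. A sketch; the sample array is made up for illustration, and here \b takes the place of the leading space to ensure the digits form a whole word.

```php
<?php
// Sample data, made up for illustration.
$strings = [
    'Best location using 168 cars + cleaning',
    'Best location using 95 cars + cleaning',
    'Best location using 7 cars + cleaning',
];

$numbers = [];
foreach ($strings as $s) {
    // \b ensures the digits start a word, so "1234 cars" is skipped.
    if (preg_match('/\b(\d{2,3}) cars\b/', $s, $m)) {
        $numbers[] = (int) $m[1];
    }
}
print_r($numbers);
```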
I would explode the string on spaces and then loop through the array looking for the string "cars", keeping its key. Since the number appears right before "cars", subtract 1 from that key and look up that element in the array.
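$original_string = "Best location using 168 cars + cleaning";
$string = explode(" ", $original_string);
$number = null; // stays null if "cars" is not found
foreach ($string as $key => $part) {
    // $key > 0 guards against "cars" being the very first word
    if ($part == "cars" && $key > 0) {
        $number = $string[$key - 1];
        break;
    }
}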
Explanation:
$original_string is the full string in which the number is unknown.
$string is an array made from $original_string; each word is its own element of the array.
We loop through this array looking for the string "cars" and also get its key.
If we find it, we go to the key minus one to find the number. We do this because we know the number appears right before "cars".
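The same idea can be written without an explicit loop by letting array_search() find the key of "cars". A sketch; like the loop, it assumes the words are separated by single spaces.

```php
<?php
$original_string = 'Best location using 168 cars + cleaning';
$words = explode(' ', $original_string);

// array_search() returns the key of the first "cars", or false if absent.
$key = array_search('cars', $words, true);
$number = ($key !== false && $key > 0) ? $words[$key - 1] : null;

echo $number, "\n";
```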
I have a string like that:
$string = "Half Board, 10% Off & 100 Euro Star - Save £535";
The percentage can be anywhere in the string.
This is what I have, but I don't like that it uses a preg_replace() AND a loop, which are heavy. I'd like a regex that does it in one operation.
$string = "Half Board, 10% Off & 100 Euro Star - Save £535";
$string_array = explode(" ", $string);
$pattern = '/[^0-9,.]*/'; // anything that is not a digit, comma or period
foreach($string_array as $pc) {
if(stristr($pc, '%')) {
$percent = preg_replace($pattern, '', $pc);
break;
}
}
echo $percent;
exit;
Update:
From the code you added to your question, I get the impression your percentages might look like 12.3% or even .50%. In which case, the regex you're looking for is this:
if (preg_match_all('/(\d+|\d*[.,]\d{1,2})(?=\s*%)/','some .50% and 5% text with 12.5% random percentages and 123 digits',$matches))
{
print_r($matches);
}
Which returns:
Array
(
[0] => Array
(
[0] => .50
[1] => 5
[2] => 12.5
)
[1] => Array
(
[0] => .50
[1] => 5
[2] => 12.5
)
)
The expression explained:
(\d+|\d*[.,]\d{1,2}): an OR -> either match digits \d+, or \d* (zero or more digits) followed by a decimal separator ([.,]) and 1 or 2 digits (\d{1,2})
(?=\s*%): only if the aforementioned group is followed by zero or more spaces and a % sign
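If you need the values as numbers rather than strings, the captured group can be normalised afterwards. A sketch, assuming either "." or "," may be the decimal separator; the helper name extractPercentages is hypothetical, and the decimal alternative is listed first here.

```php
<?php
// Hypothetical helper: return every percentage as a float, accepting
// either "." or "," as the decimal separator (an assumption).
function extractPercentages(string $text): array
{
    preg_match_all('/(\d*[.,]\d{1,2}|\d+)(?=\s*%)/', $text, $m);
    // Normalise a decimal comma to a dot before casting.
    return array_map(
        function (string $n): float {
            return (float) str_replace(',', '.', $n);
        },
        $m[1]
    );
}

print_r(extractPercentages('some .50% and 5% text with 12,5 % off and 123 digits'));
```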
Using a regular expression, with a positive lookahead, you can get exactly what you want:
if (preg_match_all('/\d+(?=%)/', 'Save 20% if you buy 5 iPhone chargers (excluding 9% tax)', $matches))
{
print_r($matches[0]);
}
gives you:
Array
(
    [0] => 20
    [1] => 9
)
Which is, I believe, what you are looking for.
The regex works like this:
\d+ matches at least 1 digit (as many as possible)
(?=%): provided they are followed by a % sign
Because of the lookahead, the 5 isn't matched in the example I gave, because it's followed by a space, not a % sign.
If your string might be malformed (any number of spaces between the digits and the % sign), a lookahead can deal with that, too. As ridgerunner pointed out to me, only lookbehinds need to be of fixed length, so:
preg_match_all('/\d+(?=\s*%)/', $txt, $matches)
The lookahead works like this:
\s*: matches zero or more whitespace chars
%: followed by a percent sign
Hence, both 123 % and 123% fit the pattern, and will match.
A good place to read up on regex's is regular-expressions.info
If "complex" regexes (i.e. with lookaround assertions) aren't your cup of tea (yet; I strongly suggest learning to use them), you could resort to splitting the string:
$parts = array_map('trim', explode('%', $string));
$percentages = array();
foreach($parts as $part)
{
if (preg_match('/\d+$/', $part, $match))
{//if is required, because the last element of $parts might not end with a number
$percentages[] = $match[0];
}
}
Here, I simply use % as the delimiter to create an array, trim each section (to remove surrounding whitespace), and then check each substring, matching any number at the end of that substring:
'get 15% discount'
explode + trim: ['get 15', 'discount']
/\d+$/ on 'get 15' matches '15'
But that's an awful lot of work; using a lookahead is much easier.
$str = "Half Board, 10% Off & 100 Euro Star - Save £535";
preg_match("|\d+|", $str, $arr); // first run of digits: "10", the percentage only because it comes first
print_r($arr);
Try splitting the string (split() was removed in PHP 7, so use explode()):
$str_arr = explode(' ', $str);
$my_str = explode('%', $str_arr[2]); // "10%" is the third word
echo $my_str[0]; // 10
This should work:
$str = "Save 20% on iPhone chargers...";
if (preg_match_all('/\d+(?=%)/', $str, $match))
print_r($match[0]);
Live Demo: http://ideone.com/FLKtE9
I know I'm just being simple-minded at this point but I'm stumped. Suppose I have a textual target that looks like this:
Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.
Using this RegExp: \s[A-Z][A-Z]\d\d\d\d\s, how would I extract, individually, the first and second occurrences of the matching strings? "JH6781" and "RB1223", respectively. I guarantee that the matching string will appear exactly twice in the target text.
Note: I do NOT want to change the existing string at all, so str_replace() is not an option.
Erm... how about using this regex:
/\b[A-Z]{2}\d{4}\b/
It means 'match a word boundary, followed by exactly two capital English letters, followed by exactly four digits, followed by a word boundary'. So it won't match 'TGX7777' (the boundary is followed by three letters, so the match fails), and it won't match 'TX77777' (the four digits are followed by another digit, so it fails again).
And that's how it can be used:
$str = "Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";
preg_match_all('/\b[A-Z]{2}\d{4}\b/', $str, $matches);
var_dump($matches[0]);
// array
// 0 => string 'JH6781' (length=6)
// 1 => string 'RB1223' (length=6)
$s='Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother\'s HG766 id was RB1223.';
$n=preg_match_all('/\b[A-Z][A-Z]\d\d\d\d\b/',$s,$m);
returns $n = 2, and
print_r($m);
prints:
Array
(
[0] => Array
(
[0] => JH6781
[1] => RB1223
)
)
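Assuming $subject holds the target text, you could combine preg_match() with its offset parameter (the 5th) and strpos() to select the first and second occurrences.
Alternatively, you could use preg_match_all() and just take the first two array entries.
<?php
$regex = '/\b[A-Z]{2}\d{4}\b/';
preg_match($regex, $subject, $first);
// Resume the search just after the end of the first match
$offset = strpos($subject, $first[0]) + strlen($first[0]);
preg_match($regex, $subject, $second, 0, $offset);
echo $first[0], "\n", $second[0];
?>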
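For the preg_match_all() alternative mentioned above, array destructuring gives you the two occurrences as separate variables. A sketch that relies on the question's guarantee of exactly two matches; checking the return value guards against that assumption failing. The short list syntax needs PHP 7.1 or later.

```php
<?php
$str = "Johnny was really named for his 1234 grandfather, John Hugenot, "
     . "but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";

$first = $second = null;
// preg_match_all() returns the number of full matches found.
if (preg_match_all('/\b[A-Z]{2}\d{4}\b/', $str, $matches) === 2) {
    [$first, $second] = $matches[0];
}
echo $first, "\n", $second, "\n";
```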
I am trying to split a string into terms in PHP using preg_split(). I need to extract normal words (\w) but also currency amounts (including the currency symbol) and numeric terms (including commas and decimal points). Can anyone help me out? I can't seem to create a valid regex for preg_split() that achieves this. Thanks
Why not use preg_match_all() instead of preg_split() ?
$str = '"1.545" "$143" "$13.43" "1.5b" "hello" "G9"'
. ' This is a test sentence, with some. 123. numbers'
. ' 456.78 and punctuation! signs.';
$digitsPattern = '\$?\d+(\.\d+)?';
$wordsPattern = '[[:alnum:]]+';
preg_match_all('/('.$digitsPattern.'|'.$wordsPattern.')/i', $str, $matches);
print_r($matches[0]);
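The question also mentions commas and other currency symbols. A sketch of the same preg_match_all() idea, assuming a number may carry a leading $, € or £ and may contain "," or "." between digit groups; the /u modifier treats the pattern and subject as UTF-8 so the multibyte symbols work in the character class.

```php
<?php
// Sketch: a number may carry a leading $, € or £ and contain "," or "."
// between digit groups; any other run of word characters is a word.
$str = 'Price: $1,299.99 or €15 for 3 items';

preg_match_all('/[$€£]?\d+(?:[.,]\d+)*|\w+/u', $str, $matches);
print_r($matches[0]);
```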
What about using preg_match_all() with [\S]+\b to match each word? You then get an array with the words in it.
For the string "Big brown fox - $20.25":
preg_match_all('/[\S]+\b/', $str, $matches);
$matches[0] will then contain (the lone "-" is skipped, because no word boundary follows it):
Array
(
    [0] => Big
    [1] => brown
    [2] => fox
    [3] => $20.25
)
Would splitting on whitespace, "/\s+/", solve your problem?
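If whitespace splitting is enough, passing the PREG_SPLIT_NO_EMPTY flag avoids empty entries when the string starts or ends with whitespace. A sketch; note that punctuation stays attached to the terms, so "-" becomes its own term and a word like "some." would keep its period.

```php
<?php
$str = '  Big brown  fox - $20.25 ';
// PREG_SPLIT_NO_EMPTY drops the empty strings that leading/trailing spaces produce.
$terms = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($terms);
```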