Find first instance of character, then stop at space? - php

I think I need to use some kind of regex, but I'm struggling...
I have a string e.g.
the cat sat on the mat and $10 was all it cost
I want to return
$10
And is there a universal name for currency codes so I could return £10 if it was
the cat sat on the mat and £10 was all it cost
Or a way to add more characters to the expression

If you want to match all currency codes, use the following regex:
/\p{Sc}\d+(\.\d+)?\b/u
explanation:
/ # regex delimiter
\p{Sc} # a currency symbol
\d+ # 1 or more digits
(\.\d+)? # optionally followed by a dot and one or more digits
\b # word boundary
/ # regex delimiter
u # unicode
Have a look at this site to see the meaning of \p{Sc} (Currency Symbol)
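For reference, here is a minimal sketch of how that pattern could be used from PHP. The function name first_currency_amount is made up for illustration and is not part of the original answer; it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (name is an assumption): returns the first currency
// amount found in $text, or null when there is none.
function first_currency_amount(string $text): ?string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // which \p{Sc} needs in order to see "£" as a single character.
    if (preg_match('/\p{Sc}\d+(\.\d+)?\b/u', $text, $m) === 1) {
        return $m[0];
    }
    return null;
}

echo first_currency_amount('the cat sat on the mat and $10 was all it cost'), "\n"; // $10
echo first_currency_amount('the cat sat on the mat and £10 was all it cost'), "\n"; // £10
```

Since \p{Sc} covers every Unicode currency symbol, no extra characters have to be listed by hand.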

You can use
/(\$.*?) /
(note there is a space after the closing parenthesis)
If you want to add more symbols, then use brackets:
$str = 'the cat sat on the mat and £10 was all it cost';
$matches = array();
preg_match( '/([\$£].*?) /u', $str, $matches ); // u modifier: treat pattern and subject as UTF-8 so £ counts as one character
This will work if the currency symbol precedes the value, and if there is a space following the value. You might want to check for more general cases, such as the value being at the end of a sentence with no trailing space etc.
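One possible way to cover the "no trailing space" case is to let the digits end the match instead of relying on a space. This is only a sketch under the assumption that the value is a plain number with an optional decimal part; find_price is a hypothetical name.

```php
<?php
// Hypothetical helper (name is an assumption): finds a price that starts
// with one of the listed symbols, whether or not a space follows it.
function find_price(string $text): ?string
{
    // [$£] lists the accepted symbols; add more inside the brackets.
    // \d+(?:\.\d+)? stops at the last digit, so a trailing space is optional.
    if (preg_match('/[$£]\d+(?:\.\d+)?/u', $text, $m) === 1) {
        return $m[0];
    }
    return null;
}

echo find_price('it cost £10'), "\n";  // £10
echo find_price('it cost $10.'), "\n"; // $10
```

The final full stop in the second example is left out because a dot only counts when digits follow it.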

$string = 'the cat sat on the mat and $10 was all it cost';
// u modifier: £ is multibyte in UTF-8; \d+ requires at least one digit after the symbol
$found = preg_match_all('/[$£]\d+/u',$string,$results);
if ($found) {
    var_dump($results);
}

This may work for you
$string = 'the cat sat on the mat and $10 was all it cost';
preg_match('/ ([$£])([0-9]+)/u', $string, $matches);
echo "<pre>";
print_r($matches);

Related

Regular expression for highlighting numbers between words

Site users enter numbers in different ways, example:
from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
I am looking for a regular expression with which I could highlight the words before the digits (if there are any), the digits in any format, and the words after them (if there are any). Ideally, spaces should be excluded.
This is the pattern I have so far, but it does not work correctly.
(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
The main purpose of this is to put the strings in order, bring them to the same form, format them in PHP digit format, etc.
As a result, I need to get the text before the digits, the digits themselves and the text after them into the variables separately.
$before = 'from';
$num = '8000';
$after = 'packs';
Thank you for any help in this matter.
I think you may try this:
^(\D+)?([\d \t]+)(\D+)?$
group 1: optional(?) group that will contain anything but digit
group 2: mandatory group that will contain only digits and
white space character like space and tab
group 3: optional(?) group that will contain anything but digit
Demo
Source (run)
$re = '/^(\D+)?([\d \t]+)(\D+)?$/m';
$str = 'from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
';
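preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $matchgroup)
{
    // Trailing groups that did not take part in the match are left out of
    // the result array, so fall back to an empty string for them.
    echo "before: ".($matchgroup[1] ?? '')."\n";
    echo "number:".preg_replace('/\D/','',$matchgroup[2])."\n";
    echo "after:".($matchgroup[3] ?? '');
    echo "\n\n\n";
}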
I corrected your regex and added groups, the regex looks like this:
^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$
Test regex here: https://regex101.com/r/QLEC9g/2
By using groups you can easily separate the words and numbers, and handle them any way you want.
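As a rough sketch of how the named groups could be read in PHP (the function name split_line and the returned array shape are assumptions made for illustration, not part of the answer above):

```php
<?php
// Hypothetical helper (name and return shape are assumptions): splits one
// line into the text before the number, the number itself and the text
// after it. Returns null when the line does not match.
function split_line(string $line): ?array
{
    $re = '/^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$/';
    if (preg_match($re, $line, $m) !== 1) {
        return null;
    }
    return [
        // Optional groups may be missing or empty, so default them to ''.
        'before' => $m['before'] ?? '',
        // Strip the whitespace used as a thousands separator.
        'number' => preg_replace('/\s+/', '', $m['number']),
        'after'  => $m['after'] ?? '',
    ];
}

print_r(split_line('from 8 000 packs'));
```

Named keys keep the calling code readable even if the order of the groups changes later.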
Your pattern does not match because there are 4 required parts that all expect 1 character to be present:
(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
^^^^^^^^^^^^ ^^ ^^^^^ ^^
The other thing to note is that the first character class [0-9|a-zA-Z] can also match digits (you can omit the | as it would match a literal pipe char)
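A quick check of the point about the pipe, as a small sketch: inside a character class the | is an ordinary character, so the first class below accepts a literal pipe and the second does not.

```php
<?php
// Inside [...] the pipe is a literal character, not alternation.
$withPipe    = preg_match('/^[0-9|a-zA-Z]$/', '|'); // class contains "|"
$withoutPipe = preg_match('/^[0-9a-zA-Z]$/', '|');  // class without "|"

var_dump($withPipe);    // int(1)
var_dump($withoutPipe); // int(0)
```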
If you would allow all other chars than digits on the left and right, and there should be at least a single digit present, you can use a negated character class [^\d\r\n]* optionally matching any character except a digit or a newline:
^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$
^ Start of string
([^\d\r\n]*) Capture group 1, match any char except a digit or a newline
\h* Match optional horizontal whitespace chars
(\d+(?:\h+\d+)*) Capture group 2, match 1+ digits and optionally repeat matching spaces and 1+ digits
\h* Match optional horizontal whitespace chars
([^\d\r\n]*) Capture group 3, match any char except a digit or a newline
$ End of string
See a regex demo and a PHP demo.
For example
$re = '/^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$/m';
$str = 'from 8 000 packs
test from 8 000 packs test
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach($matches as $match) {
list(,$before, $num, $after) = $match;
echo sprintf(
"before: %s\nnum:%s\nafter:%s\n--------------------\n",
$before, preg_replace("/\h+/", "", $num), $after
);
}
Output
before: from
num:8000
after:packs
--------------------
before: test from
num:8000
after:packs test
--------------------
before:
num:432534534
after:
--------------------
before: from
num:344454
after:packs
--------------------
before:
num:45054
after:packs
--------------------
before:
num:04555
after:
--------------------
before:
num:434654
after:
--------------------
before:
num:54564
after:packs
--------------------
If there should be at least a single digit present, and the only allowed characters are a-z for the word(s), you can use a case insensitive pattern:
(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$
See another regex demo and a php demo.
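A minimal sketch of using this stricter pattern on single lines follows. The function name split_words_and_number is hypothetical, and the whitespace inside the number is removed afterwards in the same way as in the earlier example.

```php
<?php
// Hypothetical helper (name is an assumption): returns [before, number, after]
// for one line, or null if the line contains anything other than a-z words,
// digits and horizontal whitespace.
function split_words_and_number(string $line): ?array
{
    $re = '/(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$/';
    if (preg_match($re, $line, $m) !== 1) {
        return null;
    }
    // Group 3 is optional, so default it to '' if it is absent.
    return [$m[1] ?? '', preg_replace('/\h+/', '', $m[2]), $m[3] ?? ''];
}

print_r(split_words_and_number('test from 8 000 packs test'));
```

Because the words are matched as whole runs of letters, the captured text comes back without leading or trailing spaces.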

Regex match specific string without other string

So I've made this regex:
/(?!for )€([0-9]{0,2}(,)?([0-9]{0,2})?)/
to match only the first of the following two sentences:
discount of €50,20 on these items
This item on sale now for €30,20
As you might've noticed already, I'd like the amount in the 2nd sentence not to be matched because it's not the discount amount. But I'm quite unsure how to do this in regex, because everything I could find offers options like:
(?!foo|bar)
This option, as can be seen in my example, does not seem to be the solution to my issue.
Example:
https://www.phpliveregex.com/p/y2D
Suggestions?
You can use
(?<!\bfor\s)€(\d+(?:,\d+)?)
See the regex demo.
Details
(?<!\bfor\s) - a negative lookbehind that fails the match if there is a whole word for and a whitespace immediately before the current position
€ - a euro sign
(\d+(?:,\d+)?) - Group 1: one or more digits followed with an optional sequence of a comma and one or more digits
See the PHP demo:
$strs= ["discount of €50,20 on these items","This item on sale now for €30,20"];
foreach ($strs as $s){
if (preg_match('~(?<!\bfor\s)€(\d+(?:,\d+)?)~', $s, $m)) {
echo $m[1].PHP_EOL;
} else {
echo "No match!";
}
}
Output:
50,20
No match!
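To see why the pattern from the question matches both sentences, it may help to compare the two side by side. This is just a sketch; the variable names are made up for illustration. A negative lookahead placed before € inspects the text that follows that position (which starts with € itself), so it can never see the word "for" that comes before it.

```php
<?php
$sentences = [
    'discount of €50,20 on these items',
    'This item on sale now for €30,20',
];

// Pattern from the question: the lookahead looks forward from the € position.
$lookahead  = '/(?!for )€([0-9]{0,2}(,)?([0-9]{0,2})?)/';
// Pattern from the answer: the lookbehind looks at what precedes the €.
$lookbehind = '/(?<!\bfor\s)€(\d+(?:,\d+)?)/';

foreach ($sentences as $s) {
    // Prints "1 1" for the first sentence and "1 0" for the second.
    printf("%d %d\n", preg_match($lookahead, $s), preg_match($lookbehind, $s));
}
```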
You could make sure to match the discount first in the line:
\bdiscount\h[^\r\n€]*\K€\d{1,2}(?:,\d{1,2})?\b
Explanation
\bdiscount\h A word boundary, then match discount and a single horizontal whitespace char
[^\r\n€]*\K Match 0+ times any char except € or a newline, then reset the match buffer
€\d{1,2}(?:,\d{1,2})? Match €, 1-2 digits with an optional part matching , and 1-2 digits
\b A word boundary
Regex demo | Php demo
$re = '/\bdiscount\h[^\r\n€]*\K€\d{1,2}(?:,\d{1,2})?\b/';
$str = 'discount of €50,20 on these items €
This item on sale now for €30,20';
if (preg_match($re, $str, $matches)) {
echo($matches[0]);
}
Output
€50,20

Match any string in the format (+-)(digit or letter)(colon)

I need a regex to find any string that matches the format: a '+' or a '-', followed by a number or a letter, followed by a colon ':'.
Example:
"+2: Each player discards a card.\n−X: Return target nonlegendary creature card with converted mana cost X from your graveyard to the battlefield.\n−8: You get an emblem with \"Whenever a creature dies, return it to the battlefield under your control at the beginning of the next end step.\"
Should match "+2:", "-X:" and "-8:".
I've done /[0-9a-z]:/i but I can't match the plus and minus.
Thanks in advance guys.
You may use
$re = '/[−+-]?[0-9a-z]:/iu';
$str = '+2: Each player discards a card.\\n−X: Return target nonlegendary creature card with converted mana cost X from your graveyard to the battlefield.\\n−8: You get an emblem with \\"Whenever a creature dies, return it to the battlefield under your control at the beginning of the next end step.';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
See the PHP demo
The [−+-]? part matches an optional −, - or + char.
If you want to support any other "minus" looking chars, use
$re = '/[−+\p{Pd}]?[0-9a-z]:/iu';
The \p{Pd} matches dash punctuation chars, but not the − char, unfortunately.
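The reason is the Unicode category: the minus sign − (U+2212) is classified as a math symbol (Sm), not as dash punctuation (Pd). A small sketch to confirm this, with variable names chosen only for illustration:

```php
<?php
$hyphen = '-';         // U+002D HYPHEN-MINUS, category Pd
$enDash = "\u{2013}";  // U+2013 EN DASH, category Pd
$minus  = "\u{2212}";  // U+2212 MINUS SIGN, category Sm

var_dump(preg_match('/^\p{Pd}$/u', $hyphen)); // int(1)
var_dump(preg_match('/^\p{Pd}$/u', $enDash)); // int(1)
var_dump(preg_match('/^\p{Pd}$/u', $minus));  // int(0)
var_dump(preg_match('/^\p{Sm}$/u', $minus));  // int(1)
```

That is why the − has to stay in the character class next to \p{Pd}.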

Regex validation for North American phone numbers

I am having trouble finding a pattern that would detect the following
909-999-9999
909 999 9999
(909) 999-9999
(909) 999 9999
999 999 9999
9999999999
\A[(]?[0-9]{3}[)]?[ ,-][0-9]{3}[ ,-][0-9]{3}\z
I tried this pattern, but it doesn't work for all of these cases. I was thinking I could split the problem up by putting each character into an array and checking it, but then the code would be too long.
You have 4 digits in the last group, and you specify 3 in the regex.
You also need to apply a ? quantifier (1 or 0 occurrence) to the separators since they are optional.
Use
^[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}$
See the demo here
PHP demo:
$re = "/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/";
$strs = array("909-999-9999", "909 999 9999", "(909) 999-9999", "(909) 999 9999", "999 999 9999","9999999999");
$vals = preg_grep($re, $strs);
print_r($vals);
And another one:
$re = "/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/";
$str = "909-999-9999";
if (preg_match($re, $str, $m)) {
echo "MATCHED!";
}
BTW, optional ? subpatterns perform better than alternations.
Try this regex:
^(?:\(\d{3}\)|\d{3})[- ]?\d{3}[- ]?\d{4}$
Explaining:
^ # from start
(?: # one of
\(\d{3}\) # '(999)' sequence
| # OR
\d{3} # '999' sequence
) #
[- ]? # optional space or hyphen
\d{3} # three digits
[- ]? # optional space or hyphen
\d{4} # four digits
$ # end of string
Hope it helps.
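A possible PHP wrapper around this pattern, as a sketch only (is_na_phone is a hypothetical name). Because the area code is written as an alternation, an opening parenthesis without its closing one is rejected.

```php
<?php
// Hypothetical helper (name is an assumption): true when $s looks like one
// of the North American formats listed in the question.
function is_na_phone(string $s): bool
{
    return preg_match('/^(?:\(\d{3}\)|\d{3})[- ]?\d{3}[- ]?\d{4}$/', $s) === 1;
}

$samples = ['909-999-9999', '909 999 9999', '(909) 999-9999',
            '(909) 999 9999', '999 999 9999', '9999999999', '(909 999-9999'];
foreach ($samples as $s) {
    echo $s, ' => ', is_na_phone($s) ? 'valid' : 'invalid', "\n";
}
```

Note that $ also matches just before a final newline; if that matters, \z can be used instead, as in the other answer.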

How to remove all alphanumeric words from the text?

I'm trying to write a regular expression in PHP that simply removes alphanumeric words (words that contain digits), but not numbers that have punctuation and similar special characters (e.g. prices, phone numbers, etc.).
Words which should be removed:
1st, H20, 2nd, O2, 3rd, NUMB3RS, Rüthen1, Wrocław2
Words which shouldn't be removed:
0, 5.5, 10, $100, £65, +44, (20), 123, ext:124, 4.4-BSD,
Here is the code so far:
$text = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD';
$pattern = '/\b\w*\d\w*\b-?/';
echo $text, preg_replace($pattern, " ", $text);
However it removes all words including digits, prices and phone.
I've also tried so far the following patterns:
/(\\s+\\w{1,2}(?=\\W+))|(\\s+[a-zA-Z0-9_-]+\\d+)/ # Removes digits, etc.
/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|#)]+/ # Doesn't work.
/(\\s+\\w{1,2}(?=\\W+))|(\\s+[a-zA-Z0-9_-]+\\d+)/ # Removes too much.
/[^\p{L}\p{N}-]+/u # It removes only special characters.
/(^[\D]+\s|\s[\D]+\s|\s[\D]+$|^[\D]+$)+/ # Removes words.
/ ?\b[^ ]*[0-9][^ ]*\b/i # Almost, but removes digits, price, phone.
/\s+[\w-]*\d[\w-]*|[\w-]*\d[\w-]*\s*/ # Almost, but removes digits, price, phone.
/\b\w*\d\w*\b-?/ # Almost, but removes digits, price, phone.
/[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*/ # Almost, but removes too much.
I found these across SO (most of them are too specific) and on other sites. They are supposed to remove words with digits, but they don't.
How can I write a simple regular expression that removes these words without touching anything else?
Sample text:
To remove: 1st H20; 2nd O2; 3rd NUMB3RS;
To leave: Digits: -2 0 5.5 10, Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD
Expected output:
To remove: ; ; ; To leave: Digits: -2 0 5.5 10, Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD
How about replacing \b(?=[a-z]+\d|[a-z]*\d+[a-z]+)\w*\b\s* with nothing?
Demo: https://regex101.com/r/jA2fW3/1
Pattern code:
$pattern = '/\b(?=[a-z]+\d|[a-z]*\d+[a-z]+)\w*\b\s*/i';
To match alphanumeric words containing foreign/accented letters, use the following pattern:
$pattern = '/\b(?=[\pL]+\d|[\pL]*\d+[\pL]+)[\pL\w]*\b\s*/iu';
Demo: https://regex101.com/r/jA2fW3/3
You can modify your regular expression as follows for the desired output.
$text = preg_replace('/\b(?:[a-z]+\d+[a-z]*|\d+[a-z]+)\b/i', '', $text);
To match any kind of letter from any language, use the Unicode property \p{L}:
$text = preg_replace('/\b(?:\pL+\d+\pL*|\d+\pL+)\b/u', '', $text);
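Here is a short sketch that runs the Unicode version against the sample text from the question. The function name remove_alphanumeric_words is made up for illustration, and the input is assumed to be valid UTF-8. The removed words leave their surrounding spaces behind, so the output may need extra whitespace cleanup.

```php
<?php
// Hypothetical helper (name is an assumption): removes words that mix
// letters and digits, leaving plain numbers, prices and phone parts alone.
function remove_alphanumeric_words(string $text): string
{
    // preg_replace returns null on error (e.g. invalid UTF-8 with the
    // u modifier); fall back to the unchanged text in that case.
    return preg_replace('/\b(?:\pL+\d+\pL*|\d+\pL+)\b/u', '', $text) ?? $text;
}

$sample = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, '
        . 'Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD';

echo remove_alphanumeric_words($sample), "\n";
echo remove_alphanumeric_words('Rüthen1 Wrocław2'), "\n";
```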
