Extract uppercase words from a string - php

Edit: I resolved my problem.
This is the solution:
$string = "Hello my Name is Paloppa. Im' 20 And? Hello! Words I Io Man";
// Word boundary before the first uppercase letter, followed by one or more lowercase letters
preg_match_all( '/(?<!^)\b[A-Z][a-z]{1,}\b(?!["!?.\\\'])/', $string, $matches);
print_r( $matches[0] );
Now I have one more question.
Every time it finds a word, the word is stored in its own element of the array.
Suppose I have this phrase: "Whats is your Name and Surname? My Name And Surname' is Paolo Celio and Serie A Iim 25 Thanksbro Bro Ciao"
This is my code:
$string = "Whats is your Name and Surname? My Name And Surname' is Paolo Celio and Serie A Iim 25 Thanksbro Bro Ciao";
// Word boundary, then one or more capitalized words, each followed by at least one space
preg_match_all( '/(?<!^)\b([A-Z][a-z]+ +){1,}\b(?!["!?.\\\'])/', $string, $matches);
print_r( $matches[0] );
The output is the following:
Array (
[0] => Name
[1] => Name And Surname
[2] => Paolo Celio
[3] => Serie
[4] => Iim
[5] => Thanksbro Bro
)
Why doesn't it join "Serie A"? It doesn't print the "A" at all.
Why is the last word missing from the output?
Thanks
EDIT
I resolved my problem; this is my regex:
preg_match_all('/(?<!^)\b[A-Z]([a-z0-9A-Z]| [A-Z]){1,}\b(?!["!?.\\\'])/', $string, $matches);

You can use:
<?php
$test="the Quick brown Fox jumps Over the Lazy Dog";
preg_match_all("/[A-Z][a-z]*/",$test,$op);
$output = implode(' ',$op[0]);
echo $output;
?>

Edge cases make this slightly complicated, but we can start by defining two character classes based on the desired inputs and outputs, together with word boundaries, in an expression similar to:
(?=[^I'])\b([A-Z][a-z'é]+)\b
and then expand it to cover further cases.
Test
$re = '/(?=[^I\'])\b([A-Z][a-z\'é]+)\b/m';
$str = 'Hello my name is Paloppa. I\'m 20 And i love Football.
Hello my name is Chloé. I\'m 20 And i love Football.
Hello my name is Renée O\'neal. I\'m 20 And i love Football.';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);

To extract full words you'll need to use word boundaries and character classes to match the remaining part of the word, and use lookbehinds to exclude previous content:
$string = "Hello my Name is Paloppa. I'm 20 And? Hello! Words' Man";
// Word boundary before the first uppercase letter, followed by any number of letters
preg_match_all( '/(?<!^)(?<!\. )\b[A-Z][a-zA-Z]*\b(?!["!?\\\'])/', $string, $matches);
print_r( $matches[0] );
If you want Capitalized words only, excluding MixedCase words, replace [a-zA-Z] with just [a-z].
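The difference between the two variants can be seen in a minimal sketch. The helper name extract_uppercase_words and the sample sentence are made up for illustration; only the pattern comes from the answer above.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function extract_uppercase_words(string $text, bool $allowMixedCase = true): array
{
    // Characters allowed after the leading uppercase letter.
    $rest = $allowMixedCase ? 'a-zA-Z' : 'a-z';
    $pattern = '/(?<!^)(?<!\. )\b[A-Z][' . $rest . ']*\b(?!["!?\\\'])/';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Sample input invented for this sketch.
$sample = "we met at the McDonald near Rome";
print_r(extract_uppercase_words($sample));        // McDonald, Rome
print_r(extract_uppercase_words($sample, false)); // Rome only
```

With [a-z] the word "McDonald" is dropped entirely, because the trailing \b cannot be satisfied in the middle of a word.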

A quick way to do this:
$test="Hi There this Is my First Job";
preg_match_all('/[A-Z][a-z]*/', $test, $matches, PREG_OFFSET_CAPTURE);
$res=array();
foreach( $matches[0] as $key=> $value){
$res[]=$value[0];
}
print_r($res);
OUTPUT:
Array
(
[0] => Hi
[1] => There
[2] => Is
[3] => First
[4] => Job
)


trivial regex assistance

I have a string as follows:
$str="1-3";
When I pass it through here:
preg_match('#(\\d+)\\s*-\\s*(\\d+)#', $str, $matches);
I get:
$matches[0] //1-3
$matches[1] //1
$matches[2] //3
Now if you pass something like this:
$str="a-3";
You get
$matches //empty
This is correct since it is restricted to only integers.
Now my problem is that I want to implement something that works the same way, but for characters.
Here's what I have so far:
$str="a-d";
preg_match('#(\\w+)\\s*-\\s*(\\w+)#', $str, $matches);
I get:
$matches[0] //a-d
$matches[1] //a
$matches[2] //d
This works great; however, if you do this (notice the integer):
$str="a-5";
I get:
$matches[0] //a-5
$matches[1] //a
$matches[2] //5
What I need is to allow only alphabetic characters in the second regex, so that passing a-5 is treated as an error.
Essentially, I need the behaviour of the first regex applied to the second one, with letters only.
Simply change the capturing groups to ([a-zA-Z]+), like this:
([a-zA-Z]+)\s*-\s*([a-zA-Z]+)
\w matches any alphanumeric character plus the underscore (_). If you only want to match letters, you need to give the letter ranges explicitly:
a-z for lowercase letters and A-Z for capital letters.
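As a rough sketch of how the letters-only pattern could be used for validation: the helper name parse_letter_range is hypothetical, and the ^ and $ anchors are an addition that is not in the pattern above. Without them, an input such as "a-d5" would still match on its "a-d" part.

```php
<?php
// Hypothetical helper: returns [start, end] for input like "a-d",
// or null when the input is not a letters-only range.
function parse_letter_range(string $str): ?array
{
    // The anchors are an assumption of this sketch: they reject
    // inputs that merely contain a valid range somewhere inside.
    if (preg_match('/^([a-zA-Z]+)\s*-\s*([a-zA-Z]+)$/', $str, $m) === 1) {
        return [$m[1], $m[2]];
    }
    return null;
}

var_dump(parse_letter_range('a-d'));   // array with "a" and "d"
var_dump(parse_letter_range('a-5'));   // NULL
var_dump(parse_letter_range('a-d5'));  // NULL because of the anchors
```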
You could use the Unicode property \pL, which matches any letter in any language:
$arr = [
'a-d',
'1-5',
'1-d',
'ç-é',
];
foreach($arr as $str) {
if (preg_match('/(\pL)\s*-\s*(\pL)/u', $str, $matches)) {
print_r($matches);
} else {
echo "$str : error\n";
}
}
Output:
Array
(
[0] => a-d
[1] => a
[2] => d
)
1-5 : error
1-d : error
Array
(
[0] => ç-é
[1] => ç
[2] => é
)

Regex to accept any set of characters, and then remember and find that same set that was provided

I want regex to accept any number of characters, and then remember that exact set of characters and then look for it later in the line.
For example, if Regex saw the line begin with 'TheseCharacters', then I would want it to match the line if it saw 'TheseCharacters' occur later in the line.
Examples (all these would match):
TheseCharacters, I really enjoy TheseCharacters.
Dog1, My favorite word is Dog1.
The following would not match:
Cakeman, oh I enjoy cakeboy.
Is this outside the scope of regex, or is there a way to dynamically do this?
It is a little hard to tell what you are trying to do, but from what I understand, you could use grouping and backreferences to accomplish this. Something like this:
<?php
$pattern = '/^(\b\w+\b).*\b\1\b.*/i';
//should match
$string = "TheseCharacters, I really enjoy TheseCharacters";
$result = preg_match($pattern, $string, $matches);
echo "String 1 matches {$result} times: ".print_r($matches,true)."\n";
//matches only because of the case-insensitive flag; the case is not identical
$string = "TheseCharacters, I really enjoy thesecharacters";
$result = preg_match($pattern, $string, $matches);
echo "String 1 matches {$result} times: ".print_r($matches,true)."\n";
//should match, doesn't require TheseCharacters to be at the end of the string.
$string = "TheseCharacters, I really enjoy TheseCharacters and some others";
$result = preg_match($pattern, $string, $matches);
echo "String 2 matches {$result} times: ".print_r($matches,true)."\n";
//no match, TheseCharacters has been changed to TheseLetters
$string = "TheseCharacters, I really enjoy TheseLetters";
$result = preg_match($pattern, $string, $matches);
echo "String 3 matches {$result} times: ".print_r($matches,true)."\n";
//no match, additional letters have been added to TheseCharacters
$string = "TheseCharacters, I really enjoy TheseCharactersasdf";
$result = preg_match($pattern, $string, $matches);
echo "String 4 matches {$result} times: ".print_r($matches,true)."\n";
which produces this output:
String 1 matches 1 times: Array
(
[0] => TheseCharacters, I really enjoy TheseCharacters
[1] => TheseCharacters
)
String 1 matches 1 times: Array
(
[0] => TheseCharacters, I really enjoy thesecharacters
[1] => TheseCharacters
)
String 2 matches 1 times: Array
(
[0] => TheseCharacters, I really enjoy TheseCharacters and some others
[1] => TheseCharacters
)
String 3 matches 0 times: Array
(
)
String 4 matches 0 times: Array
(
)
Demo: https://3v4l.org/upNhm
And explanation of the pattern here: https://regex101.com/r/DuTbyn/2
And it's not really a "variable" that is being stored. It is a group, which you can reference later on by its group number. So initially I am matching the first run of letters/numbers at the start of the string (^(\b\w+\b)), followed by any number of characters, and later matching whatever was captured in that first group (\1). The entire matched string will be available in $matches[0] and the repeated string in $matches[1].
Without knowing more about what you are trying to do, this is pretty much the only way with a regex. Another option might be to split the line into an array of individual words and simply use array_count_values to get a count of each word.
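The word-counting alternative might look roughly like this. The helper name repeated_words is hypothetical, and unlike the regex it is case-sensitive and does not require the repeated word to be the first one on the line.

```php
<?php
// Hypothetical helper: returns every word that occurs more than once,
// mapped to its number of occurrences.
function repeated_words(string $line): array
{
    // Split on runs of non-word characters; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\W+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    $counts = array_count_values($words);
    // Arrow functions need PHP 7.4 or later.
    return array_filter($counts, fn($n) => $n > 1);
}

print_r(repeated_words("TheseCharacters, I really enjoy TheseCharacters."));
print_r(repeated_words("Cakeman, oh I enjoy cakeboy.")); // empty array
```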

Words finder regex fails

I'm using this pattern to check if certain words exist in a string:
/\b(apple|ball|cat)\b/i
It works on the string "cat ball apple"
but not on "no spaces catball smallapple".
How can the pattern be modified so that the words match even if they are combined with other words and even if there are no spaces?
Remove \b from the regex. \b will match a word boundary, and you want to match the string that is not a complete word.
You can also remove the capturing group (denoted by ()) as it is not required any longer.
Use
/apple|ball|cat/i
An IDEONE PHP demo:
$re = "/apple|ball|cat/i";
$str = "no spaces catball smallapple";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Results:
[0] => cat
[1] => ball
[2] => apple
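To see the effect of the word boundaries side by side, here is a small sketch that counts matches with both patterns on the string from the question. The "concatenate" input is an invented example of the trade-off: without \b, the words also match inside unrelated longer words.

```php
<?php
$str = "no spaces catball smallapple";

// preg_match_all returns the number of full matches.
$withBoundaries    = preg_match_all('/\b(apple|ball|cat)\b/i', $str, $m1);
$withoutBoundaries = preg_match_all('/apple|ball|cat/i', $str, $m2);

echo $withBoundaries, "\n";    // 0
echo $withoutBoundaries, "\n"; // 3

// Invented example: "cat" is found inside "concatenate" as well.
$falsePositive = preg_match('/apple|ball|cat/i', 'concatenate');
echo $falsePositive, "\n";     // 1
```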

How to obtain multiple custom placeholder content?

In PHP I have a string that can contain any number of custom placeholders. In this case I am using '[%' and '%]' as the delimiters of each placeholder.
If my string is equal to:
"test [%variable1%] test test [%variable2%]"
How do I extract the 'variables' so I will have something like this:
array(
[0] => variable1,
[1] => variable2
);
At the moment I have: \b[\[%][a-z.*][\]%]\b but I know this is incorrect.
Use the preg_match_all function to do a global match.
$re = "~(?<=\[%).*?(?=%])~m";
$str = "test [%variable1%] test test [%variable2%]";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
(?<=\[%) is a positive lookbehind which asserts that the match must be preceded by the [% symbols. (?=%]) is a positive lookahead which asserts that the match must be followed by the %] symbols. .*? does a non-greedy match of any character, zero or more times.
Output:
Array
(
[0] => variable1
[1] => variable2
)
$re = "/\\[%(.*?)%\\]/";
$str = "test [%variable1%] test test [%variable2%]";
preg_match_all($re, $str, $matches);
Regex used (PHP has no g flag; preg_match_all already finds every match):
/\[%(.*?)%\]/
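With the capture-group version, the placeholder names are not in $matches[0] (which holds the full matches, delimiters included) but in $matches[1]. A short sketch, where the helper name extract_placeholders is made up for illustration:

```php
<?php
// Hypothetical helper: returns the names found between [% and %].
function extract_placeholders(string $str): array
{
    preg_match_all('/\[%(.*?)%\]/', $str, $matches);
    // $matches[0] holds "[%variable1%]" etc.; $matches[1] holds the captured names.
    return $matches[1];
}

print_r(extract_placeholders("test [%variable1%] test test [%variable2%]"));
// variable1, variable2
```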

Get all occurrences of words between curly brackets

I have a text like:
This is a {demo} phrase made for {test}
I need to get
demo
test
Note: My text can have more than one block of {}, not always two. Example:
This is a {demo} phrase made for {test} written in {English}
I used this expression /{([^}]*)}/ with preg_match but it returns only the first word, not all words inside the text.
Use preg_match_all instead:
preg_match_all($pattern, $input, $matches);
It's much the same as preg_match, with the following stipulations:
Searches subject for all matches to the regular expression given in
pattern and puts them in matches in the order specified by flags.
After the first match is found, the subsequent searches are continued
on from end of the last match.
Your expression is correct, but you should be using preg_match_all() instead to retrieve all matches. Here's a working example of what that would look like:
$s = 'This is a {demo} phrase made for {test}';
if (preg_match_all('/{([^}]*)}/', $s, $matches)) {
echo join("\n", $matches[1]);
}
To also capture the position of each match, pass PREG_OFFSET_CAPTURE as the fourth parameter to preg_match_all, as in the following example:
if (preg_match_all('/{([^}]*)}/', $s, $matches, PREG_OFFSET_CAPTURE)) {
foreach ($matches[1] as $match) {
echo "{$match[0]} occurs at position {$match[1]}\n";
}
}
Since { and } are part of regex syntax (they delimit quantifiers such as {3,}), it is safer to escape these characters:
<?php
$text = <<<EOD
this {is} some text {from}
which I {may} want to {extract}
some words {between} brackets.
EOD;
preg_match_all("!\{(\w+)\}!", $text, $matches);
print_r($matches);
?>
produces
Array
(
[0] => Array
(
[0] => {is}
[1] => {from}
[2] => {may}
[3] => {extract}
[4] => {between}
)
... etc ...
)
This example may be helpful to understand the use of curly brackets in regexes:
<?php
$str = 'abc212def3456gh34ij';
preg_match_all("!\d{3,}!", $str, $matches);
print_r($matches);
?>
which returns:
Array
(
[0] => Array
(
[0] => 212
[1] => 3456
)
)
Note that '34' is excluded from the results because the \d{3,} requires a match of at least 3 consecutive digits.
Matching the portions between pairs of braces with a regex is a quick-and-dirty patch; for properly parsing and processing the input string, a stack is the better tool.
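A minimal sketch of the stack idea, assuming the goal is to collect the text of every balanced {...} block, nested ones included. The helper name extract_braced and the policy of silently ignoring unbalanced braces are choices made for this illustration.

```php
<?php
// Hypothetical helper: returns the content of every balanced {...} block.
// Inner blocks are reported before the blocks that enclose them.
function extract_braced(string $text): array
{
    $stack = [];  // offsets of the "{" characters that are still open
    $found = [];
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        if ($text[$i] === '{') {
            $stack[] = $i;
        } elseif ($text[$i] === '}' && $stack !== []) {
            // Closing brace: pair it with the most recently opened one.
            $start = array_pop($stack);
            $found[] = substr($text, $start + 1, $i - $start - 1);
        }
        // A "}" with nothing open is ignored in this sketch.
    }
    return $found;
}

print_r(extract_braced('This is a {demo} phrase made for {test}'));
// demo, test
print_r(extract_braced('a {outer {inner} text} b'));
// inner, then "outer {inner} text"
```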
