Regular expression for getting a sub-string, or another one if it does not exist - php

I have a regular expression for getting a sub-string from a string; if a particular sub-string doesn't exist, it should try to get another sub-string instead.
The regular expression I am trying is this:
\#\s*\((.*?)\)|\((.*?)\)
But this does not work: I always get the second option's sub-string instead of the first option's.
The example string is:
some text (2nd Sub-string) # (First sub-string)
And it gives me this result:
Array
(
[0] => (2nd Sub-string)
[1] =>
[2] => 2nd Sub-string
)

Why don't you simply get both strings (for which your regexp works correctly) and check their existence programmatically? Something like:
$num = preg_match_all(
    "/\#\s*\((.*?)\)|\((.*?)\)/",
    "some text (2nd Sub-string) # (First sub-string)",
    $matches, PREG_SET_ORDER
);
var_dump($num, $matches);
if ($num < 2)
{
    // no second match, read first
}
elseif (!array_key_exists(2, $matches[1]))
{
    // the second match has no group 2, so it came from the "# (...)" alternative
}
HTH.
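The same idea can be wrapped in a small function. This is only a sketch: `preferHashGroup` is a made-up name, and it assumes that "first option" means the text inside `# (...)`, with the first plain `(...)` as the fallback.

```php
<?php
// Hypothetical helper: prefer the "# (...)" capture, else fall back
// to the first plain "(...)" capture. Returns null when nothing matches.
function preferHashGroup(string $subject): ?string
{
    $count = preg_match_all('/#\s*\((.*?)\)|\((.*?)\)/', $subject, $matches, PREG_SET_ORDER);
    if (!$count) {
        return null;
    }
    $fallback = null;
    foreach ($matches as $m) {
        // Group 1 is only non-empty when the "# (...)" alternative matched.
        if (($m[1] ?? '') !== '') {
            return $m[1];
        }
        // Remember the first plain "(...)" capture in case no "# (...)" exists.
        if ($fallback === null && isset($m[2])) {
            $fallback = $m[2];
        }
    }
    return $fallback;
}

echo preferHashGroup('some text (2nd Sub-string) # (First sub-string)') . PHP_EOL; // First sub-string
echo preferHashGroup('some text (2nd Sub-string)') . PHP_EOL;                      // 2nd Sub-string
```

Note that an empty `# ()` is treated as "not present" here, which may or may not be what you want.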

Related

Place or add character(s) on preg_match

I have, for example, a string
foo-bar/baz 123
and a pattern
#foo-(bar/baz)#
This pattern would give me the captured bar/baz,
but I would like to replace (sanitize) the / with a - to receive bar-baz.
Reason: I have a method that gets a string as a parameter, and the regex pattern via config.
The tool around this is dynamic and will look up the returned match as an id.
But the target (entity) to find uses - instead of / in the id.
So I could now hard-code some exception like "if it is this adapter, then replace this and that",
but I wonder if I could do that with regex.
Test code:
// Somewhere in a loop ...
$string = "foo-bar/baz 123"; // Would get dynamically as parameter.
$pattern = "#foo-(bar/baz)#"; // Would get dynamically from config.
// ...
if (preg_match($pattern, $string, $matches) === 1) {
    // return $matches[1]; Would return captured id.
    echo var_export($matches, true) . PHP_EOL;
}
Returns:
array (
  0 => 'foo-bar/baz',
  1 => 'bar/baz',
)
Expected/needed:
array (
  0 => 'foo-bar/baz',
  1 => 'bar-baz', // <-- "-" instead of "/"
)
So I'm searching for a way to "match-replace".
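One possible approach, not taken from the original thread: preg_match() only reports what it matched and cannot rewrite a capture, so the captured id has to be post-processed. A minimal sketch, assuming the config could carry an optional replacement map next to the pattern; `extractId` and the `$replacements` parameter are hypothetical names, not part of the tool described above.

```php
<?php
// Hypothetical helper: match with the configured pattern, then sanitize the capture.
// $replacements is an assumed extra config entry, e.g. ['/' => '-'].
function extractId(string $pattern, string $string, array $replacements = []): ?string
{
    if (preg_match($pattern, $string, $matches) !== 1 || !isset($matches[1])) {
        return null;
    }
    // Apply every search => replace pair to the captured id.
    return str_replace(array_keys($replacements), array_values($replacements), $matches[1]);
}

echo extractId('#foo-(bar/baz)#', 'foo-bar/baz 123', ['/' => '-']) . PHP_EOL; // bar-baz
```

Adapters that need no sanitizing would simply leave the map empty, so no adapter-specific exception has to be hard-coded.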

php extract Emoji from a string

I have a string that contains emoji.
I want to extract the emoji from that string. I'm using the code below, but it doesn't do what I want.
$string = "😃 hello world 🙃";
preg_match('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', $string, $emojis);
I want this:
$emojis = ["😃", "🙃"];
but it returns this:
$emojis = ["😃"]
and also if:
$string = "😅😇☝🏿"
it returns only the first emoji:
$emoji = ["😅"]
Try looking at the preg_match_all function. preg_match stops looking after it finds the first match, which is why you're only ever getting the first emoji back.
Taken from this answer:
preg_match stops looking after the first match. preg_match_all, on the other hand, continues to look until it finishes processing the entire string. Once a match is found, it uses the remainder of the string to try to apply another match.
http://php.net/manual/en/function.preg-match-all.php
So your code would become:
$string = "😃 hello world 🙃";
preg_match_all('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', $string, $emojis);
print_r($emojis[0]); // Array ( [0] => 😃 [1] => 🙃 )
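To see the difference between the two functions in isolation, here is a small sketch. The pattern is deliberately simplified for illustration (it only covers the Emoticons block U+1F600 to U+1F64F, not all emoji), and the variable names are my own.

```php
<?php
// Two emoji from the Emoticons block, written with PHP 7+ Unicode escapes.
$string = "\u{1F603} hello world \u{1F643}";
// Simplified pattern for illustration only: U+1F600..U+1F64F.
$pattern = '/[\x{1F600}-\x{1F64F}]/u';

$firstCount = preg_match($pattern, $string, $first);   // stops after the first match
$allCount   = preg_match_all($pattern, $string, $all); // scans the whole string

echo $firstCount . PHP_EOL;    // 1
echo $allCount . PHP_EOL;      // 2
echo count($all[0]) . PHP_EOL; // 2
```

With preg_match_all, `$all[0]` holds every full match, which is why the answer above reads `$emojis[0]`.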

Regular Expression String Break

I am fairly new to regex. I have been trying to break a string to get its initial part, in order to create folders.
Here are a few examples of the strings that I need to break:
test1-792X612.jpg
test-with-multiple-hyphens-612X792.jpg
Is there a way, using a regular expression, to get test1 and test-with-multiple-hyphens?
You can use a regex like this:
(.*?)-\d+x\d+
The idea is that the pattern matches the -NumXNum part but captures the content before it. Note that the case-insensitive flag (i) is needed, because the pattern uses a lowercase x while the filenames use an uppercase X.
MATCH 1
1. [0-5] `test1`
MATCH 2
1. [18-44] `test-with-multiple-hyphens`
If you don't want to use the insensitive flag, you could change the regex to:
(.*?)-\d+[Xx]\d+
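In PHP this could look roughly like the following. It is a sketch under the assumption that every filename contains a `-<width>X<height>` part; `folderName` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the part of a filename before "-<width>X<height>".
function folderName(string $filename): ?string
{
    // The i flag lets the lowercase x in the pattern match "X" as well.
    if (preg_match('/(.*?)-\d+x\d+/i', $filename, $m) === 1) {
        return $m[1];
    }
    return null; // no size suffix found
}

echo folderName('test1-792X612.jpg') . PHP_EOL;                      // test1
echo folderName('test-with-multiple-hyphens-612X792.jpg') . PHP_EOL; // test-with-multiple-hyphens
```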
If you're certain that all filenames end with 000X000.jpg (where each 0 stands for any digit), this should work:
/^(.*)-[0-9]{3}X[0-9]{3}\.jpg$/
The value captured by (.*) will contain the part that you're looking for.
In case there could be more or fewer digits, but at least one:
/^(.*)-[0-9]+X[0-9]+\.jpg$/
You can use this simple regex:
(.+)(?=-.+$)
Explanations:
(.+) : Captures the desired part
(?=-.+$) : (Positive lookahead) Requires that the captured part is followed by a hyphen and the rest of the string
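A short sketch of the lookahead pattern in PHP, using the two filenames from the question (the variable names are my own):

```php
<?php
$files = ['test1-792X612.jpg', 'test-with-multiple-hyphens-612X792.jpg'];
$names = [];
foreach ($files as $file) {
    // Greedy (.+) backtracks to the last hyphen that still has text after it.
    if (preg_match('/(.+)(?=-.+$)/', $file, $m) === 1) {
        $names[] = $m[1];
    }
}
print_r($names);
```

Keep in mind that this pattern does not check for the NumXNum part at all: a name such as my-photo.jpg would yield my.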
If I understood your question correctly, you want to break the hyphenated parts of a filename into directories. The expression (.*?)-([^-]+\.jpg)$ captures everything before and after the last - in a .jpg filename. You can then use preg_match() to capture these groups and explode() to split the first group on - into separate directories.
$files = array(
    'test1-792X612.jpg',
    'test-with-multiple-hyphens-612X792.jpg',
);
foreach($files as $file) {
    if(preg_match('/(.*?)-([^-]+\.jpg)$/', $file, $matches)) {
        $directories = explode('-', $matches[1]);
        $file = $matches[2];
        // print the results shown below
        echo $file . PHP_EOL;
        print_r($directories);
    }
}
// 792X612.jpg
// Array
// (
//     [0] => test1
// )
//
// 612X792.jpg
// Array
// (
//     [0] => test
//     [1] => with
//     [2] => multiple
//     [3] => hyphens
// )

Parse text and populate associative array from two substrings per line

Given a large string of text, I want to search for the following patterns:
#key: value
So an example is:
some crazy text
more nonesense
#first: first-value;
yet even more non-sense
#second: second-value;
finally more non-sense
The output should be:
array("first" => "first-value", "second" => "second-value");
<?php
$string = 'some crazy text
more nonesense
#first: first-value;
yet even more non-sense
#second: second-value;
finally more non-sense';
// "/" is used as the delimiter so that the literal # does not end the pattern
preg_match_all('/#(.*?): (.*?);/is', $string, $matches);
$return = array();
$count = count($matches[0]);
for($i = 0; $i < $count; $i++)
{
    $return[$matches[1][$i]] = $matches[2][$i];
}
print_r($return);
?>
Link http://ideone.com/fki3U
Array
(
    [first] => first-value
    [second] => second-value
)
Tested in PHP 5.3:
// set-up test string and final array
$myString = "#test1: test1;#test2: test2;";
$myArr = array();
// do the matching (\s* skips the space after the colon)
preg_match_all('/#([^:]+):\s*([^;]+);/', $myString, $matches);
// put elements of $matches in array here
$actualMatches = count($matches[0]);
for ($i = 0; $i < $actualMatches; $i++) {
    $myArr[$matches[1][$i]] = $matches[2][$i];
}
print_r($myArr);
The reasoning behind this is as follows:
The regex creates two capture groups. One capture group is the key, the
other the data for that key. The capture groups are the portions of the regex
inside left and right bananas (parentheses), i.e., (...).
$actualMatches counts the full matches in $matches[0], because preg_match_all
returns the whole matches in element 0 and the two capture groups in elements 1 and 2.
Match whole qualifying lines starting with # and ending with ;.
Capture the substring that does not contain any colons as the first group and capture the substring between the space after the colon and the semicolon at the end of the line.
Because the second capture group uses the any-character dot, the value may itself contain a semicolon without damaging the extracted data.
Call array_combine() to form key-value relationships between the two capture groups.
Code:
preg_match_all(
    '/^#([^:]+): (.+);$/m',
    $text,
    $m
);
var_export(array_combine($m[1], $m[2]));
Output:
array (
  'first' => 'first-value',
  'second' => 'second-value',
)
You can try looping over the string line by line (explode and foreach) and checking whether each line starts with a # (substr); if it does, explode the line by :.
http://php.net/manual/en/function.explode.php
http://nl.php.net/manual/en/control-structures.foreach.php
http://nl.php.net/manual/en/function.substr.php
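A rough sketch of that line-by-line approach follows. `parseHashLines` is a hypothetical name, and the code assumes that lines are separated by "\n" and that a trailing ; terminates the value rather than belonging to it.

```php
<?php
// Hypothetical helper following the explode/foreach/substr idea.
function parseHashLines(string $text): array
{
    $result = [];
    foreach (explode("\n", $text) as $line) {
        $line = trim($line);
        if (substr($line, 0, 1) !== '#') {
            continue; // not a "#key: value" line
        }
        // A limit of 2 keeps any later colons inside the value.
        $parts = explode(':', substr($line, 1), 2);
        if (count($parts) === 2) {
            // Assumption: a trailing ";" is a terminator, not part of the value.
            $result[trim($parts[0])] = rtrim(trim($parts[1]), ';');
        }
    }
    return $result;
}

$text = "some crazy text\n#first: first-value;\nmore text\n#second: second-value;";
print_r(parseHashLines($text));
```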
Depending on what your input string looks like, you might be able to simply use parse_ini_string, or make some small changes to the string then use the function.

preg_match multiple responses

<?php
$string = "Movies and Stars I., 32. part";
$pattern = "((IX|IV|V?I{0,3}[\.]))";
if(preg_match($pattern, $string, $x) == false)
{
    print "NAPAKA!"; // "NAPAKA" is Slovenian for "ERROR"
}
else
{
    print_r($x);
}
?>
And the response is:
Array ( [0] => I. [1] => I. )
I should get only 1 response... Why do I get multiple responses?
The element at index 0 is the whole matched string. The element at index 1 is the contents of the first capture group, i.e. the content inside the parentheses. In this case, they just happen to be the same. Just use $x[0] to get the value you're looking for.
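A small sketch to make the two indexes visible. It uses ~ as the delimiter (my choice, so that every parenthesis in the pattern is a real group) and made-up variable names:

```php
<?php
$string = "Movies and Stars I., 32. part";

// "~" is the delimiter here, so the single pair of parentheses is a capture group.
preg_match('~(IX|IV|V?I{0,3}\.)~', $string, $withGroup);
// Same pattern without parentheses: no capture group, so only index 0 is filled.
preg_match('~IX|IV|V?I{0,3}\.~', $string, $withoutGroup);

print_r($withGroup);    // [0] => I. and [1] => I.
print_r($withoutGroup); // [0] => I.
```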
The nested parentheses should, in this instance, be a "non-capturing" subpattern.
$pattern = "~(?:IX|IV|V?I{0,3}[\.])~";
Try that. It will tell the regex compiler not to capture the results of those parentheses into the array.
In fact, looking at your regex, you don't even need those parentheses. Make your regex this:
$pattern = "~IX|IV|V?I{0,3}[\.]~";
That should also work.
Your pattern has a capture group in it: the inner () brackets tell the engine what to capture in your match, while the outer pair only acts as the delimiters.
Try this:
$pattern = "(IX|IV|V?I{0,3}[\.])";
If you have a hard time identifying the wanted groups in the result, you can name them as described in the php.net documentation.
That would look something like this (with ~ as the delimiters, so that the parentheses form a real group):
$pattern = "~(?P<groupname>IX|IV|V?I{0,3}[\.])~";
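For illustration, a named group shows up in the result under its name as well as under its numeric index. In this sketch, "roman" is an arbitrary group name and ~ is used as the delimiter:

```php
<?php
$string = "Movies and Stars I., 32. part";
// "roman" is an arbitrary group name chosen for this example.
if (preg_match('~(?P<roman>IX|IV|V?I{0,3}\.)~', $string, $x) === 1) {
    echo $x['roman'] . PHP_EOL; // I.
    print_r(array_keys($x));    // keys: 0, roman, 1
}
```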
You get the whole matched string at index 0 and one result for every pair of parentheses (). This is helpful for getting groups, e.g.
preg_match('~([0-9]+)([a-z]+)~', '12abc', $x);
$x is ([0]=>12abc [1]=>12 [2]=>abc)
In your case you can simply delete one pair of () (the other pair is used as delimiters).
