return part of a string - php

I'm trying to return a certain part of a string. I've looked at substr, but I don't believe it's what I'm looking for.
Using this string:
/text-goes-here/more-text-here/even-more-text-here/possibly-more-here
How can I return everything between the first two slashes, i.e. text-goes-here?
Thanks,

$str="/text-goes-here/more-text-here/even-more-text-here/possibly-more-here";
$x=explode('/',$str);
echo $x[1];
print_r($x);// to see all the string split by /

<?php
$String = '/text-goes-here/more-text-here/even-more-text-here/possibly-more-here';
$SplitUrl = explode('/', $String);
# First element
echo $SplitUrl[1]; // text-goes-here
# You can also use array_shift, but you need to call it twice
$Split = array_shift($SplitUrl);
$Split = array_shift($SplitUrl);
echo $Split; // text-goes-here
?>
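As a minimal sketch of the explode approach, the snippet below shows the empty first element that explode() produces when the string starts with the delimiter, plus a variant that trims the leading slash first. The helper name firstPathSegment is hypothetical and not part of the answers above.

```php
<?php
// Hypothetical helper (the name is an assumption for illustration).
function firstPathSegment(string $path): ?string
{
    // Drop leading slashes so index 0 holds the first real segment;
    // the limit of 2 avoids splitting the rest of the string.
    $parts = explode('/', ltrim($path, '/'), 2);
    return $parts[0] !== '' ? $parts[0] : null;
}

$str = '/text-goes-here/more-text-here/even-more-text-here/possibly-more-here';

// A leading delimiter yields an empty string at index 0.
$raw = explode('/', $str);
var_dump($raw[0] === '');          // bool(true)

echo firstPathSegment($str), "\n"; // text-goes-here
```

Trimming first means the caller does not have to remember which index to read, at the cost of one extra function call.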

The explode approaches above certainly work. They read the second element because explode() returns an empty string wherever the delimiter has nothing next to it, such as at the very start of the string, so index 0 is empty here. Another possible solution is to use regular expressions:
<?php
$str="/text-goes-here/more-text-here/even-more-text-here/possibly-more-here";
preg_match('#/(?P<match>[^/]+)/#', $str, $matches);
echo $matches['match'];
The (?P<match> ... part tells it to match with a named capture group. If you leave out the ?P<match> part, you'll end up with the matching part in $matches[1]. $matches[0] will contain the part with the forward slashes like "/text-goes-here/".

Just use preg_match:
preg_match('#/([^/]+)/#', $string, $match);
$firstSegment = $match[1]; // "text-goes-here"
where
# - start of regex (the delimiter can be almost any non-alphanumeric character)
/ - a literal /
( - beginning of a capturing group
[^/] - any character that isn't a literal /
+ - one or more of the preceding item (here: one or more non-slash characters)
) - end of capturing group
/ - a literal /
# - end of regex (must match the opening delimiter)
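A small sketch of the same pattern wrapped in a function that also handles the no-match case. The function name firstSegmentByRegex is hypothetical. Note that the pattern requires a closing slash, so a string with only one segment and no trailing slash does not match.

```php
<?php
// Hypothetical helper; returns null when the pattern does not match.
function firstSegmentByRegex(string $path): ?string
{
    // '#' is the delimiter, so the slashes need no escaping.
    if (preg_match('#/([^/]+)/#', $path, $match) === 1) {
        return $match[1]; // group 1: the text between the two slashes
    }
    return null;
}

echo firstSegmentByRegex('/text-goes-here/more-text-here'), "\n"; // text-goes-here
var_dump(firstSegmentByRegex('/text-goes-here'));                 // NULL
```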

Related

A simple pattern works fine with preg_match_all; how do I use it with preg_replace?

Thanks for your help.
My goal is to use preg_replace with a pattern to remove some very simple strings.
Using only preg_replace, in this string or others, I need to remove any content between <tag and the next > symbol. The pattern is simple:
$x = '#<\w+(\s+[^>]*)>#is';
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
preg_match_all($x, $s, $Q);
print_r($Q[1]);
[1] => Array
(
[0] => class="td1"
[1] => class="td2"
)
This works great!
Now I try to remove the strings using the same pattern:
$new_string = '';
$Q = preg_replace($x, "\\1$new_string", $s);
print_r($Q);
The result is completely different.
What is wrong with my use of preg_replace?
Using only preg_replace(), how can I remove these strings?
(We could use foreach(...) to remove each string, but where is the error in my code?)
The result I expect for this input:
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
is this output:
$Q = 'DATA<td>111</td><td>222</td>DATA';
Let's break down your RegEx, #<\w+(\s+[^>]*)>#is, and see if that helps.
# // Start delimiter
< // Literal `<` character
\w+ // One or more word-characters, a-z, A-Z, 0-9 or _
( // Start capturing group
\s+ // One or more spaces
[^>]* // Zero or more characters that are not the literal `>`
) // End capturing group
> // Literal `>` character
# // End delimiter
is // Ignore case and `.` matches all characters including newline
Given the input DATA<td class="td1">DATA this matches <td class="td1"> and captures class="td1". The difference between match and capture is very important.
When you use preg_match you'll see the entire match at index 0, and any subsequent captures at incrementing indexes.
When you use preg_replace the entire match will be replaced. You can use the captures, if you so choose, but you are replacing the match.
I'm going to say that again: whatever you pass as the replacement string will replace the entirety of the found match. If you say $1 or \\1, you are saying: replace the entire match with just the capture.
Going back to the sample after the breakdown, using $1 is the equivalent of calling:
str_replace('<td class="td1">', ' class="td1"', $string);
which you can see here: https://3v4l.org/ZkPFb
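To make the match/capture distinction concrete, here is a short sketch using the asker's pattern on the single-tag input from the breakdown. The variable names are only for illustration.

```php
<?php
$pattern = '#<\w+(\s+[^>]*)>#is';
$string  = 'DATA<td class="td1">DATA';

// Index 0 is the whole match, index 1 is the capture (with its leading space).
preg_match($pattern, $string, $m);
echo $m[0], "\n"; // <td class="td1">
echo $m[1], "\n"; //  class="td1"

// preg_replace swaps the WHOLE match for the replacement, here the capture.
$replaced = preg_replace($pattern, '$1', $string);
echo $replaced, "\n"; // DATA class="td1"DATA
```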
To your question "how to change [0] by $new_string", you are doing it correctly, it is your RegEx itself that is wrong. To do what you are trying to do, your pattern must capture the tag itself so that you can say "replace the HTML tag with all of the attributes with just the tag".
As one of my comments noted, this is where you'd invert the capturing. You aren't interested in capturing the attributes; you are throwing those away. Instead, you are interested in capturing the tag itself:
$string = 'DATA<td class="td1">DATA';
$pattern = '#<(\w+)\s+[^>]*>#is';
echo preg_replace($pattern, '<$1>', $string);
Demo: https://3v4l.org/oIW7d
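Applied to the full input from the question, the inverted capture should give the expected output. This is a quick sketch; note that because the pattern requires \s+ after the tag name, tags without attributes (and closing tags) are not matched and stay as they are.

```php
<?php
$pattern = '#<(\w+)\s+[^>]*>#is';
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';

// Replace each opening tag that has attributes with just "<tagname>".
$result = preg_replace($pattern, '<$1>', $s);
echo $result, "\n"; // DATA<td>111</td><td>222</td>DATA
```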

Extract shortcode from Instagram URL

I'm trying to extract the shortcode from an Instagram URL.
Here is what I have already tried, but I don't know how to extract it when there is a username in the middle. Thanks a lot for your answer.
Instagram pattern : /p/shortcode/
https://regex101.com/r/nO4vdd/1/
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/
expected : BxKRx5CHn5i
I took your original query and added a .* before the \/p\/
This gave a query of
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com.*\/p\/)([\d\w\-_]+)(?:\/)?(\?.*)?$
This would be simpler assuming the shortcode always follows the /p/
^(?:.*\/p\/)([\d\w\-_]+)
You could prepend an optional (?:\/\w+)? non capturing group.
Note that \w also matches _ and \d so the capturing group could be updated to ([\w-]+) and the forward slash in the non capturing group might also be written as just /
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com(?:\/\w+)?\/p\/)([\w-]+)(?:\/)?(\?.*)?$
Regex demo
You don't have to escape the forward slashes if you use a different delimiter than /. Your pattern might look like:
^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$
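A minimal sketch of that last pattern in PHP, using ~ as the delimiter. The function name instagramShortcode is hypothetical. One assumption worth checking against real data: the optional (?:/\w+)? group only accepts usernames made of word characters, so a username containing a dot, for instance, would not match.

```php
<?php
// Hypothetical helper; returns the shortcode or null when the URL does not match.
function instagramShortcode(string $url): ?string
{
    $pattern = '~^(?:https?://)?(?:www\.)?(?:instagram\.com(?:/\w+)?/p/)([\w-]+)/?(\?.*)?$~';
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

$urls = [
    'https://www.instagram.com/p/BxKRx5CHn5i/',
    'https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176',
    'https://www.instagram.com/username/p/BxKRx5CHn5i/',
];
foreach ($urls as $url) {
    echo instagramShortcode($url), "\n"; // BxKRx5CHn5i each time
}
```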
This expression might also work:
^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$
Test
$re = '/^https?:\/\/(?:www\.)?instagram\.com\/[^\/]+(?:\/[^\/]+)?\/([^\/]{11})\/.*$/m';
$str = 'https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/p/BxKRx5CHn5i/?utm_source=ig_share_sheet&igshid=znsinsart176
https://www.instagram.com/p/BxKRx5CHn5i/
https://www.instagram.com/username/p/BxKRx5CHn5i/';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $match) {
var_export($match[1]);
}
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Assuming that you aren't simply trusting /p/ as the marker before the substring, you can use this pattern which will consume one or more of the directories before your desired substring.
Notice that \K restarts the fullstring match, and effectively removes the need to use a capture group -- this means a smaller output array and a shorter pattern.
Choosing a pattern delimiter like ~ which doesn't occur inside your pattern alleviates the need to escape the forward slashes. This again makes your pattern more brief and easier to read.
If you do want to rely on the /p/ substring, then just add p/ before my \K.
Code: (Demo)
$strings = [
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];
foreach ($strings as $string) {
echo preg_match('~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/\K\w+~', $string , $m) ? $m[0] : '';
echo " (from $string)\n";
}
Output:
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BrODg5XHlE6 (from https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176)
BxKRx5CHn5i (from https://www.instagram.com/p/BxKRx5CHn5i/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/)
BxE5PpZhoa9 (from https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere)
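The difference between the pattern above and the variant with p/ before \K shows up on URLs that have no /p/ at all. The sketch below compares the two; the names $loose, $strict and fullMatch are made up for this example, and the profile URL is an assumed input that is not in the original list.

```php
<?php
$loose  = '~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/\K\w+~';
$strict = '~(?:https?://)?(?:www\.)?instagram\.com(?:/[^/]+)*/p/\K\w+~';

// Hypothetical helper: returns the full match (what follows \K) or ''.
function fullMatch(string $pattern, string $subject): string
{
    return preg_match($pattern, $subject, $m) ? $m[0] : '';
}

$post    = 'https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere';
$profile = 'https://www.instagram.com/username/';

echo fullMatch($loose, $post), "\n";     // BxE5PpZhoa9
echo fullMatch($strict, $post), "\n";    // BxE5PpZhoa9
echo fullMatch($loose, $profile), "\n";  // username
echo fullMatch($strict, $profile), "\n"; // (empty line)
```

So the looser pattern returns the last path segment whatever it is, while the /p/ variant only returns something for post URLs.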
If you are implicitly trusting the /p/ as the marker and you know that you are dealing with Instagram links, then you can avoid regex and just cut out the 11-character substring that starts 3 characters after the marker's position.
Code: (Demo)
$strings = [
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/p/BrODg5XHlE6/?utm_source=ig_share_sheet&igshid=znsinsart176",
"https://www.instagram.com/p/BxKRx5CHn5i/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/",
"https://www.instagram.com/username/p/BxE5PpZhoa9/#look=overhere"
];
foreach ($strings as $string) {
$pos = strpos($string, '/p/');
if ($pos === false) {
continue;
}
echo substr($string, $pos + 3, 11);
echo " (from $string)\n";
}
(Same output as previous technique)

How to preg_match all three cases in the content-disposition header?

I'm trying to decode the content-disposition header (from curl) to get the filename using the following regular expression:
<?php
$str = 'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'';
preg_match('/^.*?filename=(["\'])([^"\']+)\1/m', $str, $matches);
print_r($matches);
So while it matches if the filename is in single or double quotes, it fails if there are no quotes around the filename (which can happen)
$str = 'attachment;filename=unnamed.jpg;filename*=unnamed.jpg';
Right now I'm using two regular expressions (with if-else) but I just wanted to learn if it is possible to do in a single regex? Just for my own learning to master regex.
I would use the branch reset feature (?|...|...|...), which gives a more readable pattern and avoids creating a capture group for the quotes. In a branch-reset group, the capture groups in each alternative share the same numbers:
if ( preg_match('~filename=(?|"([^"]*)"|\'([^\']*)\'|([^;]*))~', $str, $match) )
echo $match[1], PHP_EOL;
Whatever the alternative that succeeds, the capture is always in group 1.
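Here is a quick sketch that runs the branch-reset pattern against all three cases from the question (double quotes, single quotes, no quotes). The function name dispositionFilename is hypothetical.

```php
<?php
// Hypothetical helper; returns the filename or null when "filename=" is absent.
function dispositionFilename(string $header): ?string
{
    // Each alternative inside (?|...) stores its capture in group 1.
    $pattern = '~filename=(?|"([^"]*)"|\'([^\']*)\'|([^;]*))~';
    return preg_match($pattern, $header, $m) === 1 ? $m[1] : null;
}

$headers = [
    'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'',
    "attachment;filename='unnamed.jpg'",
    'attachment;filename=unnamed.jpg;filename*=unnamed.jpg',
];
foreach ($headers as $header) {
    echo dispositionFilename($header), "\n"; // unnamed.jpg each time
}
```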
Just to put my two cents in - you could use a conditional regex:
filename=(['"])?(?(1)(.+?)\1|([^;]+))
Broken down, this says:
filename= # match filename=
(['"])? # capture " or ' into group 1, optional
(?(1) # if group 1 was set ...
(.+?)\1 # ... then match up to \1
| # else
([^;]+) # not a semicolon
)
Afterwards, you need to check if group 2 or 3 was present.
Alternatively, go for @Casimir's answer using the (often overlooked) branch reset.
See a demo on regex101.com.
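The "check if group 2 or 3 was present" step could look roughly like this in PHP. It is a sketch, and the function name filenameViaConditional is hypothetical. With preg_match's default flags, an unmatched group that sits before a matched one is reported as an empty string, which is what the check relies on.

```php
<?php
// Hypothetical helper built around the conditional pattern above.
function filenameViaConditional(string $header): ?string
{
    $pattern = '~filename=([\'"])?(?(1)(.+?)\1|([^;]+))~';
    if (preg_match($pattern, $header, $m) !== 1) {
        return null;
    }
    // Quoted case: group 2 holds the name.
    // Unquoted case: group 2 is '' and group 3 holds the name.
    return ($m[2] ?? '') !== '' ? $m[2] : ($m[3] ?? null);
}

echo filenameViaConditional('attachment;filename="unnamed.jpg"'), "\n"; // unnamed.jpg
echo filenameViaConditional('attachment;filename=unnamed.jpg;x=y'), "\n"; // unnamed.jpg
```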
One approach is to use an alternation in a single regex to match either a single/double quoted filename, or a filename which is completely unquoted. Note that one side effect of this approach is that we introduce more capture groups into the regex. So we need a bit of extra logic to handle this.
<?php
$str = 'attachment;filename=unnamed.jpg;filename*=UTF-8\'\'unnamed.jpg\'';
$result = preg_match('/^.*?filename=(?:(?:(["\'])([^"\']+)\1)|([^"\';]+))/m',
$str, $matches);
print_r($matches);
$index = count($matches) == 3 ? 2 : 3;
if ($result) {
echo $matches[$index];
}
else {
echo "filename not found";
}
?>
Demo
You could make your capturing group optional with (["\'])? and \1?, and add a non-capturing group (?:;|$) to the end of the regex, which checks for either a ; or the end of the line:
^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)
$str = 'attachment;filename=unnamed.jpg;filename*=UTF-8\'\'unnamed.jpg\'';
preg_match('/^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)/m', $str, $matches);
print_r($matches);
You can also use \K to reset the starting point of the reported match and then match until you encounter a double quote or a semicolon [^";]+. This will only return the filename.
^.*?filename="?\K[^";]+
foreach ($strings as $string) {
preg_match('/^.*?filename="?\K[^";]+/m', $string, $matches);
print_r($matches);
}

Regex rules in an array

Maybe this can't be solved the way I want, but maybe you can help me.
I have a lot of malformed words in my product names.
Some of them have a leading ( and a trailing ), or only one of these; the same goes for the / and " signs.
What I do is explode the product name by spaces and examine each word.
I want to replace these characters with nothing. But a hard drive could be named 40GB ATA 3.5" hard drive. I need to process every word, but I can't use the same method for 3.5" as for () or //, because the quote in 3.5" is valid.
So I only need to remove the quotes when there is one at the start of the string AND one at the end of the string.
$cases = [
'(testone)',
'(testtwo',
'testthree)',
'/otherone/',
'/othertwo',
'otherthree/',
'"anotherone',
'anothertwo"',
'"anotherthree"',
];
$patterns = [
'/^\(/',
'/\)$/',
'~^/~',
'~/$~',
//Here is what I can not imagine, how to add the rule for `"`
];
$result = preg_replace($patterns, '', $cases);
This works well, but can it be done in a single preg_replace() call? If so, can somebody help me with the pattern(s) for the quotes?
Result for quotes should be this:
'"anotherone', //no quote at the end, so leave the leading one
'anothertwo"', //no quote at the start, so leave the trailing one
'anotherthree', //there are quotes at the start and end, so remove them
You may use another approach: rather than define an array of patterns, use one single alternation based regex:
preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $s)
See the regex demo
Details:
^[(/] - a literal ( or / at the start of the string
| - or
[/)]$ - a literal ) or / at the end of the string
| - or
^"(.*)"$ - a " at the start of the string, then any 0+ characters (due to /s option, the . matches a linebreak sequence, too) that are captured into Group 1, and " at the end of the string.
The replacement pattern is $1 that is empty when the first 2 alternatives are matched, and contains Group 1 value if the 3rd alternative is matched.
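As a quick check, the single alternation pattern can be run over the whole $cases array from the question, since preg_replace accepts an array subject. This is a sketch; the extra '3.5"' entry is added here only to confirm that a lone trailing quote is left alone.

```php
<?php
$cases = [
    '(testone)', '(testtwo', 'testthree)',
    '/otherone/', '/othertwo', 'otherthree/',
    '"anotherone', 'anothertwo"', '"anotherthree"',
    '3.5"', // extra case, not in the original list
];

// One pass: strip a leading ( or /, a trailing ) or /, or a surrounding pair of quotes.
$result = preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $cases);
print_r($result);
// testone, testtwo, testthree, otherone, othertwo, otherthree,
// "anotherone, anothertwo", anotherthree, 3.5"
```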
Note: In case you need to replace until no match is found, use a preg_match with preg_replace together (see demo):
$s = '"/some text/"';
$re = '~^[(/]|[/)]$|^"(.*)"$~s';
$tmp = '';
while (preg_match($re, $s) && $tmp != $s) {
$tmp = $s;
$s = preg_replace($re, '$1', $s);
}
echo $s;
This works
preg_replace([[/(]?(.+)[/)]?|/\"(.+)\"/], '$1', $string)

Match pattern and exclude substrings with preg_match_all

I need to find all the strings placed between START and END, excluding the PADDING substring from the matched string. The best way I've found is
$r="stuffSTARTthisPADDINGisENDstuffstuffSTARTwhatPADDINGIwantPADDINGtoPADDINGfindENDstuff" ;
preg_match_all('/START(.*?)END/',str_replace('PADDING','',$r),$m);
print(join($m[1]));
> thisiswhatIwanttofind
I want to do this with the smallest code size possible: is there a shorter way that uses only preg_match_all and no str_replace, and that ideally returns the string directly without joining arrays? I've tried some lookaround expressions but I can't find the right one.
$r="stuffSTARTthisPADDINGisENDstuffstuffSTARTwhatPADDINGIwantPADDINGtoPADDINGfindENDstuff";
echo preg_replace('/(END.*?START|PADDING|^[^S]*START|END.*$)/', '', $r);
This should return thisiswhatIwanttofind using a single regular expression pattern.
Explanation:
END.*?START # Remove everything from each END up to the next START
PADDING # Remove PADDING
^[^S]*START # Remove everything up to and including the first START (assumes no S occurs before it)
END.*$ # Remove the last END and everything up to the end of the string
$r="stuffSTARTthisPADDINGisENDstuffstuffSTARTwhatPADDINGIwantPADDINGtoPADDINGfindENDstuff" ;
preg_match_all('/(?:START)(.*?)(?:END)/',str_replace('PADDING','',$r),$m);
var_dump(implode(' ',$m[1]));
would work but I guess you want something faster.
You can also use preg_replace_callback like this:
$str = preg_replace_callback('#.*?START(.*?)END((?!.*?START.*?END).*$)?#',
function ($m) {
print_r($m);
return str_replace('PADDING', '', $m[1]);
}, $r);
echo $str . "\n"; // prints thisiswhatIwanttofind
