Get variables from a regular expression - PHP

I have one of the following strings:
mystring
/mystring
mystring?test
/mystring?test
That is, one string preceded by an optional / and followed by an optional ?test.
I need to get these variables:
$string = "mystring"
$test = true or false, depending on whether ?test is present
I'm trying to use regular expressions but I'm having trouble with the right pattern. I'm trying:
\/?(\w+)(\??\w+)
For example, for "/mystring", I'm getting this:
Array
(
[0] => /mystring
[1] => mystrin
[2] => g
)
This is sample code:
<?php
echo "<pre>";
$input = "/mystring";
$pattern = "/\/?(\w+)(\??\w+)/";
$matches = null;
preg_match($pattern, $input, $matches);
print_r($matches);
?>

If you're interested in a non-regex alternative, you can use parse_url. It accepts partial URLs, so it can parse strings like these.
$components = parse_url($input);
$string is the path with leading slash removed, and $test is the equality of the query to the string 'test'.
$string = trim($components['path'], '/');
$test = isset($components['query']) && $components['query'] == 'test';

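You're including the ? inside the capture group, and because that group is mandatory and needs at least one word character, (\w+) has to give back the final g so that the second group can match.
The regex you should use: \/?(\w+)\??(\w+)?
It also takes fewer steps (46 compared to 56 for your regex), so it is slightly cheaper to run.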
Demo: https://regex101.com/r/98QNeh/2
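A minimal sketch of how the two variables from the question could be derived with this idea. The helper name parseName() and the anchored pattern are assumptions for illustration, not part of the answers above; the pattern keeps the ? outside the capture group and makes the whole ?word part optional.

```php
<?php
// Hypothetical helper: returns [$string, $test], or null if the input has an unexpected shape.
function parseName(string $input): ?array
{
    // ^/?            optional leading slash
    // (\w+)          the name itself
    // (?:\?(\w+))?   optional "?word" suffix; the ? stays outside the capture group
    if (!preg_match('~^/?(\w+)(?:\?(\w+))?$~', $input, $m)) {
        return null;
    }
    // Group 2 is absent from $m when there is no "?word" part.
    $test = isset($m[2]) && $m[2] === 'test';
    return [$m[1], $test];
}

foreach (['mystring', '/mystring', 'mystring?test', '/mystring?test'] as $input) {
    [$string, $test] = parseName($input);
    echo $input, ' => ', $string, ', ', var_export($test, true), "\n";
}
```

Anchoring with ^ and $ is a design choice here: it rejects inputs such as a/b instead of silently matching part of them.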


PCRE: how to get the last part of a URL

I have a URL like this:
https://example.com.ua/part1/part2/part3/product-123.html
How do I get product-123.html with a regular expression?
I tried this:
echo preg_replace('#/([a-zA-Z0-9_-]+\.html)$#','$1','https://example.com.ua/part1/part2/part3/product-123.html');
You don't need a regular expression for that.
There's more than one way of doing it.
Look into parse_url().
parse_url(): This function parses a URL and returns an associative array containing any of the various components of the URL that are present.
That will get you most of the way there and will also separate the host for you. Then you just explode your way to the last part using explode() and end().
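$url = parse_url('http://example.com/project/controller/action/param1/param2');
// end() takes its argument by reference, so store the exploded parts in a variable first
$parts = explode('/', $url['path']);
$url['last'] = end($parts);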
Array
(
[scheme] => http
[host] => example.com
[path] => /project/controller/action/param1/param2
[last] => param2
)
Or you can go straight to the point like this:
$last = ltrim(strrchr(parse_url($url, PHP_URL_PATH), '/'), '/');
You can also just go ahead and use explode() in combination with end() directly on the URL (it's also shorter if you don't need the extra information from parse_url()):
$parts = explode('/', $url);
$last = end($parts);
You can also just use basename() like this
$url = "http://example.com/project/controller/action/param1/param2";
$last = basename($url);
// Output: param2
preg_replace only replaces the part of the string that the pattern matches, in this case /product-123.html. So you're replacing /product-123.html with product-123.html, and https://example.com.ua/part1/part2/part3 remains untouched.
To replace everything and keep only the match, you'd do:
echo preg_replace('#.*/([a-zA-Z0-9_-]+\.html)$#','$1','https://example.com.ua/part1/part2/part3/product-123.html');
You don't need a regex to accomplish this task, though, and if you did, it would probably be cleaner to use preg_match.
Here's a preg_match approach:
preg_match('#[a-zA-Z0-9_-]+\.html$#', 'https://example.com.ua/part1/part2/part3/product-123.html', $match);
echo $match[0];
Demo: https://3v4l.org/4o9RM
Regex demo: https://regex101.com/r/6dytu0/2/
Why regex?
$str = "https://example.com.ua/part1/part2/part3/product-123.html";
echo substr($str, strrpos($str, "/") + 1);
https://3v4l.org/rJiGL
strrpos() finds the last / and returns its position.
Here is a preg_replace that will work if you must use regex.
https://regex101.com/r/6zJwBo/1
$re = '/.*\//';
$str = 'https://example.com.ua/part1/part2/part3/product-123.html';
$subst = '';
$result = preg_replace($re, $subst, $str);
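The string-based approaches above take everything after the last /, which would include a query string if the URL had one. As a hedged sketch, parse_url() and basename() can be combined so that only the path is considered; the helper name lastSegment() and the second URL with ?ref=abc are assumptions for illustration.

```php
<?php
// Hypothetical helper: last path segment of a URL, ignoring any query string or fragment.
function lastSegment(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    // parse_url() returns null when the URL has no path and false on seriously malformed input.
    return is_string($path) ? basename($path) : '';
}

echo lastSegment('https://example.com.ua/part1/part2/part3/product-123.html'), "\n";
// Made-up variant with a query string, to show that it is not part of the result.
echo lastSegment('https://example.com.ua/part1/part2/part3/product-123.html?ref=abc'), "\n";
```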

Is it possible to know the position of a match in a subject string?

I have a file name in which some information has to be replaced. Here is a sample subject:
FileA-2014-11-01_K_1_A2_383.xxx
As many files are to be processed, this filename is first matched by a regex, say:
/[a-zA-Z]*-\d{4}-\d{2}-\d{2}_(\w)_(\d)_A2_(\d*)\.xxx$/
Using preg_match, this regex gives me the values to be replaced, here:
K=>A
1=>2
383=>666
My first try was to naively use str_replace, but it fails when the patterns are repeated in the string. Here I get:
FileA-2024-22-02_A_2_A2_666.xxx
So the date is also modified by str_replace (as it was told to do).
So I wonder if there is a way to know where a given match is in the string, in order to do a clean replacement.
I'm now trying to invert the regex so that it captures the non-replaced blocks, and then insert the replacement data. That regex would be:
/([a-zA-Z]*-\d{4}-\d{2}-\d{2}_)\w(_)\d(_A2_)\d*(\.xxx)$/
With that one, I'm able to keep the non-replaced parts. I now have to find some kind of index to know the replacement position in the string. I guess I can do it this way, but it seems somewhat complicated and error-prone.
Given that I only have the initial regex and the from=>to replacement map, is there a better way to do this?
[EDIT : solution]
<?php
$filename = "FileA-2014-11-01_K_1_A2_383.xxx";
$expected = "FileA-2014-11-01_A_2_A2_666.xxx";
$regex = "/[a-zA-Z]*-\d{4}-\d{2}-\d{2}_(\w)_(\d)_A2_(\d*)\.xxx$/";
global $replacements;
$replacements["K"] = "A";
$replacements["1"] = "2";
$replacements["383"] = "666";
$result = preg_replace_callback($regex, function($matches){
    global $replacements;
    print_r($matches);
    // ended here. no way.
}, $filename);
if(strcmp($result,$expected)==0)
    echo "preg_replace_callback() : Yep\n";
else
    echo "preg_replace_callback() : Nop\n";
preg_match($regex, $filename, $matches, PREG_OFFSET_CAPTURE);
// remove useless global string match
array_shift($matches);
$result = $filename;
// note: the captured offsets stay valid here only because every replacement
// has the same length as the text it replaces
foreach($matches as $matchInfo){
    $match = $matchInfo[0];
    $position = $matchInfo[1];
    $matchLength = strlen($match);
    $beforeReplacementPart = substr($result, 0, $position);
    $afterReplacementPart = substr($result, ($position + $matchLength));
    $result = $beforeReplacementPart . $replacements[$match] . $afterReplacementPart;
}
if(strcmp($result,$expected)==0)
    echo "preg_match() and substr game : Yep\n";
else
    echo "preg_match() and substr game : Nop\n";
A regex that matches that filename:
$re = '/[a-zA-Z]*-\d{4}-\d{2}-\d{2}_(\w)_(\d)_A2_(\d*)\.xxx$/';
$str = 'FileA-2014-11-01_K_1_A2_383.xxx';
If you add PREG_OFFSET_CAPTURE as the fourth parameter ($flags) to the call to preg_match(), it will also return the offset of each captured string in the third parameter:
preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE);
A print_r($matches) will reveal:
Array
(
[0] => Array
(
[0] => FileA-2014-11-01_K_1_A2_383.xxx
[1] => 0
)
[1] => Array
(
[0] => K
[1] => 17
)
[2] => Array
(
[0] => 1
[1] => 19
)
[3] => Array
(
[0] => 383
[1] => 24
)
)
$matches[0] is the part that matched the entire regex. $matches[1] is the first capturing sub-expression, $matches[2] is the second and so on.
$matches[1][0] is the fragment from the input string that matched the first regex sub-expression (\w) and $matches[1][1] is the offset in the input string where it was found. The same for $matches[N][0] and $matches[N][1] for the Nth sub-expression.
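As a rough sketch of how these offsets could drive the replacement: the $map array mirrors the K=>A, 1=>2, 383=>666 mapping from the question, and walking the groups from right to left is an assumption added here so that earlier offsets stay valid even when a replacement has a different length than the text it replaces.

```php
<?php
$re  = '/[a-zA-Z]*-\d{4}-\d{2}-\d{2}_(\w)_(\d)_A2_(\d*)\.xxx$/';
$str = 'FileA-2014-11-01_K_1_A2_383.xxx';
$map = ['K' => 'A', '1' => '2', '383' => '666']; // from => to

$result = $str;
if (preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE)) {
    // Drop the full match; keep only the capture groups as [text, offset] pairs.
    $groups = array_slice($matches, 1);
    // Replace from the last group to the first so earlier offsets are not shifted.
    foreach (array_reverse($groups) as [$text, $offset]) {
        if (isset($map[$text])) {
            $result = substr_replace($result, $map[$text], $offset, strlen($text));
        }
    }
}
echo $result, "\n";
```

Because only the captured positions are touched, the 1 and 2 inside the date are left alone, which is what str_replace could not guarantee.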
If you only need a simple replacement, you don't need to bother with offsets; use preg_replace() or, if the replacement expression is complex or dynamic, preg_replace_callback().
Using preg_replace() you need to capture the parts you want to keep:
$re = '/([a-zA-Z]*-\d{4}-\d{2}-\d{2}_)\w_\d_A2_\d*(\.xxx)$/';
$str = 'FileA-2014-11-01_K_1_A2_383.xxx';
$new = preg_replace($re, '$1A_2_A2_666$2', $str);
echo($new."\n");
In the replacement string, $1 and $2 denote the sub-expressions from the regex. We marked them for capturing in order to re-use them in the replacement string.
At least preg_match_all() offers the option
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.
You could try the below regex.
([a-zA-Z]*-\d{4}-\d{2}-\d{2}(?:-\d*)?_)\w_\d(_A2)_\d*(\.xxx)$
Then replace the match with
\1A_2\2_666\3
$re = "~([a-zA-Z]*-\\d{4}-\\d{2}-\\d{2}(?:-\\d*)?_)\\w_\\d(_A2)_\\d*(\\.xxx)$~m";
$str = "FileA-2014-11-01_K_1_A2_383.xxx";
$subst = '\1A_2\2_666\3'; // single quotes: inside double quotes, "\1" is an octal escape, not a backreference
$result = preg_replace($re, $subst, $str);
You can use:
$re = "/([a-zA-Z]+-\\d{4}-\\d{2}-\\d{2}_)\\w+_\\d+(_A2_)\\d+(\\.xxx)$/m";
$str = "FileA-2014-11-01_K_1_A2_383.xxx";
$subst = '${1}A_2${2}666${3}'; // single quotes, so PHP does not try to interpolate ${1} as a variable
$result = preg_replace($re, $subst, $str);
//=> FileA-2014-11-01_A_2_A2_666.xxx
Perhaps it is possible to use this in your case:
$str = strtr($str, array('_K_1_'=>'_A_2_', '_383.'=>'_666.'));
or
$str = str_replace('_K_1_A2_383.xxx', '_A_2_A2_666.xxx', $str);
So there is no more ambiguity and the replacement is fast.

Remove the end of a string with a varying string length using PHP

Using PHP, I have an array that returns this data for an image:
Array
(
[0] => http://website.dev/2014/05/my-file-name-here-710x557.png
[1] => 710
[2] => 557
[3] => 1
)
Based on the demo data above, I need to somehow turn this image URL into:
http://website.dev/2014/05/my-file-name-here.png removing the -710x557 from the string.
Some things to keep in mind are:
The file extension can change and be any file type
710x557 might not ALWAYS be a 3 digit x 3 digit number. Each number could have 2 or 4 digits
The reason I mention this is to show I cannot simply use PHP's string functions to remove the last 12 characters in the string and then add the file extension back because the last string characters could possibly be between 10 and 14 characters long sometimes and not always 12.
I was hoping to avoid a heavy regular expression code but if that is the only or best way here then I say go with it.
How do I write a regex that removes the end of a string that could have a varying length in PHP?
You can use a regex like this:
-\d+x\d+(\.\w+)$
The code you can use is:
$re = "/-\\d+x\\d+(\\.\\w+)$/";
$str = "http://website.dev/2014/05/my-file-name-here-710x557.png";
$subst = '\1';
$result = preg_replace($re, $subst, $str, 1);
The idea is to match the resolution part (-NumbersxNumbers) with -\d+x\d+, which gets dropped, and to capture the file extension with the group (\.\w+)$. The replacement \1 then puts the extension back.
As long as it is 2 sets of digits with an 'x' in the middle preceded by a dash you can use this regex:
-[\d]*x[\d]*
$string = 'http://website.dev/2014/05/my-file-name-here-710x557.png';
$pattern = '/-[\d]*x[\d]*/';
$replacement = '';
echo preg_replace($pattern, $replacement, $string);
http://phpfiddle.org/lite/code/eh40-6d1x
You can probably use strrpos in the following manner to do this:
$str = substr($str, 0, strrpos($str, '-')) . substr($str, strrpos($str, '.'));
You can use this regex based code:
$str = "http://website.dev/2014/05/my-file-name-here-710x557.png";
$re = '/-([^-]+)(?=\.[^-]*$)/';
$result = preg_replace($re, '', $str, 1);
//=> http://website.dev/2014/05/my-file-name-here.png
$newsrc = preg_replace('#\-\d+x\d+(\.\w+$)#', '$1', $arr[0]);
see http://ideone.com/ZsELQ0
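A small sketch that wraps the anchored pattern from the answers above in a function and tries it on different digit counts and extensions. The function name stripSizeSuffix() and the second and third URLs are made up for illustration; only the first URL comes from the question.

```php
<?php
// Hypothetical helper: removes a trailing "-<width>x<height>" that sits right before the extension.
function stripSizeSuffix(string $url): string
{
    // The $ anchor limits the match to the end of the string, so a "-12x34"
    // elsewhere in the path is left untouched. Fall back to the input on a regex error.
    return preg_replace('/-\d+x\d+(\.\w+)$/', '$1', $url) ?? $url;
}

echo stripSizeSuffix('http://website.dev/2014/05/my-file-name-here-710x557.png'), "\n";
echo stripSizeSuffix('http://website.dev/2014/05/photo-80x1024.jpeg'), "\n"; // made-up example
echo stripSizeSuffix('http://website.dev/2014/05/plain.png'), "\n";          // no suffix: unchanged
```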

php: split string until first occurrence of a number

I have strings like:
cream 100G
sup 5mg Children
I want to split each one before the first occurrence of a digit, so the result should be:
array(
array('cream','100G'),
array('sup','5mg Children')
);
Can someone tell me how to create a pattern for this?
I tried:
list($before, $after) = array_filter(array_map('trim',
preg_split('/\b(\d+)\b/', $t->formula)), 'strlen');
but something went wrong.
Try this:
<?php
$first_string = "abc2 2mg";
print_r( preg_split('/(?=\d)/', $first_string, 2));
?>
Will output:
Array ( [0] => abc [1] => 2 2mg )
The regular expression solution would be to call preg_split as
preg_split('/(?=\d)/', $t->formula, 2)
The main point here is that you do not consume the digit used as the split delimiter by using positive lookahead instead of capturing it (so that it remains in $after) and that we ensure the split produces no more than two pieces by using the third argument.
You don't need regular expressions for that:
$str = 'cream 100g';
$p = strcspn($str, '0123456789');
$before = substr($str, 0, $p);
$after = substr($str, $p);
echo "before: $before, after: $after";
See also: strcspn()
Returns the length of the initial segment of $str which does not contain any of the characters in '0123456789', aka digits.
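The strcspn() approach leaves the separating space at the end of $before. A minimal sketch that trims both parts to get the arrays shown in the question; the function name splitAtFirstDigit() is an assumption for illustration.

```php
<?php
// Hypothetical helper: splits a string before its first digit and trims both parts.
function splitAtFirstDigit(string $str): array
{
    // Length of the leading run that contains no digit at all.
    $p = strcspn($str, '0123456789');
    // If there is no digit, $p equals strlen($str) and the second part is empty.
    return [trim(substr($str, 0, $p)), trim(substr($str, $p))];
}

print_r(array_map('splitAtFirstDigit', ['cream 100G', 'sup 5mg Children']));
```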

Regex to get the last part of a URL

I have:
stackoverflow.com/.../link/Eee_666/9_uUU/66_99U
What regex would match /Eee_666/9_uUU/66_99U?
Eee_666, 9_uUU, and 66_99U are random values.
How can I solve this?
As simple as that:
$link = "stackoverflow.com/.../link/Eee_666/9_uUU/66_99U";
$regex = '~link/([^/]+)/([^/]+)/([^/]+)~';
# captures anything that is not a / in three different groups
preg_match_all($regex, $link, $matches);
print_r($matches);
Be aware, though, that it consumes any character except the / (including newlines), so you either want to exclude other characters as well or feed the engine only strings in your format.
You can use \K here to make it more thorough.
stackoverflow\.com/.*?/link/\K([^/\s]+)/([^/\s]+)/([^/\s]+)
See demo.
https://regex101.com/r/jC8mZ4/2
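A short sketch of that pattern in PHP, assuming ~ as the delimiter so the slashes need no escaping. \K discards everything matched before it from the overall match, so $m[0] starts at the first of the three segments.

```php
<?php
$link  = 'stackoverflow.com/.../link/Eee_666/9_uUU/66_99U';
$regex = '~stackoverflow\.com/.*?/link/\K([^/\s]+)/([^/\s]+)/([^/\s]+)~';

if (preg_match($regex, $link, $m)) {
    echo $m[0], "\n";             // only the part matched after \K
    print_r(array_slice($m, 1));  // the three captured segments
}
```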
In case you don't know the length of the string:
$string = 'stackoverflow.com/.../link/Eee_666/9_uUU/66_99U';
$regexp = '~([^/]+$)~';
Result:
group 1 = 66_99U
Be careful: it may also capture the end-of-line character.
For this kind of requirement, it's simpler to use preg_split combined with array_slice:
$url = 'stackoverflow.com/.../link/Eee_666/9_uUU/66_99U';
$elem = array_slice(preg_split('~/~', $url), -3);
print_r($elem);
Output:
Array
(
[0] => Eee_666
[1] => 9_uUU
[2] => 66_99U
)
