Most reliable way to extract strings between two delimiters

I've tried multiple functions to extract whatever is between two strings. The delimiters might contain special characters; I guess that's why none of them worked for me.
My current function:
function between($str, $startTag, $endTag){
    $delimiter = '#';
    $regex = $delimiter . preg_quote($startTag, $delimiter)
           . '(.*?)'
           . preg_quote($endTag, $delimiter)
           . $delimiter
           . 's';
    preg_match($regex, $str, $matches);
    return $matches;
}
Example of string:
#{ST#RT}#
Text i want
#{END}#
#{ST#RT}#
Second text i want
#{END}#
How can I improve this, or what other solution would:
Support any kind of character, including new lines
Extract multiple strings if there are several
Current behavior: it only returns the first match, and it also returns the match plus the surrounding tags, which I don't want.

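Use the s modifier (dotall) so that the . character also matches newlines. The m modifier does something different: it makes ^ and $ match at the start and end of each line, and it does not change what . matches. Your pattern already ends with 's', so this part is fine:
preg_match('/foo.+bar/s', $str); // the trailing s lets . match newlines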
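A quick way to see the difference between the two modifiers is to run the same pattern against a subject with a newline between the two words. This is a minimal sketch; the subject string is made up for illustration.

```php
<?php
$subject = "foo\nbar";

// m (multiline) only changes how ^ and $ behave, so '.' still refuses "\n".
$withM = preg_match('/foo.+bar/m', $subject);

// s (dotall) lets '.' match "\n" as well.
$withS = preg_match('/foo.+bar/s', $subject);

var_dump($withM); // int(0)
var_dump($withS); // int(1)
```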
Use preg_match_all() to get your multiple strings:
preg_match_all($regex, $str, $matches);
return $matches[1]; // an array of the strings
Edit:
The reason your current code returns the match plus the surrounding tags is that you're using return $matches. The $matches array has several elements: index 0 is always the entire string that matched the expression, and indexes 1 and higher are your capture groups. Your expression has only one capture group (the text between the tags), so you want return $matches[1] instead of return $matches.
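Putting both fixes together, the question's function can be adapted to use preg_match_all() and return only the captured group. This is a sketch, not the asker's code: the name betweenAll is hypothetical, and trimming the surrounding newlines is an optional choice made for this example.

```php
<?php
// Return every substring found between $startTag and $endTag.
// (betweenAll is a hypothetical name for this adapted version.)
function betweenAll(string $str, string $startTag, string $endTag): array
{
    $delimiter = '#';
    // preg_quote() escapes regex metacharacters in the tags, including the
    // delimiter passed as its second argument, so '#{ST#RT}#' is literal.
    $regex = $delimiter
        . preg_quote($startTag, $delimiter)
        . '(.*?)'                             // lazy: stop at the nearest end tag
        . preg_quote($endTag, $delimiter)
        . $delimiter
        . 's';                                // s: '.' also matches newlines

    // preg_match_all() collects every match. With the default
    // PREG_PATTERN_ORDER, $matches[1] lists the group-1 capture of each match.
    if (preg_match_all($regex, $str, $matches) === false) {
        return [];                            // invalid pattern
    }
    return $matches[1];
}

$input = "#{ST#RT}#\nText i want\n#{END}#\n#{ST#RT}#\nSecond text i want\n#{END}#";

// The captures still include the newlines next to the tags; trim() removes them.
$found = array_map('trim', betweenAll($input, '#{ST#RT}#', '#{END}#'));
print_r($found);
```

Leave out the array_map('trim', ...) call if the whitespace inside the tags is significant.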

You can use preg_match_all to extract multiple strings. Besides that, your code seems simple enough, and simpler is normally faster.

Related

Regular expression return only certain values PHP

I can't remember what to use to return only a specific part of a string.
I have a string like this:
$str = "return(me or not?)";
I want to get the word that comes after (. In this example, me would be my result. How can I do this?
I don't think substr is what I am looking for, as substr returns a value based on an index you provide, and in this case I don't know the index; it can vary. All I know is that I want to return whatever is after "(" and before the space " ". The index positions will always be different, so I can't use substr(..).

This regular expression should do the trick. Since you only provided an example rather than general rules, it might need further changes.
preg_match('/\((\S+)/', $input, $matches);
Then $matches[1] contains "me".

<?php
// Your input string
$string = "return(me or not?)";
// Pattern explanation:
// \(       -- Match the opening parenthesis
// ([^\s]+) -- Capture at least one character that is not whitespace.
if (preg_match('/\(([^\s]+)/', $string, $matches) === 1) {
    // preg_match() returns 1 on a match.
    echo "Substring: {$matches[1]}";
} else {
    // No match found (0), or a bad regular expression (false).
    echo 'No match found';
}

Using this regex with preg_match(), the first capture group ($matches[1]) will hold your result:
$regex = '/\((\w+)/';
See the preg_match() documentation for a working reference.
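The answers above capture with \S+ (any non-whitespace) and \w+ (letters, digits, underscore). They agree on the question's string but diverge on others. This sketch compares them; the extra test strings are hypothetical, chosen only to show where the two classes differ.

```php
<?php
// The first input is from the question; the other two are made-up examples.
$inputs = ['return(me or not?)', 'return(me)', 'return(me-too or not)'];

$results = [];
foreach ($inputs as $input) {
    // \S+ : any run of non-whitespace, so it can swallow ')' or '-'
    preg_match('/\((\S+)/', $input, $s);
    // \w+ : only letters, digits and underscore
    preg_match('/\((\w+)/', $input, $w);
    $results[$input] = [$s[1], $w[1]];
    printf("%-24s \\S+ => %-8s \\w+ => %s\n", $input, $s[1], $w[1]);
}
```

If the word can be followed directly by ')', \S+ keeps the parenthesis, while \w+ stops before it but also stops at a hyphen.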

Trouble Using the preg_replace Option

I have a regex:
preg_match_all('#^(((?:-?>?(?:[A-Z]{3})?\d{3})+)-([0-9]{2})([0-9]{2})([0-9]{2})-\n\/O.([A-Z]{3}).KCLE.([A-Z]{2}).([A-Z]).([0-9]{4}).[0-9]{6}T[0-9]{4}Z-([0-9]{2})([0-9]{4}T[0-9]{4}Z[\/]))#', '', $matches)
that runs against a string(s) on a webpage. An example of a possible string:
OHZ012>018-PAZ015-060815-
/O.EXP.KCLE.BH.S.0015.000000T0000Z-170806T0700Z/
This will correctly match the string. However, for $matches[2] it will output
OHZ012>018-PAZ015
I want this line to read: 012>018-015 (i.e. remove the letters from that group).
I have tried the following using preg_replace:
$matches = preg_replace('/([A-Z]{3})/','',$matches);
Now if I print out $matches[2] it just gives me the 3rd character as opposed to the group. So for example, it will print out "2" instead of "012>018-015". Any idea why it isn't printing out the entire group as I would expect?

preg_match_all populates your $matches variable with an array of arrays. The third parameter of preg_replace should be either a string or an array of strings, so that is probably where you were running into the issue.
$matches[2], however, is an array of strings, so you can call preg_replace passing it as the third parameter and get your results.
$matches[2] = preg_replace('/([A-Z]{3})/','',$matches[2]);
If you would like a more generic letter replacement regex, you can use /[A-Z]/i to remove all letters in the strings.
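A runnable sketch of the fix. The pattern below is a cut-down stand-in for the question's longer regex (an assumption made for brevity); its outer group 1 and inner group 2 are arranged so that $matches[2] captures the zone list the same way the original does.

```php
<?php
// Sample input taken from the question.
$text = "OHZ012>018-PAZ015-060815-\n/O.EXP.KCLE.BH.S.0015.000000T0000Z-170806T0700Z/";

// Simplified stand-in pattern; group 2 is the zone list.
$pattern = '#^(((?:-?>?(?:[A-Z]{3})?\d{3})+)-\d{6}-)#';
preg_match_all($pattern, $text, $matches);

// $matches is an array of arrays: $matches[2] holds every group-2 capture,
// one string per match, so it can be passed straight to preg_replace().
$matches[2] = preg_replace('/[A-Z]{3}/', '', $matches[2]);

print_r($matches[2]); // one element: 012>018-015
```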

preg_replace - similar patterns

I have a string that contains something like "LAB_FF, LAB_FF12", and I'm trying to use preg_replace to look for both patterns and replace them with different strings, using this pattern:
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
Problem is, for the second string, it interprets it as "LAB_FF" instead of "LAB_FF12" and so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.

\b is a word boundary: the match must end where the word ends, so LAB_[0-9A-F]{2} can no longer match just the first part of LAB_FF12.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
Seeing the comment on the other answer asking how to do the replacement, here is one way.
With two capture groups, the group that did not take part in the match is left as an empty string in $matches (here, the first group).
Then it's just a matter of substr.
$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
// Index 0 is unused; indexes 1 and 2 line up with the capture groups.
$substitutes = ["", "DAB", "HAD"];
$result = $str; // fall back to the input if neither group matched
for ($i = 1; $i < count($matches); $i++) {
    if ($matches[$i] != "") {
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;
https://3v4l.org/gRvHv
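To produce the single amended line the question asks for, preg_replace() also accepts arrays of patterns and replacements and applies them in order over the whole string. This is a sketch; the DAB/HAD mapping is read from the question's example.

```php
<?php
$input = 'LAB_FF, LAB_FF12';

// One pattern per length, each paired with its own replacement.
// \b on both sides stops the 2-digit pattern from matching the start of
// 'LAB_FF12' (after 'FF' comes '1', so there is no word boundary there).
$patterns     = ['/\bLAB_([0-9A-F]{2})\b/', '/\bLAB_([0-9A-F]{4})\b/'];
$replacements = ['DAB_$1', 'HAD_$1'];

// preg_replace() runs each pattern/replacement pair in turn on the string.
$output = preg_replace($patterns, $replacements, $input);
echo $output, "\n"; // DAB_FF, HAD_FF12
```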

You can specify a range of repetitions in one set of curly braces, e.g. {2,4}.
I just tested this and it seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches anywhere from 2 to 4 characters (so LAB_FFF matches too). Alternation tries its branches from left to right and takes the first one that matches, so if you put the longer alternative first, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_ followed by an even number of hex characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
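With the longer alternative first, a single pattern can drive preg_replace_callback(), choosing the replacement by the length of the captured hex digits. This is a sketch; the length-to-prefix mapping (2 digits to DAB_, 4 digits to HAD_) is an assumption taken from the question's example.

```php
<?php
$input = 'LAB_FF, LAB_FF12';

// Longer alternative first, so 'LAB_FF12' is not cut short at 'LAB_FF'.
$pattern = '/LAB_([0-9A-F]{4}|[0-9A-F]{2})/';

$output = preg_replace_callback($pattern, function (array $m): string {
    // $m[1] is the captured hex part; pick the prefix by its length.
    $prefix = strlen($m[1]) === 4 ? 'HAD_' : 'DAB_';
    return $prefix . $m[1];
}, $input);

echo $output, "\n"; // DAB_FF, HAD_FF12
```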

Match one string against another string that contains regex

I have two strings:
$relativeUrl = "/string1/(\d+)/string2/(\d+)/string3";
$currentUrl = "/string1/1234/string2/5678/string3";
I am trying to assert that the $currentUrl does indeed match $relativeUrl which contains regexes for two numbers ((\d+)). The result I want is simply a true or false value.
I have tried using preg_quote() because:
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
However, I have to [pre|ap]pend the delimiters in order to not receive the Unknown modifier '\' error, and even then I just get an empty array back. I'm not sure how best to go about this.
preg_match("^" . preg_quote($relativeUrl) . "^", $currentUrl, $matches);
var_dump($matches); // Gives array(0) { }
Am I going about this the right way? How can I achieve what I require?

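Try this preg_match without preg_quote. preg_quote escapes the (, \ and + in your (\d+) groups, so the pattern ends up matching the literal text "(\d+)" instead of digits, which is why you get an empty array. Using # as the delimiter also means the slashes in the URL don't need escaping: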
preg_match('#' . $relativeUrl . '#', $currentUrl, $matches);
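Since the question wants a plain true/false, the call can be wrapped so it compares preg_match()'s return value with 1. This sketch also adds ^ and $ anchors, which the answer above does not use, so that a URL with extra text before or after the route is rejected. The helper name urlMatches is hypothetical, and it assumes the route pattern never contains a '#'.

```php
<?php
// Hypothetical helper: true only if $currentUrl matches the whole route pattern.
// Assumes the route pattern never contains the '#' delimiter itself.
function urlMatches(string $routePattern, string $currentUrl): bool
{
    // '#' as delimiter, so the slashes in the route need no escaping.
    // ^ and $ anchor the match, so extra text before or after is rejected.
    return preg_match('#^' . $routePattern . '$#', $currentUrl) === 1;
}

$relativeUrl = "/string1/(\d+)/string2/(\d+)/string3";

$ok  = urlMatches($relativeUrl, "/string1/1234/string2/5678/string3");
$bad = urlMatches($relativeUrl, "/string1/1234/string2/5678/string3/extra");

var_dump($ok);  // bool(true)
var_dump($bad); // bool(false)
```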

Convert Notepad++ Regex to PHP Regular Expression

I'm trying to convert a Notepad++ regex to a PHP regular expression which basically gets IDs from a list of URLs in this format:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html
Using the Notepad++ regex function, I get the output that I need (a list of comma-separated IDs) in two steps:
(.*)/ replace with space
-(.*) replace with comma
Result:
1371937,1471337
I tried to do something similar with PHP's preg_replace, but I can't figure out the correct regex. The example below removes everything except digits, but it doesn't work as expected since there can also be numbers that don't belong to the ID.
$bb = preg_replace('/[^0-9]+/', ',', $_POST['Text']);
What is the correct pattern?
Thanks

If you are matching against:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
To get:
1371937
You would:
$url = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html";
preg_match( "/[^\d]+(\d+)-/", $url, $matches );
$code = $matches[1];
...which matches a run of non-digit characters, then captures an unbroken run of digits that is followed by a '-'.

If all you want to do is find the ID, then you should use preg_match, not preg_replace.
You've got lots of options for the pattern, the simplest being:
$url = 'http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html';
preg_match('/\d+/', $url, $matches);
echo $matches[0];
This simply finds the first run of digits in the URL, which works for these examples (it would pick the wrong number if the domain or category name contained digits).
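The question's end goal is a comma-separated list of IDs from many URLs, so preg_match_all() plus implode() can replace both Notepad++ steps. This is a sketch: $text stands in for $_POST['Text'], and it assumes the ID is always the run of digits directly after a '/' and before a '-'.

```php
<?php
// Stand-in for $_POST['Text'] from the question: one URL per line.
$text = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html\n"
      . "http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html";

// Assumption: the ID is the run of digits right after a '/' and before a '-'.
// Other numbers in the slug (e.g. '-2-' or '-2010') are not preceded by '/',
// so they are not captured.
preg_match_all('#/(\d+)-#', $text, $matches);

// $matches[1] holds every captured ID; join them with commas.
$ids = implode(',', $matches[1]);
echo $ids, "\n"; // 1371937,1471337
```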
