preg_match: capture text inside parentheses when part of the content is optional - php

I have data like this:
$data = '<span class="theclass">data (not important)</span> <span class="anotherclass">extra data (October 1, 2010)</span>';
I want to get the date within the parentheses, so I wrote the following preg_match:
preg_match("/\((([a-zA-Z]{5,10} .*?)|(\d{4}))\)/i",$data,$res);
Please note that 'October 1' is sometimes absent, but the year is always present, hence the OR condition. The problem is that it gives me an array of 3 elements in this case. I know that's because of the three sets of capturing parentheses I use for the conditions. Is there a better, cleaner way to achieve this?
Second case:
$data = '<span class="theclass">data</span> <span class="theother">data data (2009)</span>';
Thanks guys

Use lookarounds
Here we make sure there is a preceding ( character, then look for the text we would expect in a date formatted like your example. The fragment ([A-Za-z ,\d]+)? allows letters, a literal space, a comma, and digits. The + means at least one such character. Because the character class cannot match a parenthesis, it cannot run past the closing ) the way .* or .+ can. Wrapping the fragment in parentheses and adding ? makes it optional. Logically this works like your | alternation, because it still finds the year on its own, but without a second alternative for the engine to try. Then we match the year (always 4 digits, {4}) and check that it is followed by a literal ) character. The lookbehind (?<=\() and the lookahead (?=\)) must succeed for a match, but the characters they test are not included in the match result, leaving your answer clean.
preg_match() returns 1 or 0 and fills the array passed as its third argument, so we read the first element of that array. If you're looking for multiple matches in the same string you can use preg_match_all().
$data = '<a href="not important">
<span class="theclass">data (not important)</span></a>
<span class="anotherclass">extra data (October 1, 2010)</span>
<span class="anotherclass">extra data (2011)</span>';
$pattern = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$res = preg_match_all($pattern,$data,$myDate);
print_r($myDate[0]);
output
Array
(
[0] => October 1, 2010
[1] => 2011
)
If you're only looking for one match you would change the code to this:
$res = preg_match($pattern,$data,$myDate);
echo($myDate[0]);
Output
October 1, 2010
Another way to write the pattern is shown below. We've removed the capturing parentheses around the character class and replaced the + quantifier and the optional ? with a single *, which makes the class optional (zero or more characters). The difference is that preg_match and preg_match_all also store every capturing group in the result array. Since this version has no capturing group, no extra array elements are stored.
$pattern = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';
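To make the difference concrete, here is a small sketch that runs both patterns against one of the sample strings. The variable names and the shortened input are mine, chosen only for illustration.

```php
<?php
// Shortened sample input, for illustration only.
$data = '<span class="anotherclass">extra data (October 1, 2010)</span>';

$withGroup = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$noGroup   = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';

preg_match($withGroup, $data, $a);
preg_match($noGroup, $data, $b);

// $a holds the full match plus the captured group;
// $b holds only the full match.
print_r($a);
print_r($b);
```

With the capturing group, the month-and-day prefix shows up as a second array element. Without it, the array contains just the date.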


regex to convert string 018v-s001v => 18v-s1v but 020v_001 => 20v_001

I'm struggling with a Regex to convert the following strings
018v-s001v => 18v-s1v
018v-s001r => 18v-s1r
018r-s002v => 18r-s2v
020v_001 => 20v_001
020r_002 => 20r_002
0001 => 0001
I managed to convert the first three cases, but I'm struggling with the last three: how do I preserve the zeros after _ and all the zeros in the last case?
My attempt: (0*)([1-9]{0,4}[vr]?)((-s)?+([0]{0,2}))?+([1-9][vr])?
https://regex101.com/r/2go5KO/1
For your given examples, you could use
000\d+(*SKIP)(*FAIL)|(?<=\b|[a-z])0+
See a demo on regex101.com.
To get the expected result for your example data you might use preg_replace.
You could match a zero one or more times (0+), then capture in a group one or more digits followed by v or r, using a character class for the letter: ([0-9]+[vr])
Regex
0+([0-9]+[vr])
Replace
Captured group 1 $1
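As a quick check, the sketch below runs both suggested patterns (the SKIP/FAIL one and the capture-group one) over the six strings from the question. The $cases and $results names are mine. The expected values are copied from the question.

```php
<?php
// [input, expected output] pairs, copied from the question.
$cases = [
    ['018v-s001v', '18v-s1v'],
    ['018v-s001r', '18v-s1r'],
    ['018r-s002v', '18r-s2v'],
    ['020v_001',   '20v_001'],
    ['020r_002',   '20r_002'],
    ['0001',       '0001'],
];

$results = [];
foreach ($cases as [$input, $want]) {
    // Pattern 1: skip runs that start with 000, strip other leading zeros.
    $skipFail = preg_replace('/000\d+(*SKIP)(*FAIL)|(?<=\b|[a-z])0+/', '', $input);
    // Pattern 2: strip zeros only in front of digits followed by v or r.
    $captured = preg_replace('/0+([0-9]+[vr])/', '$1', $input);
    $results[] = [$skipFail, $captured, $want];
    printf("%-12s %-10s %-10s\n", $input, $skipFail, $captured);
}
```

Both patterns are tailored to these sample strings. Inputs with a different shape, such as a number with a zero in the middle directly before v or r, may need a stricter pattern.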
How about this one:
$result = preg_replace('/(?:(\d{4})|(0)?(\d{2}\w))(?:([-_])(?:(\d{3})|(\w)(0+)(\d+?\w)))?/m',
'$1$3$4$5$6$8', $subject);
This produces all the results you require from your test strings. It wasn't clear where a zero will always appear and where it is optional, but I'm sure the pattern can be adapted. I also noticed the separator is sometimes a hyphen - and sometimes an underscore _, and it wasn't clear whether that was a typo or significant, so I've assumed it can be either.

Php preg_replace numbers characters

$my_string = '88888805';
echo preg_replace("/(^.|.$)(*SKIP)(*F)|(.)/","*",$my_string);
This shows only the first and last digits, like this: 8******5
But how can I show the number like this: 888888** (with the last 2 digits hidden)?
Thank you!
From this: 8******5
To: 888888**
I'm not sure if you have worked on this Regex pattern to do something unique. However, I will provide you with a general one that should fit your question without using your current pattern.
$my_string = '88888805';
echo preg_replace("/([0-9]+)[0-9]{2}$/","$1**",$my_string);
Explanation:
([0-9]+) matches the leading digits. It could be written as \d+. It is in parentheses so that it is captured, because we reuse it in the result as $1.
[0-9]{2} matches the last 2 digits. Again, it could be written as \d{2}. It is outside the parentheses because we don't want those digits in the result. The $ after it marks the end of the string. For a string made up only of digits it makes no difference, because the greedy + already runs to the end before giving back two digits.
Results:
Input: 88888805
Output: 888888**
echo preg_replace("/(.{2}$)(*SKIP)(*F)|(.)/","*",$my_string);
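If it's for a uni assignment, you'd probably want to do it this way. It basically says: don't match a character if it belongs to the last two, otherwise match it. Note that this replaces every character except the last two, so for 88888805 it prints ******05 rather than 888888**.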
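For the output asked for in the question (888888**), a lookahead is one way to mask only the last two characters without SKIP/FAIL. This is a sketch rather than the approach from either answer. Using \z (absolute end of string) instead of $ and the $masked variable are my choices.

```php
<?php
$my_string = '88888805';

// Match any character that is followed by at most one more character
// before the end of the string, i.e. each of the last two characters.
$masked = preg_replace('/.(?=.?\z)/s', '*', $my_string);

echo $masked, PHP_EOL; // 888888**
```

Unlike the capture-group version, this does not care whether the characters are digits, so it also works for mixed input.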

PHP Regex to identify keys in array representation

I have this string authors[0][system:id] and I need a regex that returns:
array('authors', '0', 'system:id')
Any ideas?
Thanks.
Just use PHP's preg_split(), which returns an array of elements like explode() does, but splits on a regular expression.
Split the string on [ or ] and then remove the last element (which is an empty string) of the resulting array, $tokens.
EDIT: Also remove the 3rd element with array_splice($array, $offset, $length), since that item is an empty string too (it comes from the adjacent ][ pair).
The regex /[\]\[]/ just means match any ] or [ character
$string = "authors[0][system:id]";
$tokens = preg_split("/[\]\[]/", $string);
array_pop($tokens);
array_splice($tokens, 2, 1);
//rest of your code using $tokens
Here is the format of $tokens after this has run:
Array ( [0] => authors [1] => 0 [2] => system:id )
Taking the most simplistic approach, we would just match the three individual parts. So first of all we'd look for the token that is not enclosed in brackets:
[a-z]+
Then we'd look for the brackets and the value in between:
\[[^\]]+\]
And then we'd repeat the second step.
You'd also need to add capture groups () to extract the actual values that you want.
So when you put it all together you get something like:
([a-z]+)\[([^\]]+)\]\[([^\]]+)\]
That expression could then be used with preg_match() and the values you want would be extracted into the array passed by reference as the third argument. But you'll notice the above expression is quite a difficult-to-read collection of punctuation, and also that the resulting array has an extra element on it that we don't want: preg_match() places the whole matched string into the first index of the output array. We're close, but it's not ideal.
However, as @AlienHoboken correctly points out and almost correctly implements, a simpler solution would be to split the string up based on the position of the brackets. First let's take a look at the expression we'd need (or at least, the one that I would use):
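A minimal sketch of that preg_match() approach, using the expression above. Dropping the first element with array_slice() and the $parts variable are my additions for illustration.

```php
<?php
$str = 'authors[0][system:id]';

if (preg_match('/([a-z]+)\[([^\]]+)\]\[([^\]]+)\]/', $str, $matches)) {
    // $matches[0] is the whole matched string; drop it to keep only the parts.
    $parts = array_slice($matches, 1);
    print_r($parts);
}
```

This only handles exactly two bracketed keys, which is one more reason the split-based approach is easier to live with.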
(?:\[|\])+
This looks for at least one occurrence of either [ or ] and uses that block as the delimiter for the split. This seems like exactly what we need, except when we run it we'll find we have a small issue:
array('authors', '0', 'system:id', '')
Where did that extra empty string come from? Well, the last character of the input string matches your delimiter expression, so it's treated as a split position, with the result that an empty string gets appended to the results.
This is quite a common issue when splitting based on a regular expression, and luckily PHP knows this and provides a simple way to avoid it: the PREG_SPLIT_NO_EMPTY flag.
So when we do this:
$str = 'authors[0][system:id]';
$expr = '/(?:\[|\])+/';
$result = preg_split($expr, $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
...you will see the result you want.

Regular expression help needed

Although I can find a lot of tutorials on regular expressions, they remain beyond my grasp. The regular expression that I want to create is simple (judging by what I see in some of the examples), but I simply cannot figure it out.
I want to do a simple replacement as follows:
I have image metadata saved in a MySQL table, with fields: id, name, title and alt.
In my content, I want to write [[IMAGE:1:right]]content here[[image:2:left]].
I want to get the matches of the ID (the digit) and the float (left or right) and replace the entire string with the image floated left or right, retrieved by the ID from the database table.
Here is my attempt:
preg_match("/^\[\[image:(\d+):(left|right)\]\]+/i", "[[IMAGE:1:right]]content here[[image:2:left]]", $matches);
This gives me the return of:
Array ( [0] => [[IMAGE:1:right]] [1] => 1 [2] => right )
So, it finds one, but I want it to find ALL of them, as I may have more than one image in a post. As far as I can tell, the + there should match all entries, and the i should match case insensitive. It appears as if the case insensitive way works, but I get only one return.
Could someone please let me know what I am doing wrong?
That's not quite how it works. That + only applies to the token immediately before it - the ]. You want to make the match global in Perl vernacular, which for PHP (which I think you're using?) means calling the function preg_match_all(). You'll also have to remove the ^, as only one of the images occurs at the beginning of the string.
Also, [ and ] are special characters in regex - so please escape them when you want a literal bracket by writing \[\[ and \]\].
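Putting that advice together, a sketch of the preg_match_all() call could look like this. PREG_SET_ORDER is my choice so that each tag's ID and float direction stay together. The variable names are illustrative.

```php
<?php
$content = '[[IMAGE:1:right]]content here[[image:2:left]]';

// No ^ anchor and no trailing +, so every tag in the string can match.
$pattern = '/\[\[image:(\d+):(left|right)\]\]/i';

$count = preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the full tag, $m[1] the ID, $m[2] the float direction.
    echo $m[0], ' -> id ', $m[1], ', float ', $m[2], PHP_EOL;
}
```

From here, each $m[0] can be swapped for the image markup looked up by $m[1], for example inside a preg_replace_callback() callback.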

Use String for Pattern but Exclude it from Being Removed

I'm pretty new to regex. I have learned a few things along the way, but my knowledge is still poor!
So I'd like to ask for some clarification on how it works.
Assume I have the following strings. As you can see, they are formatted slightly differently from one another, but they are very similar:
DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE
Now I want to remove everything between the first A-Z block and the colon, so that, for example, I keep:
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
So with my very limited knowledge I came up with this awful regex! :-(
preg_replace( '/^[A-Z](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
But I'm pretty sure this regex will not work. Why? :-)
Please help me!
PS: As the title says, I also want to know how to use a known string block as part of the pattern, for example...
preg_replace( '/^[DTSTART](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
...without deleting DTSTART.
Thanks for the time!
Regards
Luca Filosofi
You could use a relatively simple regex like the following.
$subject = 'DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE';
echo preg_replace('/^[A-Z]+\K[^:\n]*/m', '', $subject) . PHP_EOL;
It looks for a series of capital letters at the start of a line, resets the match starting point (that's what \K does) to the end of those and matches anything not a colon or newline (i.e. the parts you want to remove). Those matched parts are then replaced with an empty string.
The output from the above would be
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
If the lines that you are interested in will only ever start with DTSTART or DTEND then we could be more precise about what to match (e.g. ^DT(?:START|END)) but [A-Z]+ obviously covers both of those.
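As a sketch of that more precise variant, the snippet below swaps [A-Z]+ for DT(?:START|END). The SUMMARY line is a hypothetical extra line I added to show that other properties are left alone.

```php
<?php
$subject = "DTSTART;TZID=\"America/Chicago\":20030819T000000\n"
         . "DTEND;TZID=\"America/Chicago\":20030819T010000\n"
         . "DTSTART;VALUE=DATE\n"
         . "SUMMARY;LANGUAGE=en:Meeting"; // hypothetical line, not in the question

// Only lines starting with DTSTART or DTEND lose their parameters.
$result = preg_replace('/^DT(?:START|END)\K[^:\n]*/m', '', $subject);

echo $result, PHP_EOL;
```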
If you want to retain part of the matched pattern in a substitution, you put parentheses around it and then refer to it by $1 (or whichever grouping it is).
For example:
s/^(this is a sentence) to edit/$1/
gives "this is a sentence"
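The s/// line above is Perl-style substitution syntax. A rough PHP equivalent, assuming preg_replace() and the same sample sentence, would be:

```php
<?php
$input = 'this is a sentence to edit';

// The parenthesised part is kept via $1; the rest of the match is dropped.
$result = preg_replace('/^(this is a sentence) to edit/', '$1', $input);

echo $result, PHP_EOL; // this is a sentence
```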
You can check out this example, which works similarly to your problem:
<?php
$str = 'foobar: 2008';
preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);
/* This also works in PHP 5.2.2 (PCRE 7.0) and later, however
* the above form is recommended for backwards compatibility */
// preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
print_r($matches);
?>
The above example will output:
Array
(
[0] => foobar: 2008
[name] => foobar
[1] => foobar
[digit] => 2008
[2] => 2008
)
So if you only need the digits, print $matches['digit'].
You want to remove everything between a semicolon and either a colon or the end of the line, right? So use that as your expression. You're overcomplicating things.
preg_replace('/;.+?(?=:|$)/m','',$data);
It's a pretty simple expression. It starts at a semicolon, reluctantly reads in characters, and stops just before the terminator, which is either a colon or the end of the line.
The terminator sits in a lookahead, so the colon itself is not consumed and stays in the output. Everything matched by this is removable according to your description.
