Regex for strings which look like JavaScript objects in PHP

We have strings in php like the following two examples:
{{'LANGUAGE_ID','String inclusive special chars (,/)'}}
{{'LANGUAGE_ID','String inclusive special chars (,/)','Another string inclusive special chars (,/)'}}
The strings are always surrounded by {{ and }}. Inside we have multiple elements separated by commas, each surrounded by single quotes. The first element is always a word (\w+). After that we have an unknown number of elements, each of which can be a word or a sentence including special characters. What we want to get is the content (the text between the single quotes) of each element.
We have a solution as long as we know how many elements the string contains.
Solution for the first example: {{'([\w]+)','([^\n\r']+)'}}
Solution for the second example: {{'([\w]+)','([^\n\r']+)','([^\n\r']+)'}}
We are looking for a solution which works for both examples, and also for an example with three or more elements.
We have a regex share to play around here:
http://regexr.com/3c58c

You can use this regex using \G:
preg_match_all('/(?:{{|\G,)\'([^\']+)\'(?=.*?}})/', $text, $matches);
print_r($matches);
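As a minimal sketch, the \G approach can be run against both example strings from the question. The variable names $examples, $example and $results are illustrative and not part of the original answer.

```php
<?php
// The two sample strings from the question.
$examples = [
    "{{'LANGUAGE_ID','String inclusive special chars (,/)'}}",
    "{{'LANGUAGE_ID','String inclusive special chars (,/)','Another string inclusive special chars (,/)'}}",
];

$results = [];
foreach ($examples as $example) {
    // A match starts either at "{{" or, via \G, at a comma directly after the previous match.
    preg_match_all('/(?:{{|\G,)\'([^\']+)\'(?=.*?}})/', $example, $matches);
    $results[] = $matches[1]; // group 1 holds the text between the single quotes
}

print_r($results);
```

The same pattern handles any number of elements, because every additional element is just one more match that continues where the previous one ended.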

How about this one:
{{'([\w]+)',('([^\n\r']+)',*)*}}

Related

php preg_match mismatch

I would like to know why preg_match('/(?<=\s)[^,]+(?=\s)/',$data,$matches);
matches "List Processes 8989" in the string "20180513 List Processes 8989". The regex I am using should not match numeric characters. What is wrong?
The [^,] basically means any character except ,. If you want to exclude numeric characters as well, you can replace it with [^,0-9], or better [^,\d], so your regex would look like this:
(?<=\s)[^,\d]+(?=\s)
I'm assuming the input string in your question is only part of the actual input string you're using because the regex you provided won't match the numbers at the end unless they're followed by a whitespace.
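A small sketch comparing the two patterns. The trailing space in $data is an assumption about the real input, added so that the lookahead (?=\s) can succeed after the digits.

```php
<?php
// Assumed input: the string from the question plus a trailing space.
$data = "20180513 List Processes 8989 ";

// Original pattern: [^,] excludes only commas, so digits are matched too.
preg_match('/(?<=\s)[^,]+(?=\s)/', $data, $original);

// Corrected pattern: [^,\d] also excludes digits.
preg_match('/(?<=\s)[^,\d]+(?=\s)/', $data, $fixed);

var_dump($original[0]); // "List Processes 8989"
var_dump($fixed[0]);    // "List Processes"
```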
References:
Negated Character Classes.
Difference between [0-9] and \d.

Looking to use preg_replace to remove characters from my strings

I have the right function, but I can't find the right regex pattern to remove (ID:999999) from the string. The ID value varies but is always numeric. I would like to remove everything, including the parentheses.
$string = "This is the value I would like removed. (ID:17937)";
$string = preg_replace('#(ID:['0-9']?)#si', "", $string);
Regex is not my forte, and I need help with this one.
Try this:
$string = preg_replace('# \(ID:[0-9]+\)#si', "", $string);
You need to escape the parentheses using backslashes \.
You shouldn't use quotes around the number range.
You should use + (one or more) instead of ? (zero or one).
You can add a space at the start, to avoid having a space at the end of the resulting string.
In PHP a regex is usually delimited by /, although # is a valid delimiter too. Parentheses denote a capture group, so you must escape them to match them literally.
preg_replace does not need a capture group when the whole match is being replaced, so /\(ID:[0-9]+\)/ is enough; /(\(ID:[0-9]+\))/si works as well.
Here are two options:
Code: (Demo)
$string = "This is the value I would like removed. (ID:17937)";
var_export(preg_replace('/ \(ID:\d+\)/',"",$string));
echo "\n\n";
var_export(strstr($string,' (ID:',true));
Output: (I used var_export() to show that the technique is "clean" and gives no trailing whitespaces)
'This is the value I would like removed.'
'This is the value I would like removed.'
Some points:
Regex is a better / more flexible solution if your ID substring can exist anywhere in the string.
Your regex pattern doesn't need a bracketed character class if you use the shorthand character class \d.
Regex generally speaking should only be used when standard string function will not suffice or when it is proven to be more efficient for a specific case.
If your ID substring always occurs at the end of the string, strstr() is an elegant/perfect function.
Both of my methods write a (space) before ID to make the output clean.
You don't need either s or i modifiers on your pattern, because s only matters if you use a . (dot) and your ID is probably always uppercase so you don't need a case-insensitive search.
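To illustrate the first and fourth points, here is a brief sketch with a hypothetical input in which the ID sits in the middle of the string rather than at the end.

```php
<?php
// Hypothetical input: the ID substring is followed by more text.
$string = "Value (ID:17937) with trailing text";

// The regex removes only the ID substring (and the space before it).
$viaRegex = preg_replace('/ \(ID:\d+\)/', '', $string);

// strstr() with before_needle = true keeps only what precedes ' (ID:'.
$viaStrstr = strstr($string, ' (ID:', true);

var_export($viaRegex);  // 'Value with trailing text'
echo "\n";
var_export($viaStrstr); // 'Value'
```

Here strstr() drops everything after the ID, which is why it only fits the case where the ID always comes last.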

Regex replace format template with a string so that the result is a formatted string

Basically I have the following:
input string (eg ABCDEFGHI)
input template (eg XXX-XXX-XXX)
and the output I want to see is ABC-DEF-GHI
I imagine it going something like "XXX-XXX-XXX".replace("regex", "ABCDEFGHI");
The catch is that the template is dynamic. It may be XXX-XXX-XXX or XX-XXXX-XXX or any other combination that can include any special character, but the character to match is always X.
The template is not limited in length or number of groups separated by special characters.
i.e. XX-X-X and XXX-XXX-XXX-XXX-X are both valid templates as long as there are the same number of X's as input characters.
So far I have this: "/^([^a-zA-Z0-9]*X){9}[a-zA-Z0-9]*$/" which will validate my template.
Can anyone shed some light on this? Is there a way to replace one matched character from the template with one character from the string?
Basically, you need to convert your simple template into a regex, and you can do that with regex replacements:
I. Create the replacement:
search regex ^X+(\W)X+(\W)X+$ replace with \\1\1\\2\2\\3
this will transform XXX-XXX-XXX into \1-\2-\3
II. Create the match (two steps):
Create the three groups:
search regex ^(X+)\W(X+)\W(X+)$ replace with (\1)(\2)(\3)
this will transform XXX-XXX-XXX into (XXX)(XXX)(XXX)
Replace the X's with dots (. matches any single character) in the result of the previous step:
this will transform (XXX)(XXX)(XXX) into (...)(...)(...)
Now you can use your new match string (...)(...)(...) and new replacement string \1-\2-\3 with the input string ABCDEFGHI and get ABC-DEF-GHI.
Notice: I'm assuming your template will split the input string into 3 parts with 2 (variable) special characters in between.
Update:
If the template has a variable number of parts, you have to create your match and replacement patterns in advance:
Use the regex \W to count the separators in the template, then create your match and replacement patterns accordingly.
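For templates with an arbitrary number of parts, a simpler alternative to building match and replacement patterns is to replace each X in the template with the next input character through a callback. This is a different technique from the one described above, sketched here in PHP; the function name applyTemplate is hypothetical, and the code assumes single-byte (ASCII) input characters.

```php
<?php
// Hypothetical helper: fills every X in $template with the next character of $input.
function applyTemplate(string $template, string $input): string
{
    $i = 0; // index of the next unused input character
    return preg_replace_callback(
        '/X/',
        function (array $m) use ($input, &$i): string {
            // Yields an empty string if the input is shorter than the number of X's.
            return $input[$i++] ?? '';
        },
        $template
    );
}

echo applyTemplate('XXX-XXX-XXX', 'ABCDEFGHI'), "\n"; // ABC-DEF-GHI
echo applyTemplate('XX-X-X', 'ABCD'), "\n";           // AB-C-D
```

Because the callback consumes one input character per X, no assumption about the number of groups or the separator characters is needed.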

RegEx match only strings that are not starting with quotation mark

I have the list of strings (in PHP):
a2c
bdR
dDv
"ddv
aaa
"aaa
What's the regular expression to match only the strings that do not start with a quotation mark? In this example there are four such strings, so I need to match four strings only (to count them). For this list I'm using a loop, but I just need the RegEx now. Thanks!
I tried with
[^"]([a-zA-Z0-9]*)*
but it still matches all strings, even those that start with a quotation mark.
You are missing the start of string anchor ^, which means that your expression will match a string if it appears at any place inside it. Obviously the sequence "non-quote followed by anything, including end of string" appears inside all of your sample inputs.
This expression will match what you want:
^[^"]
It simply matches any input whose first character is not a double quote. There is no need to bother with the rest of the characters.
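Since the goal is to count the matching strings, one possible sketch uses preg_grep() with this pattern on the list from the question. The variable names are illustrative.

```php
<?php
// The list of strings from the question.
$strings = ['a2c', 'bdR', 'dDv', '"ddv', 'aaa', '"aaa'];

// preg_grep() returns the array entries that match the pattern (keys are preserved).
$notQuoted = preg_grep('/^[^"]/', $strings);

echo count($notQuoted), "\n"; // 4
```

Note that an empty string would not match either, because [^"] requires at least one character.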
Try it:
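/^[^"](.*)/m
(Use it with preg_match_all() and the m modifier if more than one string is in the variable; PHP has no g modifier.)
$strings = array('a2c', 'bdR', 'dDv', '"ddv', 'aaa', '"aaa');
$strings_arr = array();
foreach ($strings as $str) {
    // Keep only the strings whose first character is not a double quote.
    if (preg_match('/^(?P<string>[^"].*)/', $str, $match)) {
        $strings_arr[] = $match['string'];
    }
}
echo "<pre>";
print_r($strings_arr);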

Splitting large strings into words in php

I have a long string in php consisting of different paragraphs each of which with different sentences (it is pretty much a small document). I want to split the whole thing into words by removing any symbols/characters that are not relevant. For example remove commas, spaces, new lines, full stops, exclamation marks and anything that might be irrelevant so as to end up with only words.
Is there an easy way of doing this in one go, for example by using a regular expression and the preg_split function or do I have to use the explode function a number of times: eg first get all the sentences (by removing '.', '!' etc). Then get words by removing ',' and spaces etc etc.
I would not like to use the explode function on all the possible characters that are irrelevant since it is time consuming and I may accidentally omit some of all those possible characters.
I would like to find a more automatic way. I think a well-defined regular expression might do the work, but again I would need to specify all the possible characters, and I have no idea how to write regular expressions in PHP.
So what can you suggest?
Do you want to remove punctuation characters, etc and then split the words into an array? Or just strip it so there are only letters and spaces? Not exactly sure what you're trying to achieve, but the following might help:
<?php
$string = "This is a sentence! It has *lots* of #$#king random non-word characters. Wouldn't you like to strip them?";
$words = preg_replace("/[^\w\s]+/", '', $string); // strip punctuation but keep whitespace, so words on adjacent lines are not merged
$words = preg_split("/\s+/", $words); // split by left over spaces
var_dump($words);
Either way, it gives you the general idea of using regular expressions to manipulate text as needed. My example has two parts, this way words like "wouldn't" aren't split into two words like other answers have suggested.
To be unicode compatible, you should use this one:
preg_split('/\PL+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
which splits on any run of characters that are not letters.
Have a look at the Unicode character properties for details.
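A short sketch of the Unicode-aware split. The sample text is made up for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Made-up sample text with accented letters, a digit, punctuation and a newline.
$text = "Héllo, wörld! It's 2 nice.\nNew line here.";

// \PL matches any character that is not a letter; the u modifier enables UTF-8 mode.
$words = preg_split('/\PL+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($words); // Héllo, wörld, It, s, nice, New, line, here
```

Note that the apostrophe counts as a non-letter, so "It's" is split into "It" and "s", and the digit 2 is dropped entirely.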
Just use preg_replace() and define a regular expression to match on the different characters you wish to replace and provide a replacement character to replace them with.
http://php.net/manual/en/function.preg-replace.php
For the characters you wish to search on you can define those in a PHP array as seen in the PHP manual.
Your answer is in the domain of regular expressions and would probably be very difficult to get right. You could get something that works well in almost all cases but there would be exceptions.
This might help:
http://www.regular-expressions.info/wordboundaries.html
