Using PHP and regex, I need to split a string of data into pieces with the following requirements:
Split it on whitespace.
Ignore whitespace within quotes.
For example:
$string='tid:1212121'; should just return an array with a single element, tid:1212121.
$string='tid:1211 topic:ted title:"This Title"'; should return an array with
3 pieces: tid:1211, topic:ted and title:"This Title".
I've looked around but my personal regex capabilities are horrible. Also I cannot control the input so the quotes will not be escaped. The string will be as quoted above or longer such as $string='tid:1211 topic:ted title:"This Title" lid:332 fid:"another bit of text"';
Thank you!
If every quoted section is directly preceded by non-space characters, this could help you:
preg_match_all('/[^\s"]*"[^"]*"|\S+/', $string, $matches);
The pieces end up in $matches[0].
You can see it in action at phpliveregex by selecting preg_match_all.
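A minimal sketch of the same idea wrapped in a function, assuming a quoted value always directly follows its key and that quotes are never nested or escaped. The helper name split_outside_quotes is hypothetical. The pattern here does not need trailing whitespace, so the last piece is matched too.

```php
<?php
// Hypothetical helper: split on whitespace, keeping key:"quoted value" pieces intact.
function split_outside_quotes(string $input): array
{
    // First alternative: optional key characters followed by a double-quoted section.
    // Second alternative: any other run of non-whitespace characters.
    preg_match_all('/[^\s"]*"[^"]*"|\S+/', $input, $matches);
    return $matches[0];
}

print_r(split_outside_quotes('tid:1211 topic:ted title:"This Title" lid:332 fid:"another bit of text"'));
```

For the longer example from the question this should give five pieces: tid:1211, topic:ted, title:"This Title", lid:332 and fid:"another bit of text".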
I have a multi-line string shown below -
?a="text1
?bc="text23
I need to match a pattern using the regex below
'/[?][a-z^A-Z]+[=]["]/'
and remove just the double quote (") from each match. The expected output is shown below
?a=text1
?bc=text23
Please help in solving the above issue using php.
Capture everything except the quote in a capture group () and replace:
$string = preg_replace('/([?][a-z^A-Z]+[=])["]/', '$1', $string);
But you really don't need all those character classes []:
/(\?[a-z^A-Z]+=)"/
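One detail about the character class: a ^ that is not the first character inside [] is a literal caret, so [a-z^A-Z] also matches "^". A small sketch, assuming only ASCII letters are wanted in the key, with the sample input taken from the question:

```php
<?php
// Sample input from the question: two lines, each with a stray double quote after "=".
$string = "?a=\"text1\n?bc=\"text23";

// Capture "?key=" in group 1 and drop the quote that follows it.
$result = preg_replace('/(\?[a-zA-Z]+=)"/', '$1', $string);

echo $result, "\n";
// ?a=text1
// ?bc=text23
```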
I will give another solution because I see the php tag as well. Let's say you have these:
$a='"text1';
$b='"text2';
If I echo them I get
"text1
"text2
To get rid of the double quote, PHP has a trim function that you can use like this:
echo trim($a,'"'), PHP_EOL;
echo trim($b,'"'), PHP_EOL;
the results will be
text1
text2
I don't think you need regex here as long as you are using PHP. PHP can take care of small things like this without complex regular expressions.
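Note that trim() only strips characters from the two ends of a string, so it covers values like '"text1' but not a quote sitting in the middle, as in ?a="text1. A regex-free sketch for that case, assuming the quote to drop always directly follows the = sign:

```php
<?php
$line = '?a="text1';

// trim() leaves the string unchanged because the quote is not at either end.
var_dump(trim($line, '"') === $line); // bool(true)

// str_replace() swaps the two-character sequence =" for a plain =.
$clean = str_replace('="', '=', $line);
echo $clean, "\n"; // ?a=text1
```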
I am trying to search through text for a specific word and then add an HTML tag around that word. For example, if I had the string "I went to the shop to buy apples and oranges", I would want to add HTML bold tags around apples.
The problem: the word I search the string with is stored in a text file and can be uppercase, lowercase, etc. When I use preg_replace I manage to add the tags correctly, but if I searched for APPLES and the string contained "apples", it would change the text from apples to APPLES. I want the casing to stay the same.
I have tried using preg_replace but I can't find a way to keep the same word casing. This is what I have:
foreach($keywords as $value)
{
$pattern = "/\b$value\b/i";
$replacement = "<b>$value</b>";
$new_string = preg_replace($pattern, $replacement, $string);
}
So again, if $value was APPLES it would change every occurrence of apples in $string to uppercase, because $replacement contains $value, which is "APPLES".
How could I achieve this with the casing staying the same, and without having to do multiple loops with different case variants?
Thanks
Instead of using $value verbatim in the replacement, you can use the literal strings \0 or $0. Just as \n/$n, for some integer n, refers back to the nth capturing group of parentheses, \0/$0 is expanded to the entire match. Thus, you'd have
foreach ($keywords as $value) {
$new_string = preg_replace("/\\b$value\\b/i", '<b>$0</b>', $string);
}
Note that '<b>$0</b>' uses single quotes. You can get away with double quotes here, because $0 isn't interpreted as a reference to a variable, but I think this is clearer. In general, you have to be careful with using a $ inside a double-quoted string, as you'll often get a reference to an existing variable unless you escape the $ as \$. Similarly, you should escape the backslash in \b inside the double quotes for the pattern; although it doesn't matter in this specific case, in general backslash is a meaningful character within double quotes.
I might have misunderstood your question, but if what you are struggling on is differentiating between upper-case letter (APPLE) and lower-case letter (apple), then the first thing you could do is convert the word into upper-case, or lower-case, and then run the tests to find it and put HTML tags around it. That is just my guess and maybe I completely misunderstood the question.
The code also contains an unrelated error: $new_string is overwritten on every loop iteration after the first, because each call starts again from the original $string. The final value of $new_string will therefore contain only the last replacement.
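Putting the two points together ($0 in the replacement, and carrying the result from one iteration to the next), a minimal sketch could look like this. The function name highlight_keywords is hypothetical, and preg_quote() is added on the assumption that keywords from the text file may contain regex metacharacters.

```php
<?php
// Hypothetical helper: wraps each keyword in <b> tags, keeping the casing found in the text.
function highlight_keywords(string $text, array $keywords): string
{
    foreach ($keywords as $keyword) {
        // preg_quote() escapes regex metacharacters in the keyword, plus the "/" delimiter.
        $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
        // Reassign $text so each replacement builds on the previous one;
        // $0 is the matched text exactly as it appears in the input.
        $text = preg_replace($pattern, '<b>$0</b>', $text);
    }
    return $text;
}

echo highlight_keywords('I went to the shop to buy apples and oranges', ['APPLES', 'Oranges']), "\n";
// I went to the shop to buy <b>apples</b> and <b>oranges</b>
```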
I would like to replace extra spaces (instances of consecutive whitespace characters) with one space, as long as those extra spaces are not in double or single quotes (or any other enclosures I may want to include).
I saw some similar questions, but I could not find a direct response to my needs above. Thank you!
Hope you're still looking, or come back to check! This seems to work for me:
'/\s+((["\']).*?(?=\2)\2)|\s\s+/'
...and replace with ' $1' (a single space followed by $1)
EDIT
Also, if you need to allow for escaped quotes like \" or \', you could use this expression:
'/\s+((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/'
It gets a bit stickier if you want to add support for "balanced" quotes like brackets (e.g. () or {})
END EDIT
Let me know if you find problems or would like some explanation!
HOPEFULLY FINAL EDIT AND WARNINGS
Potential problem: If a quoted string starts at the beginning of the string variable (or file), it will either not count as a quoted string (and have any whitespace reduced) or it will throw off the whole thing, making anything NOT in quotes get treated as though it was in quotes and vice versa -
A potential change that might remedy this is to use the following match expression
/(?:^|\s+)((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/
this replaces \s+ with (?:^|\s+) at the beginning of the expression
this will add a space at the beginning of the variable if the string starts with a quote - just trim() or remove that whitespace to continue
I seem to have used the "line by line" approach (like sed, if I'm not mistaken) to reach my original results - if you use the "whole file" or "whole string" setting or approach, carriage-return-line-feed seems to count as two whitespace characters (can't imagine why...), thus turning any newlines into single spaces (unless they are inside quotes and "dot-matches-newline" is used, of course)
this could be resolved by replacing the . and \s shorthand character classes with the specific characters you want to match, like the following:
/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/
this does not require the dot-matches-newline switch and only replaces multiple spaces or tabs - not newlines - with a single space (and of course, only if they are not quoted)
EXAMPLE
This link shows an example of the first expression and last expression in use on sample text on http://codepad.viper-7.com
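For reference, here is a small sketch of the last expression in use. The wrapper name collapse_unquoted_spaces is hypothetical, and the replacement is assumed to be a single space followed by $1, which is what the note about a leading space being added implies.

```php
<?php
// Hypothetical wrapper around the last expression above.
function collapse_unquoted_spaces(string $text): string
{
    // Alternative 1: optional leading spaces/tabs plus a whole quoted section (kept via $1).
    // Alternative 2: a run of two or more spaces/tabs outside quotes ($1 is empty).
    $pattern = '/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/';
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, ' $1', $text) ?? $text;
}

echo collapse_unquoted_spaces('a   b "c   d"  e'), "\n";
// a b "c   d" e
```

If the input starts with a quote, the result gains one leading space, so a trim() afterwards may be wanted.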
You could do it in several steps. Consider the following example:
$str = 'This   is  a string   with "Bunch   of extra   spaces". Leave   them "untouched   !".';
$id = 0;
$buffer = array();
// Step 1: swap each quoted section for a numbered placeholder.
$str = preg_replace_callback('|".*?"|', function($m) use (&$id, &$buffer) {
    $buffer[] = $m[0];
    return '__' . $id++;
}, $str);
// Step 2: collapse whitespace in what remains.
$str = preg_replace('|\s+|', ' ', $str);
// Step 3: put the quoted sections back.
$str = preg_replace_callback('|__(\d+)|', function($m) use ($buffer) {
    return $buffer[$m[1]];
}, $str);
echo $str;
This will output the string:
This is a string with "Bunch   of extra   spaces". Leave them "untouched   !".
Although this is not the prettiest solution.
I have a string:
$str="(94896)content is here(/94896)(94897)content is here(/94897)(94898)content is here(/94898)(94899)content is here(/94899)";
the (number) and (/number) act as tags to take certain content out of the string.
and I have a preg_match to take the content out:
if(preg_match('/(94896)\"(.*)\"(\/94896)/',$str,$c)) {echo "I found the content, its:".$co[1];}
Now for some reason, it doesn't find a match in the string ($str), though it's clearly there....
Any ideas on what I'm doing wrong here?
You need to take the double-quotes out of your regex string, since they don't appear in $str, but are expected by the regex.
'/(94896)\"(.*)\"(\/94896)/'
//       ^^    ^^
// These aren't in the string.
EDIT: I think you'll also need to escape your brackets, since they will be getting read as grouping operators, not actual brackets.
Your expression should be:
'/\(94896\)(.*)\(\/94896\)/'
Parentheses are used in a regex to denote subpatterns. If you want to search these characters in a string, you must escape them:
preg_match('/\(94896\)(.*)\(\/94896\)/',$str,$c)
If the pattern is found:
echo "I found the content, its:".$c[1];
Oh, and as Karl Nicoll says, why are the quotations in your pattern?
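If the tag number comes from a variable, preg_quote() can do the escaping of the parentheses and the slash for you. A minimal sketch, where the helper name extract_tagged is hypothetical and the lazy (.*?) is my own choice so the match stops at the first closing tag:

```php
<?php
// Hypothetical helper: returns the text between (id) and (/id), or null if absent.
function extract_tagged(string $str, string $id): ?string
{
    // preg_quote() escapes "(" and ")" and, given '/', the delimiter as well.
    $open  = preg_quote("($id)", '/');
    $close = preg_quote("(/$id)", '/');
    if (preg_match('/' . $open . '(.*?)' . $close . '/', $str, $c)) {
        return $c[1]; // $c[0] is the whole match, $c[1] the captured content
    }
    return null;
}

var_dump(extract_tagged('(94896)first(/94896)(94897)second(/94897)', '94897'));
// string(6) "second"
```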
To match all content:
$str="(94896)content is here(/94896)(94897)content is here(/94897)(94898)content is here(/94898)(94899)content is here(/94899)";
$re = '/\((\d+)\)(.*)\(\/\1\)/';
preg_match_all($re, $str, $matches,PREG_SET_ORDER);
var_dump($matches);
Number will be in $matches[*][1], content in $matches[*][2].
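As a follow-up sketch, the set-ordered matches can be folded into a number => content array. The variable name $contentById and the shortened sample string are made up for illustration, and the lazy (.*?) is an assumption that content never contains its own closing tag.

```php
<?php
$str = '(94896)first(/94896)(94897)second(/94897)';

// \1 refers back to the number captured in the opening tag.
preg_match_all('/\((\d+)\)(.*?)\(\/\1\)/', $str, $matches, PREG_SET_ORDER);

$contentById = [];
foreach ($matches as $m) {
    // $m[1] is the tag number, $m[2] the content between the tags.
    $contentById[$m[1]] = $m[2];
}

print_r($contentById);
```

PHP converts numeric string keys to integers, so the entries can be read as $contentById[94896].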
I am having problems with RegEx in PHP and can't seem to find the answer.
I have a string of 3 letters, all caps, e.g. COS.
The letters will change but will always be 3 characters long and in caps. The string will also be in the middle of another string, surrounded by commas.
I need a regex to find 3 capital letters inside a string and change them from COS to 'COS'
(I'm doing this to amend an SQL insert string)
I can't seem to find the regex unless I use specific letters, but the letters will change.
I need something along the lines of
[A-z]{3} then replace with '[A-Z]' (I know this isn't anywhere near correct, just shorthand)
Any suggestions?
Cheers
EDIT:
Just wanted to add this in case anyone comes across this question at a later date:
the SQL insert string (provided by an external source and sent to my server by FTP daily)
contained the 3 capital letters twice, once already quoted and once without quotes,
so I also had to remove the doubled quotes added by the first regex
$sqlString = preg_replace('/([A-Z]{3})/', "'$1'", $sqlString);
$sqlString = preg_replace('/\'\'([A-Z]{3})\'\'/', "'$1'", $sqlString);
Thanks everyone
You were actually very close. You could use:
echo preg_replace('/([A-Z]{3})/', "'$1'", 'COS'); //will output 'COS'
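For MySQL statements I would advise escaping values, though. mysql_real_escape_string() did this in older PHP versions, but the mysql_* functions were removed in PHP 7, so use mysqli_real_escape_string() or, better, prepared statements.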
$string = preg_replace('/([A-Z]{3})/', "'$1'", $string);
http://php.net/manual/en/function.preg-replace.php
Assuming it's like you said, "three capital letters surrounded by commas", e.g.
Foo bar,COS,Foo Bar
You can use a look-behind and a look-ahead to find the letters:
(?<=,)([A-Z]{3})(?=,)
Then a simple replace to surround with single quotes will be adequate:
'$1'
All together, here it is working.
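A short sketch of the lookaround version in PHP, using the sample row from above. The variable names are illustrative only.

```php
<?php
$row = 'Foo bar,COS,Foo Bar';

// The commas are only asserted by the look-behind and look-ahead, not consumed,
// so they stay in place and only the three letters get wrapped in quotes.
$quoted = preg_replace('/(?<=,)([A-Z]{3})(?=,)/', "'\$1'", $row);

echo $quoted, "\n"; // Foo bar,'COS',Foo Bar
```

Because both commas are required, a code at the very start or end of the string is left alone.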
preg_replace('/(^|\b)([A-Z]{3})(\b|$)/', "'\${2}'", $string); // \$ keeps PHP from treating ${2} as a variable inside double quotes