Possible Duplicate:
PHP explode the string, but treat words in quotes as a single word.
I have a string that contains quoted text. Can anyone give me a regex to split it up?
this has a \\\'quoted sentence\\\' inside
The quotes may also be single quotes. I'm using preg_match_all.
Right now this:
preg_match_all('/\\\\"(?:\\\\.|[^\\\\"])*\\\\"|\S+/', $search_terms, $search_term_set);
Array
(
[0] => Array
(
[0] => this
[1] => has
[2] => a
[3] => \\\"quoted
[4] => sentence\\\"
[5] => inside
)
)
I would like this output:
Array
(
[0] => Array
(
[0] => this
[1] => has
[2] => a
[3] => \\\"quoted sentence\\\"
[4] => inside
)
)
This is NOT a duplicate of this question. PHP explode the string, but treat words in quotes as a single word
UPDATE:
I've removed the mysql_real_escape_string. What regex do I need now that I'm just using magic quotes?
I'm thinking you might want to use strpos and substr in this case.
This is very sloppy, but hopefully you get the general idea at least.
$string = "This has a 'quoted sentence' in it";
// Get the string position of every ', " and space.
// strpos() takes the haystack first; the third argument is the offset to resume from.
$single_pos_arr = array();
$offset = 0;
while (($pos = strpos($string, "'", $offset)) !== false) {
    $single_pos_arr[] = $pos;
    $offset = $pos + 1; // continue searching after this match
}
$double_pos_arr = array();
$offset = 0;
while (($pos = strpos($string, '"', $offset)) !== false) {
    $double_pos_arr[] = $pos;
    $offset = $pos + 1;
}
$space_pos_arr = array();
$offset = 0;
while (($pos = strpos($string, " ", $offset)) !== false) {
    $space_pos_arr[] = $pos;
    $offset = $pos + 1;
}
Once you have the positions, you can write a simple algorithm to finish the job.
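One possible way to finish the job is a single pass over the characters instead of precomputed positions. This is only a minimal sketch, not the answerer's code: the function name tokenize_quoted is hypothetical, and it assumes quotes are never escaped with backslashes.

```php
<?php
// Hypothetical helper: split on spaces, but keep text inside matching
// single or double quotes together. Assumes no backslash-escaped quotes.
function tokenize_quoted($string)
{
    $tokens = array();
    $current = '';
    $quote = null; // the quote character we are currently inside, if any

    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $char = $string[$i];
        if ($quote !== null) {
            // Inside quotes: copy everything, and close on the matching quote.
            $current .= $char;
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($char === "'" || $char === '"') {
            // Opening quote: remember which kind it was.
            $quote = $char;
            $current .= $char;
        } elseif ($char === ' ') {
            // A space outside quotes ends the current token.
            if ($current !== '') {
                $tokens[] = $current;
                $current = '';
            }
        } else {
            $current .= $char;
        }
    }
    if ($current !== '') {
        $tokens[] = $current;
    }
    return $tokens;
}

print_r(tokenize_quoted("This has a 'quoted sentence' in it"));
```

The quotes are kept in the token, as in the output the question asks for; trim them afterwards if you only want the inner text.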
Why are there slashes in your input string?
Use stripslashes to get rid of them.
Then either write your own tokenizer or use this regex:
preg_match_all("/(\"[^\"]+\")|([^\s]+)/", $input, $matches);
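Since the question says the quotes may also be single quotes, here is a small sketch that combines stripslashes with a pattern accepting either quote style. The variable names $raw and $input are made up for illustration, and the sample value only imitates the slashed input from the question.

```php
<?php
// Assumed sample input: a string whose quotes arrive backslash-escaped.
$raw = 'this has a \\\'quoted sentence\\\' inside';

// Remove the backslashes first, as suggested above.
$input = stripslashes($raw);

// Match a double-quoted run, a single-quoted run, or any run of non-spaces.
preg_match_all('/"[^"]*"|\'[^\']*\'|\S+/', $input, $matches);

print_r($matches[0]);
```

The quoted alternatives are listed before \S+ so that a token starting with a quote is tried as a whole quoted phrase first.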
Too long for a comment, even though it's actually a comment.
I don't understand how it's not a duplicate. Using the principle from that link and replacing the quotes with triple-backslashed quotes:
$text = "this has a \\\\\'quoted sentence\\\\\' inside and then \\\\\'some more\\\\\' stuff";
print $text; //check input
$pattern = "/\\\{3}'(?:[^\'])*\\\{3}'|\S+/";
preg_match_all($pattern, $text, $matches);
print_r($matches);
and you get what you need. It's pretty much 100% copy of the link you posted with the only change being exactly what the guy suggested to do if you wanted to change the delimiters.
Edit: Here's my output:
Array
(
[0] => Array
(
[0] => this
[1] => has
[2] => a
[3] => \\\'quoted sentence\\\'
[4] => inside
[5] => and
[6] => then
[7] => \\\'some more\\\'
[8] => stuff
)
)
Edit2: Are you checking for single or double quotes after 3 slashes (your input and output arrays don't match if all you're doing is matching), or are you changing single quotes after three slashes in the input to triple-slash double quotes in the output? If all you're doing is matching, just change the two single quotes in the pattern to escaped double quotes, or wrap the pattern in single quotes so you don't have to escape the double quotes.
Is there any way to achieve the following? I need to take this $query and split it into its various elements (the reason is that I have to reprocess an insert query). As you can see, this works for regular string blocks or numbers, but not where a number occurs inside a string. Is there a way to say |\d, but not where that \d occurs within a ' quoted string '?
$query = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6')";
$matches = preg_split("#',|\d,#",substr($query,1,-1));
echo $query;
print'<pre>[';print_r($matches);print']</pre>';
So just to be clear about expected results:
0:'this is\'nt very, funny (I dont think)'
1:'is it'
2:12345
3:'nope'
4:'like with 2,4,6'.
** Additionally I don't mind if each string is not quoted - I can requote them myself.
You could use (*SKIP)(*F) to skip the parts that are inside single quotes and match only the , outside:
'(?:\\'|[^'])*'(*SKIP)(*F)|,
(?:\\'|[^']) Inside the single quotes matches escaped \' or a character that is not a single quote.
See Test at regex101.com
$query = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6')";
$matches = preg_split("~'(?:\\\\'|[^'])*'(*SKIP)(*F)|,~", substr($query,1,-1));
print_r($matches);
outputs to (test at eval.in)
Array
(
[0] => 'this is\'nt very, funny (I dont think)'
[1] => 'is it'
[2] => 12345
[3] => 'nope'
[4] => 'like with 2,4,6'
)
Not absolutely sure, if that is what you mean :)
('(?:(?!(?<!\\)').)*')|(\d+)
Try this. Grab the captures. Each string is quoted as well. See demo.
http://regex101.com/r/dK1xR4/3
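For reference, a sketch of how that pattern might be used from PHP. The nowdoc delimiter name REGEX and the ~ pattern delimiters are arbitrary choices made here, not part of the answer; the input is the $query from the question.

```php
<?php
$query = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6')";

// A nowdoc keeps the regex exactly as written, with no PHP string escaping.
$pattern = <<<'REGEX'
~('(?:(?!(?<!\\)').)*')|(\d+)~
REGEX;

preg_match_all($pattern, $query, $matches);

// $matches[0] holds every full match; $matches[1] holds the quoted strings
// and $matches[2] the bare numbers (empty where that group did not match).
print_r($matches[0]);
```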
You could try matching through preg_match_all instead of splitting.
<?php
$data = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6')";
$regex = "~'(?:\\\\'|[^'])+'|(?<=,|\()[^',)]*(?=,|\))~";
preg_match_all($regex, $data, $matches);
print_r($matches[0]);
?>
Output:
Array
(
[0] => 'this is\'nt very, funny (I dont think)'
[1] => 'is it'
[2] => 12345
[3] => 'nope'
[4] => 'like with 2,4,6'
)
If you don't mind using preg_match_all, the solution could look like this. The regex uses negative lookbehind assertions, (?<!\\\\), so a quote only opens or closes a string when it is not preceded by a backslash. Because the quoted-string alternative comes before the vertical bar, numbers inside a quoted string are consumed by that larger match and are not matched on their own by \d+.
$query = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6',6789)";
preg_match_all( "/(?<!\\\\)\'.+?(?<!\\\\)\'|\d+/", substr( $query, 1, -1 ), $matches );
print_r( $matches );
/* output:
Array (
[0] => Array
(
[0] => 'this is\'nt very, funny (I dont think)'
[1] => 'is it'
[2] => 12345
[3] => 'nope'
[4] => 'like with 2,4,6'
[5] => 6789
)
)
*/
,(?=(?:[^']*'[^']*')*[^']*$)
Try this. This will split according to what you want. Replace by \n. See demo.
http://regex101.com/r/dK1xR4/4
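A sketch of how this lookahead could be used with preg_split instead of a replacement, run against the $query from the question. One caveat, stated as an observation about the pattern rather than something the answer claims: it only counts quote characters to the right of each comma, so a comma that comes before an escaped \' inside the same string may be split incorrectly.

```php
<?php
$query = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6')";

// Split on a comma only if an even number of single quotes follows it,
// i.e. the comma is not inside a quoted string.
$parts = preg_split("~,(?=(?:[^']*'[^']*')*[^']*$)~", substr($query, 1, -1));

print_r($parts);
```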
So I'm kind of stuck on this. I'm looking to replace text in an array (easily done via str_replace), but I would also like to append text to the end of the specific array elements that were changed. For example, my original array is:
Array
(
[1] => DTSTART;VALUE=DATE:20130712
[2] => DTEND;VALUE=DATE:20130713
[3] => SUMMARY:Vern
[4] => UID:1fb5aa60-ff89-429e-80fd-ad157dc777b8
[5] => LAST-MODIFIED:20130711T010042Z
[6] => SEQUENCE:1374767972
)
I would like to search that array for ";VALUE=DATE" and replace it with nothing (""), but would also like to insert a text string 7 characters after each replace ("T000000"). So my resulting array would be:
Array
(
[1] => DTSTART:20130712T000000
[2] => DTEND:20130713T000000
[3] => SUMMARY:Vern
[4] => UID:1fb5aa60-ff89-429e-80fd-ad157dc777b8
[5] => LAST-MODIFIED:20130711T010042Z
[6] => SEQUENCE:1374767972
)
Is something like this possible using combinations of str_replace, substr_replace, etc? I'm fairly new to PHP and would love if someone could point me in the right direction! Thanks much
You can use preg_replace as a one-stop shop for this type of manipulation:
$array = preg_replace('/(.*);VALUE=DATE(.*)/', '$1$2T000000', $array);
The regular expression matches any string that contains ;VALUE=DATE and captures whatever precedes and follows it into capturing groups (referred to as $1 and $2 in the replacement pattern). It then replaces that string with $1 concatenated to $2 (effectively removing the search target) and appends "T000000" to the result.
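As a quick check of that explanation, here is a sketch run against a shortened copy of the array from the question. It writes the references as ${1} and ${2}, which is equivalent here and simply makes it obvious where each reference ends before the literal T000000.

```php
<?php
$array = array(
    1 => 'DTSTART;VALUE=DATE:20130712',
    2 => 'DTEND;VALUE=DATE:20130713',
    3 => 'SUMMARY:Vern',
);

// Elements that do not contain ;VALUE=DATE are returned unchanged,
// and the array keys are preserved.
$array = preg_replace('/(.*);VALUE=DATE(.*)/', '${1}${2}T000000', $array);

print_r($array);
```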
The naive approach would be to loop over each element and check for ;VALUE=DATE. If it exists, remove it and append T000000.
foreach ($array as $key => $value) {
if (strpos($value, ';VALUE=DATE') !== false) {
$array[$key] = str_replace(";VALUE=DATE", "", $value) . "T000000";
}
}
You are correct str_replace() is the function that you are looking for. In addition you can use the concatenation operator . to append your string to the end of the new string. Is this what you are looking for?
$array[1] = str_replace(";VALUE=DATE", "", $array[1])."T000000";
$array[2] = str_replace(";VALUE=DATE", "", $array[2])."T000000";
foreach (array_keys($array) as $i) { // the array keys start at 1, so iterate over the real keys
    if (strpos($array[$i], ";VALUE=DATE") !== false) { // look for the text in the string
        // Text found, let's replace and append
        $array[$i] = str_replace(";VALUE=DATE", "", $array[$i]);
        $array[$i] .= "T000000";
    } else {
        // Text not found in that position, will not replace
        // Do something
    }
}
If you just want to replace, just do it:
$array = str_replace(";VALUE=DATE", "", $array);
This will replace the text in every element of the array.
I'd like to parse a string like the following :
'serviceHits."test_server"."http_test.org" 31987'
into an array like :
[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987
Basically I want to split in dots and spaces, treating strings within quotes as a single value.
The format of this string is not fixed, this is just one example. It might contain different numbers of elements with quoted and numerical elements in different places.
Other strings might look like :
test.2 3 which should parse to [test|2|3]
test."342".cake.2 "cheese" which should parse to [test|342|cake|2|cheese]
test."red feet".3."green" 4 which should parse to [test|red feet|3|green|4]
And sometimes the oid string may contain a quote mark, which should be included if possible, but it's the least important part of the parser:
test."a \"b\" c" "cheese face" which should parse to [test|a "b" c|cheese face]
I'm trying to parse SNMP OID strings from agents written by people with quite varying ideas on what an OID should look like, in a generic manner.
Parsing the OID string (the bit separated with dots) and the return value (the last value) into separate named arrays would be nice. Simply splitting on space before parsing the string wouldn't work, as both the OID and the value can contain spaces.
Thanks!
I agree it can be hard to find a single regexp that resolves this issue.
Here's a complete solution :
$results = array();
$str = 'serviceHits."test_\"server"."http_test.org" 31987';
// Encode \" to something else temporary
$str_encoded_quotes = strtr($str,array('\\"'=>'####'));
// Split by strings between double-quotes
$str_arr = preg_split('/("[^"]*")/',$str_encoded_quotes,-1,PREG_SPLIT_DELIM_CAPTURE);
foreach ($str_arr as $substr) {
// If value is a dot or a space, do nothing
if (!preg_match('/^[\s\.]$/',$substr)) {
// If value is between double-quotes, it's a string
// Return as is
if (preg_match('/^"(.*)"$/',$substr)) {
$substr = preg_replace('/^"(.*)"$/','\1',$substr); // Remove double-quotes around
$results[] = strtr($substr,array('####'=>'"')); // Get escaped double-quotes back inside the string
// Else, it must be split
} else {
// Split by dot or space
$substr_arr = preg_split('/[\.\s]/',$substr,-1,PREG_SPLIT_NO_EMPTY);
foreach ($substr_arr as $subsubstr)
$results[] = strtr($subsubstr,array('####'=>'"')); // Get escaped double-quotes back inside string
}
}
// Else, it's an empty substring
}
var_dump($results);
Tested with all of your new string examples.
First attempt (OLD)
Using preg_split :
$str = 'serviceHits."test_server"."http_test.org" 31987';
// -1 : no limit
// PREG_SPLIT_NO_EMPTY : do not return empty results
preg_split('/[\.\s]?"[\.\s]?/',$str,-1,PREG_SPLIT_NO_EMPTY);
The easiest way is probably to replace dots and spaces inside strings with placeholders, split, then remove the placeholders. Something like this:
$in = 'serviceHits."test_server"."http_test.org" 31987';
$a = preg_replace_callback('!"([^"]*)"!', 'quote', $in);
$b = preg_split('![. ]!', $a);
foreach ($b as $k => $v) $b[$k] = unquote($v);
print_r($b);
# the functions that do the (un)quoting
function quote($m){
return str_replace(array('.',' '),
array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'), $m[1]);
}
function unquote($str){
return str_replace(array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'),
array('.',' '), $str);
}
Here is a solution that works with all of your test samples (plus one of my own) and allows you to escape quotes, dots, and spaces.
Due to the requirement of handling escape codes, a split is not really possible.
Although one can imagine a regex that matches the entire string with '()' to mark the separate elements, I was unable to get it working using preg_match or preg_match_all.
Instead I parsed the string incrementally, pulling off one element at a time. I then use stripslashes to unescape quotes, spaces, and dots.
<?php
$strings = array
(
'serviceHits."test_server"."http_test.org" 31987',
'test.2 3',
'test."342".cake.2 "cheese"',
'test."red feet".3."green" 4',
'test."a \\"b\\" c" "cheese face"',
'test\\.one."test\\"two".test\\ three',
);
foreach ($strings as $string)
{
print"'{$string}' => " . print_r(parse_oid($string), true) . "\n";
}
/**
* parse_oid parses an OID and returns an array of the parsed elements.
* This is an all-or-none function, and will return NULL if it cannot completely
* parse the string.
* @param string $string The OID to parse.
* @return array|NULL A list of OID elements, or null if error parsing.
*/
function parse_oid($string)
{
$result = array();
while (true)
{
$matches = array();
$match_count = preg_match('/^(?:((?:[^\\\\\\. "]|(?:\\\\.))+)|(?:"((?:[^\\\\"]|(?:\\\\.))+)"))((?:[\\. ])|$)/', $string, $matches);
if (null !== $match_count && $match_count > 0)
{
// [1] = unquoted, [2] = quoted
$value = strlen($matches[1]) > 0 ? $matches[1] : $matches[2];
$result[] = stripslashes($value);
// Are we expecting any more parts?
if (strlen($matches[3]) > 0)
{
// I do this (vs keeping track of offset) to use ^ in regex
$string = substr($string, strlen($matches[0]));
}
else
{
return $result;
}
}
else
{
// All or nothing
return null;
}
} // while
}
This generates the following output:
'serviceHits."test_server"."http_test.org" 31987' => Array
(
[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987
)
'test.2 3' => Array
(
[0] => test
[1] => 2
[2] => 3
)
'test."342".cake.2 "cheese"' => Array
(
[0] => test
[1] => 342
[2] => cake
[3] => 2
[4] => cheese
)
'test."red feet".3."green" 4' => Array
(
[0] => test
[1] => red feet
[2] => 3
[3] => green
[4] => 4
)
'test."a \"b\" c" "cheese face"' => Array
(
[0] => test
[1] => a "b" c
[2] => cheese face
)
'test\.one."test\"two".test\ three' => Array
(
[0] => test.one
[1] => test"two
[2] => test three
)
I have another PHP preg_split question which is very similar to my last question, although I fear the solution will be quite a bit more complicated. As before, I'm trying to use PHP to split a string into array components using either " or ' as the delimiter. In addition, I would like to ignore escaped single quotations within the string (escaped double quotations within a string will not happen, so there is no need to worry about that). All of the examples from my last question remain valid, but the following two desired results should also be obtained:
$pattern = "?????";
$str = "the 'cat\'s dad sat on' the mat then \"fell 'sideways' off\" the mat";
$res = preg_split($pattern, $str, null, PREG_SPLIT_DELIM_CAPTURE);
print_r($res);
/*output:
Array
(
[0] => the
[1] => 'cat\'s dad sat on'
[2] => the mat then
[3] => "fell 'sideways' off"
[4] => the mat
)*/
$str = "the \"cat\'s dad\" sat on 'the \"cat\'s\" own' mat";
$res = preg_split($pattern, $str, null, PREG_SPLIT_DELIM_CAPTURE);
print_r($res);
/*output:
Array
(
[0] => the
[1] => "cat\'s dad"
[2] => sat on
[3] => 'the "cat\'s" own'
[4] => mat
)*/
@mcrumley's answer to my previous question worked well if there were no escaped quotations:
$pattern = "/('[^']*'|\"[^\"]*\")/U";
However, as soon as an escaped single quotation is given, the regex uses it as the end of the match, which is not what I want.
I have tried something like this:
$pattern = "/('(?<=(?!\\').*)'|\"(?<=(?!\\').*)\")/";
But it's not working. Unfortunately my knowledge of lookarounds is not good enough for this.
After some reading and fiddling...
This seems closer:
$pattern = "/('(?:(?!\\').*)')|(\"(?:(?!\\'|').*)\")/";
But the level of greediness is wrong and it does not produce the above outputs.
Try this:
$pattern = "/(?<!\\\\)('(?:\\\\'|[^'])*'|\"(?:\\\\\"|[^\"])*\")/";
             ^^^^^^^^^  ^^^^^^^^^    ^     ^^^^^^^^^^     ^
Demo at http://rubular.com/r/Eps2mx8KCw.
You can also collapse that into a unified expression using back-references:
$pattern = "/(?<!\\\\)((['\"])(?:\\\\\\2|(?!\\2).)*\\2)/";
Demo at http://rubular.com/r/NLZKyr9xLk.
These don't work though if you also want escaped backslashes to be recognized in your text, but I doubt that's a scenario you need to account for.
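If escaped backslashes ever do need to be recognized, one possible variation is to treat a backslash followed by any character as a unit inside the quotes. This is a sketch under assumptions, not the pattern above: it drops the leading lookbehind, so it assumes no stray escaped quotes appear outside a quoted string, and the names $tricky and $trickyParts are made up for the demonstration.

```php
<?php
// Nowdoc so the regex needs no extra PHP escaping.
// Inside quotes: either a backslash plus any character, or any character
// that is neither the closing quote nor a backslash.
$pattern = <<<'REGEX'
/('(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*")/s
REGEX;

$str = "the 'cat\'s dad sat on' the mat then \"fell 'sideways' off\" the mat";
$res = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($res);

// A quoted string that ends in an escaped backslash: the \\ pair is consumed
// together, so the quote after it still closes the string.
$tricky = <<<'TXT'
say 'ends with a backslash \\' then 'next'
TXT;
$trickyParts = preg_split($pattern, $tricky, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($trickyParts);
```

Note that the unquoted pieces keep their surrounding spaces; apply trim() to each element if you want them without.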
Right now I'm trying to get this:
Array
(
[0] => hello
[1] =>
[2] => goodbye
)
Where index 1 is the empty string.
$toBeSplit= 'hello,,goodbye';
$textSplitted = preg_split('/[,]+/', $toBeSplit, -1);
$textSplitted looks like this:
Array
(
[0] => hello
[1] => goodbye
)
I'm using PHP 5.3.2
[,]+ matches one or more consecutive commas, as many as possible, so the two commas are treated as a single separator. Use just /,/ and it works:
$textSplitted = preg_split('/,/', $toBeSplit, -1);
But you don't even need a regular expression:
$textSplitted = explode(',', $toBeSplit);
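To see the difference side by side, here is a short sketch comparing the variants on the input from the question. The variable names are made up for the comparison.

```php
<?php
$toBeSplit = 'hello,,goodbye';

// One or more commas form a single separator: the empty element disappears.
$collapsed = preg_split('/,+/', $toBeSplit);

// Each comma is its own separator: the empty element is kept.
$kept = preg_split('/,/', $toBeSplit);

// explode() behaves like the single-comma pattern, without the regex engine.
$exploded = explode(',', $toBeSplit);

// If empty elements are ever unwanted, PREG_SPLIT_NO_EMPTY drops them explicitly.
$noEmpty = preg_split('/,/', $toBeSplit, -1, PREG_SPLIT_NO_EMPTY);

print_r($collapsed);
print_r($kept);
print_r($exploded);
print_r($noEmpty);
```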
How about this:
$textSplitted = preg_split('/,/', $toBeSplit, -1);
Your split regex was grabbing all the commas, not just one.
Your pattern splits the text using a sequence of commas as the separator (the character class is also unnecessary around a single character), so two (or two hundred) commas count as just one.
Anyway, since you're just using a literal character as the separator, use explode():
$str = 'hello,,goodbye';
print_r(explode(',', $str));
output:
Array
(
[0] => hello
[1] =>
[2] => goodbye
)