Remove quotes from start and end of string in PHP - php

I have strings like these:
"my value1" => my value1
"my Value2" => my Value2
myvalue3 => myvalue3
I need to get rid of " (double-quotes) in end and start, if these exist, but if there is this kind of character inside String then it should be left there. Example:
"my " value1" => my " value1
How can I do this in PHP - is there function for this or do I have to code it myself?

The literal answer would be
trim($string,'"'); // double quotes
trim($string,'\'"'); // any combination of ' and "
It will remove all leading and trailing quotes from a string.
If you need to remove strictly the first and the last quote in case they exist, then it could be a regular expression like this
preg_replace('~^"?(.*?)"?$~', '$1', $string); // double quotes
preg_replace('~^[\'"]?(.*?)[\'"]?$~', '$1', $string); // either ' or " whichever is found
If you need to remove only in case the leading and trailing quote are strictly paired, then use the function from Steve Chambers' answer
However, if your goal is to read a value from a CSV file, fgetcsv is the only correct option. It will take care of all the edge cases, stripping the value enclosures as well.

I had a similar need and wrote a function that will remove leading and trailing single or double quotes from a string:
/**
* Remove the first and last quote from a quoted string of text
*
* #param mixed $text
*/
function stripQuotes($text) {
return preg_replace('/^(\'(.*)\'|"(.*)")$/', '$2$3', $text);
}
This will produce the outputs listed below:
Input text Output text
--------------------------------
No quotes => No quotes
"Double quoted" => Double quoted
'Single quoted' => Single quoted
"One of each' => "One of each'
"Multi""quotes" => Multi""quotes
'"'"#";'"*&^*'' => "'"#";'"*&^*'
Regex demo (showing what is being matched/captured): https://regex101.com/r/3otW7H/1

trim will remove all instances of the char from the start and end if it matches the pattern you provide, so:
$myValue => '"Hi"""""';
$myValue=trim($myValue, '"');
Will become:
$myValue => 'Hi'.
Here's a way to only remove the first and last char if they match:
$output=stripslashes(trim($myValue));
// if the first char is a " then remove it
if(strpos($output,'"')===0)$output=substr($output,1,(strlen($output)-1));
// if the last char is a " then remove it
if(strripos($output,'"')===(strlen($output)-1))$output=substr($output,0,-1);

As much as this thread should have been killed long ago, I couldn't help but respond with what I would call the simplest answer of all. I noticed this thread re-emerging on the 17th so I don't feel quite as bad about this. :)
Using samples as provided by Steve Chambers;
echo preg_replace('/(^[\"\']|[\"\']$)/', '', $input);
Output below;
Input text Output text
--------------------------------
No quotes => No quotes
"Double quoted" => Double quoted
'Single quoted' => Single quoted
"One of each' => One of each
"Multi""quotes" => Multi""quotes
'"'"#";'"*&^*'' => "'"#";'"*&^*'
This only ever removes the first and last quote, it doesn't repeat to remove extra content and doesn't care about matching ends.

This is an old post, but just to cater for multibyte strings, there are at least two possible routes one can follow. I am assuming that the quote stripping is being done because the quote is being considered like a program / INI variable and thus is EITHER "something" or 'somethingelse' but NOT "mixed quotes'. Also, ANYTHING between the matched quotes is to be retained intact.
Route 1 - using a Regex
function sq_re($i) {
return preg_replace( '#^(\'|")(.*)\1$#isu', '$2', $i );
}
This uses \1 to match the same type quote that matched at the beginning. the u modifier, makes it UTF8 capable (okay, not fully multibyte supporting)
Route 2 - using mb_* functions
function sq($i) {
$f = mb_substr($i, 0, 1);
$l = mb_substr($i, -1);
if (($f == $l) && (($f == '"') || ($f == '\'')) ) $i = mb_substr($i, 1, mb_strlen( $i ) - 2);
return $i;
}

You need to use regular expressions, look at:-
http://php.net/manual/en/function.preg-replace.php
Or you could, in this instance, use substr to check if the first and then the last character of the string is a quote mark, if it is, truncate the string.
http://php.net/manual/en/function.substr.php

How about regex
//$singleQuotedString="'Hello this 'someword' and \"somewrod\" stas's SO";
//$singleQuotedString="Hello this 'someword' and \"somewrod\" stas's SO'";
$singleQuotedString="'Hello this 'someword' and \"somewrod\" stas's SO'";
$quotesFreeString=preg_replace('/^\'?(.*?(?=\'?$))\'?$/','$1' ,$singleQuotedString);
Output
Hello this 'someword' and "somewrod" stas's SO

If you like performance over clarity this is the way:
// Remove double quotes at beginning and/or end of output
$len=strlen($output);
if($output[0]==='"') $iniidx=1; else $iniidx=0;
if($output[$len-1]==='"') $endidx=-1; else $endidx=$len-1;
if($iniidx==1 || $endidx==-1) $output=substr($output,$iniidx,$endidx);
The comment helps with clarity...
brackets in an array-like usage on strings is possible and demands less processing effort than equivalent methods, too bad there isnt a length variable or a last char index

Related

PHP convert double quoted string to single quoted string

I know this question asked here many times.But That solutions are not useful for me. I am facing this problem very badly today.
// Case 1
$str = 'Test \300'; // Single Quoted String
echo json_encode(utf8_encode($str)) // output: Test \\300
// Case 2
$str = "Test \300"; // Double Quoted String
echo json_encode(utf8_encode($str)) // output: Test \u00c0
I want case 2's output and I have single quoted $str variable. This variable is filled from XML string parsing . And that XML string is saved in txt file.
(Here \300 is encoding of À (latin Charactor) character and I can't control it.)
Please Don't give me solution for above static string
Thanks in advance
This'll do:
$string = '\300';
$string = preg_replace_callback('/\\\\\d{1,3}/', function (array $match) {
return pack('C', octdec($match[0]));
}, $string);
It matches any sequence of a backslash followed by up to three numbers and converts that number from an octal number to a binary string. Which has the same result as what "\300" does.
Note that this will not work exactly the same for escaped escapes; i.e. "\\300" will result in a literal \300 while the above code will convert it.
If you want all the possible rules of double quoted strings followed without reimplementing them by hand, your best bet is to simply eval("return \"$string\""), but that has a number of caveats too.
May You are looking for this
$str = 'Test \300'; // Single Quoted String
echo json_encode(stripslashes($str)); // output: Test \\300

Erasing C comments with preg_replace

I need to erase all comments in $string which contains data from some C file.
The thing I need to replace looks like this:
something before that shouldnt be replaced
/*
* some text in between with / or * on many lines
*/
something after that shouldnt be replaced
and the result should look like this:
something before that shouldnt be replaced
something after that shouldnt be replaced
I have tried many regular expressions but neither work the way I need.
Here are some latest ones:
$string = preg_replace("/\/\*(.*?)\*\//u", "", $string);
and
$string = preg_replace("/\/\*[^\*\/]*\*\//u", "", $string);
Note: the text is in UTF-8, the string can contain multibyte characters.
You would also want to add the s modifier to tell the regex that .* should include newlines. I always think of s to mean "treat the input text as a single line"
So something like this should work:
$string = preg_replace("/\\/\\*(.*?)\\*\\//us", "", $string);
Example: http://codepad.viper-7.com/XVo9Tp
Edit: Added extra escape slashes to the regex as Brandin suggested because he is right.
I don't think regexp fit good here. What about wrote a very small parse to remove this? I don't do PHP coding for a long time. So, I will try to just give you the idea (simple alogorithm) I haven't tested this, it's just to you get the idea, as I said:
buf = new String() // hold the source code without comments
pos = 0
while(string[pos] != EOF) {
if(string[pos] == '/') {
pos++;
while(string[pos] != EOF)
{
if(string[pos] == '*' && string[pos + 1] == '/') {
pos++;
break;
}
pos++;
}
}
buf[buf_index++] = string[pos++];
}
where:
string is the C source code
buf a dynamic allocated string which expands as needed
It is very hard to do this perfectly without ending up writing a full C parser.
Consider the following, for example:
// Not using /*-style comment here.
// This line has an odd number of " characters.
while (1) {
printf("Wheee!
(*\/*)
\\// - I'm an ant!
");
/* This is a multiline comment with a // in, and
// an odd number of " characters. */
}
So, from the above, we can see that our problems include:
multiline quote sequences should be ignored within doublequotes. Unless those doublequotes are part of a comment.
single-line comment sequences can be contained in double-quoted strings, and in multiline strings.
Here's one possibility to address some of those issues, but far from perfect.
// Remove "-strings, //-comments and /*block-comments*/, then restore "-strings.
// Based on regex by mauke of Efnet's #regex.
$file = preg_replace('{("[^"]*")|//[^\n]*|(/\*.*?\*/)}s', '\1', $file);
try this:
$string = preg_replace("#\/\*\n?(.*)\*\/\n?#ms", "", $string);
Use # as regexp boundaries; change that u modifier with the right ones: m (PCRE_MULTILINE) and s (PCRE_DOTALL).
Reference: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
It is important to note that my regexp does not find more than one "comment block"... Use of "dot match all" is generally not a good idea.

Get String of first Argument of an function-call

I want to search withing PHP-Files for a special function call. The reason is, that I want to generate .MO-Files for the GetText-Extension. So I first need to create a .PO-Files, which contains all the needed text-strings.
I already find a lot of texts, but there are some problems.
Here is my Regex to find the first Argument of an functioncall:
/\_\([\'|\"]{1}(.+?[^\\\])[\'|\"]{1}[,]{0,1}.*?\)+/si
I need to find function-calls with the following patterns:
_("text");
_("text %s", 3);
_('text');
The Text could contain escaped Quotes. My Problem is acuallty, that I need to know, if there was an apostrophe or an normal quote used for the call.
If I have the call
_('"text"');
then i get the Problem, that I get the text
"text
without the ending quote.
Does anybody of you have an Idea, how I can get my Regex to work?
I would use PHP's tokenizer for this kind of stuff, not regular expressions:
$funcName = '_';
$tokens = token_get_all(file_get_contents('path/to/your/script.php'));
$strings = array();
foreach($tokens as $index => $token){
if(!is_array($token))
continue;
if($token[0] === T_CONSTANT_ENCAPSED_STRING){
if(!isset($tokens[$index - 2]) || ($tokens[$index - 1] !== "("))
continue;
list($id, $text, $line) = $tokens[$index - 2];
// this is your string (substr drops quotes around it)
if(($id === T_STRING) && ($text === $funcName))
$strings[] = substr($token[1], 1, -1);
}
}
var_dump($strings);
Raw regex:
_\((?|'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")
Delimited regex:
~_\((?|'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")~
The result is in capturing group 1. I used the branch reset pattern (?|pattern) so that the capturing group number is reset for each alternating branch separated by |.
Inside of the branch reset (?|'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)") are 2 pattern:
'((?:[^'\\]|\\.)*)': Match and capture content inside single quoted string, which consists of either non-quote-non-backslash or escaped sequence. Actually, I am a bit careless here, since (raw) new line character is considered part of the string. I don't think the specification would allow this, but if the input contains valid code, then there should be no problem.
"((?:[^"\\]|\\.)*)": Same as above, but for double quoted string.
Note that I don't consume the rest of the arguments to the function.

How to break queries using regex in php

Suppose I have the following string:
insert into table values ('text1;'); insert into table values ('text2')
How do I break those queries (get each individual query) using regular expressions?
I've found a very similar problem: Use regex to find specific string not in html tag ...but it uses a solution that is specific to .NET: behind lookup (in php it complains that is not fixed length).
I would be very grateful if someone could give me some hints on how to deal with this problem.
The trick is to count how many unescaped quote characters you've passed. Assuming that the SQL is syntactically correct, semicolons after an even number of unescaped quote characters will be the ones you want, and semicolons after an odd number of unescaped quote characters will be part of a string literal. (Remember, string literals can contain properly escaped quote characters.)
If you want 100% reliability, you'll need a real SQL parser, like this. (I just Googled "SQL parser in PHP". I don't know if it works or not.)
EDIT:
I don't think it's possible to find pairs of unescaped quote characters using nothing but regex. Maybe a regex guru will prove me wrong, but it just seems too damn difficult to distinguish between escaped and unescaped quote characters in so many possible combinations. I tried look-behind assertions and backrefs with no success.
The following is not a pure-regex solution, but I think it works:
preg_match_all("/(?:([^']*'){2})*[^']*;/U", str_replace("\\'", "\0\1\2", $input), $matches);
$output = array_map(function($str){ return str_replace("\0\1\2", "\\'", $str); }, $matches[0]);
Basically, we temporarily replace escaped quote characters with a string of bytes that is extremely unlikely to occur, in this case \0\1\2. After that, all the quote characters that remain are the unescaped ones. The regex picks out semicolons preceded by an even number of quote characters. Then we restore the escaped quote characters. (I used a closure there, so it's PHP 5.3 only.)
If you don't need to deal with quote characters inside string literals, yes, you can easily do it with pure regex.
Assuming proper SQL syntax it would probably be best to split on the semicolon.
The following regexp will work but only if all quotes come in pairs.
/.+?\'.+?\'.*?;|.+?;/
To avoid escaped single quotes:
/.+?[^\\\\]\'.+?[^\\\\]\'.*?;|.+?;/
To handle Multiple pairs of single quotes.
/.+?(?:[^\\]\'.+?[^\\]\')+.*?;|.+?;/
Tested against the following data set:
insert into table values ('text1;\' ','2'); insert into table values ('text2'); insert into test3 value ('cookie\'','fly');
Returns:
insert into table values ('text1;\' ','2');
insert into table values ('text2');
insert into test3 value ('cookie\'','fly');
I have to admit this is a pretty dirty regexp. It would not handle any sort of SQL syntax errors at all. I enjoyed the challenge of coming up with a pure regex though.
How you want to break?
You can use explode( ' ', $query ) to transform into an array.
Or if you want to get text1 and text2 values with regexp you can use preg_match( '/(\'([\w]+)\')/', $query, $matches ) where $matches[1] is your value.
preg_match_all( '/([\w ]+([\w \';]+))/', $queries, $matches ) will give to you all matches with this pattern of query.
Regex's aren't always good at this type of thing. The following function should work though:
function splitQuery($query) {
$open = false;
$buffer = null;
$parts = array();
for($i = 0, $l = strlen($query); $i < $l; $i++) {
if ($query[$i] == ';' && !$open) {
$parts[] = trim($buffer);
$buffer = null;
continue;
}
if ($query[$i] == "'") {
$open = ($open) ? false: true;
}
$buffer .= $query[$i];
}
if ($buffer) $parts[] = trim($buffer);
return $parts;
}
Usage:
$str = "insert into table values ('text1;'); insert into table values ('text2')";
$str = splitQuery($str);
print_r($str);
Outputs:
Array
(
[0] => insert into table values ('text1;')
[1] => insert into table values ('text2')
)

How to write regex to return only certain parts of this string?

So I'm working on a project that will allow users to enter poker hand histories from sites like PokerStars and then display the hand to them.
It seems that regex would be a great tool for this, however I rank my regex knowledge at "slim to none".
So I'm using PHP and looping through this block of text line by line and on lines like this:
Seat 1: fabulous29 (835 in chips)
Seat 2: Nioreh_21 (6465 in chips)
Seat 3: Big Loads (3465 in chips)
Seat 4: Sauchie (2060 in chips)
I want to extract seat number, name, & chip count so the format is
Seat [number]: [letters&numbers&characters] ([number] in chips)
I have NO IDEA where to start or what commands I should even be using to optimize this.
Any advice is greatly appreciated - even if it is just a link to a tutorial on PHP regex or the name of the command(s) I should be using.
I'm not entirely sure what exactly to use for that without trying it, but a great tool I use all the time to validate my RegEx is RegExr which gives a great flash interface for trying out your regex, including real time matching and a library of predefined snippets to use. Definitely a great time saver :)
Something like this might do the trick:
/Seat (\d+): ([^\(]+) \((\d+)in chips\)/
And some basic explanation on how Regex works:
\d = digit.
\<character> = escapes character, if not part of any character class or subexpression. for example:
\t
would render a tab, while \\t would render "\t" (since the backslash is escaped).
+ = one or more of the preceding element.
* = zero or more of the preceding element.
[ ] = bracket expression. Matches any of the characters within the bracket. Also works with ranges (ex. A-Z).
[^ ] = Matches any character that is NOT within the bracket.
( ) = Marked subexpression. The data matched within this can be recalled later.
Anyway, I chose to use
([^\(]+)
since the example provides a name containing spaces (Seat 3 in the example). what this does is that it matches any character up to the point that it encounters an opening paranthesis.
This will leave you with a blank space at the end of the subexpression (using the data provided in the example). However, his can easily be stripped away using the trim() command in PHP.
If you do not want to match spaces, only alphanumerical characters, you could so something like this:
([A-Za-z0-9-_]+)
Which would match any letter (within A-Z, both upper- & lower-case), number as well as hyphens and underscores.
Or the same variant, with spaces:
([A-Za-z0-9-_\s]+)
Where "\s" is evaluated into a space.
Hope this helps :)
Look at the PCRE section in the PHP Manual. Also, http://www.regular-expressions.info/ is a great site for learning regex. Disclaimer: Regex is very addictive once you learn it.
I always use the preg_ set of function for REGEX in PHP because the PERL-compatible expressions have much more capability. That extra capability doesn't necessarily come into play here, but they are also supposed to be faster, so why not use them anyway, right?
For an expression, try this:
/Seat (\d+): ([^ ]+) \((\d+)/
You can use preg_match() on each line, storing the results in an array. You can then get at those results and manipulate them as you like.
EDIT:
Btw, you could also run preg_match_all on the entire block of text (instead of looping through line-by-line) and get the results that way, too.
Check out preg_match.
Probably looking for something like...
<?php
$str = 'Seat 1: fabulous29 (835 in chips)';
preg_match('/Seat (?<seatNo>\d+): (?<name>\w+) \((?<chipCnt>\d+) in chips\)/', $str, $matches);
print_r($matches);
?>
*It's been a while since I did php, so this could be a little or a lot off.*
May be it is very late answer, But I am interested in answering
Seat\s(\d):\s([\w\s]+)\s\((\d+).*\)
http://regex101.com/r/cU7yD7/1
Here's what I'm currently using:
preg_match("/(Seat \d+: [A-Za-z0-9 _-]+) \((\d+) in chips\)/",$line)
To process the whole input string at once, use preg_match_all()
preg_match_all('/Seat (\d+): \w+ \((\d+) in chips\)/', $preg_match_all, $matches);
For your input string, var_dump of $matches will look like this:
array
0 =>
array
0 => string 'Seat 1: fabulous29 (835 in chips)' (length=33)
1 => string 'Seat 2: Nioreh_21 (6465 in chips)' (length=33)
2 => string 'Seat 4: Sauchie (2060 in chips)' (length=31)
1 =>
array
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '4' (length=1)
2 =>
array
0 => string '835' (length=3)
1 => string '6465' (length=4)
2 => string '2060' (length=4)
On learning regex: Get Mastering Regular Expressions, 3rd Edition. Nothing else comes close to the this book if you really want to learn regex. Despite being the definitive guide to regex, the book is very beginner friendly.
Try this code. It works for me
Let say that you have below lines of strings
$string1 = "Seat 1: fabulous29 (835 in chips)";
$string2 = "Seat 2: Nioreh_21 (6465 in chips)";
$string3 = "Seat 3: Big Loads (3465 in chips)";
$string4 = "Seat 4: Sauchie (2060 in chips)";
Add to array
$lines = array($string1,$string2,$string3,$string4);
foreach($lines as $line )
{
$seatArray = explode(":", $line);
$seat = explode(" ",$seatArray[0]);
$seatNumber = $seat[1];
$usernameArray = explode("(",$seatArray[1]);
$username = trim($usernameArray[0]);
$chipArray = explode(" ",$usernameArray[1]);
$chipNumber = $chipArray[0];
echo "<br>"."Seat [".$seatNumber."]: [". $username."] ([".$chipNumber."] in chips)";
}
you'll have to split the file by linebreaks,
then loop thru each line and apply the following logic
$seat = 0;
$name = 1;
$chips = 2;
foreach( $string in $file ) {
if (preg_match("Seat ([1-0]): ([A-Za-z_0-9]*) \(([1-0]*) in chips\)", $string, $matches)) {
echo "Seat: " . $matches[$seat] . "<br>";
echo "Name: " . $matches[$name] . "<br>";
echo "Chips: " . $matches[$chips] . "<br>";
}
}
I haven't ran this code, so you may have to fix some errors...
Seat [number]: [letters&numbers&characters] ([number] in chips)
Your Regex should look something like this
Seat (\d+): ([a-zA-Z0-9]+) \((\d+) in chips\)
The brackets will let you capture the seat number, name and number of chips in groups.

Categories