Extract text from quotation marks in PHP - php

I have a PHP page which gets text from an outside source wrapped in quotation marks. How do I strip them off?
For example:
input: "This is a text"
output: This is a text
Please answer with full PHP coding rather than just the regex...

This will work quite nicely unless you have strings with multiple quotes like """hello""" as input and you want to preserve all but the outermost "'s:
$output = trim($input, '"');
trim strips all of certain characters from the beginning and end of a string in the charlist that is passed in as a second argument (in this case just "). If you don't pass in a second argument it trims whitespace.
If the situation of multiple leading and ending quotes is an issue you can use:
$output = preg_replace('/^"|"$/', '', $input);
Which replaces only one leading or trailing quote with the empty string, such that:
""This is a text"" becomes "This is a text"

$output = str_replace('"', '', $input);
Of course, this will remove all quotation marks, even from inside the strings. Is this what you want? How many strings like this are there?

The question was on how to do it with a regex (maybe for curiosity/learning purposes).
This is how you would do that in php:
$result = preg_replace('/(")(.*?)(")/i', '$2', $subject);
Hope this helps,
Buckley

Related

Trim whitespace ASCII character "194" from string

Recently ran into a very odd issue where my database contains strings with what appear to be normal whitespace characters but are in fact something else.
For instance, applying trim() to the string:
"TEST "
is getting me:
"TEST "
as a result. So I copy and paste the last character in the string and:
echo ord(' ');
194
194? According to ASCII tables that should be ┬. So I'm just confused at this point. Why does this character appear to be whitespace and how can I trim() characters like this when trim() fails?
It's more likely to be a two-byte 194 160 sequence, which is the UTF-8 encoding of a NO-BREAK SPACE codepoint (the equivalent of the entity in HTML).
It's really not a space, even though it looks like one. (You'll see it won't word-wrap, for instance.) A regular expression match for \s would match it, but a plain comparison with a space won't; nor will trim() remove it.
To replace NO-BREAK spaces with a normal space, you should be able to do something like:
$string = str_replace("\u{c2a0}", " ", $string);
or
$string = str_replace("\u{c2a0}", "", $string);
to remove them
You can try with :
PHP trim
$foo = "TEST ";
$foo = trim($foo);
PHP str_replace
$foo = "TEST ";
$foo = str_replace(chr(194), '', $foo);
IMPORTANT: You can try with chr(194).chr(160) or '\u00A0'
PHP preg_replace
$foo = "TEST ";
$foo = preg_replace('#(^\s+|\s+$)#', '', $foo);
OR (i'm not sure if it will work well)
$foo = "TEST ";
$foo = preg_replace('#[\xC2\xA0]#', '', $foo);
Had the same issue. Solved it with
trim($str, ' ' . chr(194) . chr(160))
You probably got the original data from Excel/CSV.. I'm importing from such format to my mysql db and it took me hours to figure out why it came padded and trim didn't appear to work (had to check every character in each CSV column string) but in fact it seems Excel adds chr(32) + chr (194) + chr(160) to "fill" the column, which at first sight, looks like all spaces at the end. This is what worked for me to have a pretty, perfect string to load into the db:
// convert to utf8
$value = iconv("ISO-8859-15", "UTF-8",$data[$c]);
// excel adds 194+160 to fill up!
$value = rtrim($value,chr(32).chr(194).chr(160));
// sanitize (escape etc)
$value = $dbc->sanitize($value);
php -r 'print_r(json_encode(" "));'
"\u00a0"
$string = str_replace("\u{00a0}", "", $string); //not \u{c2a0}
I needed to trim my string in PHP and was getting the same results.
After discovering the reason via Mark Bakers answer, I used the following in place of trim:
// $str = trim($str); // won't strip UTF-8 encoded nonbreaking spaces
$str = preg_replace('/^(\\s|\\xC2\\xA0)+|(\\s|\\xC2\\xA0)+$/', '', $str);
Thought I should contribute an answer of my own since it has now become clear to me what was happening. The problem originates dealing with html which contains a non-breaking space entity, . Once you load the content in php's DOMDocument(), all entities are converted to their decoded values and upon parsing the it you end up with a non-breaking space character. In any event, even in a different scenario, the following method is another option for converting these to regular spaces:
$foo = str_replace(' ',' ',htmlentities($foo));
This works by first converting the non-breaking space into it's html entity, and then to a regular space. The contents of $foo can now be easily trimmed as normal.

str_replace on string only if it INCLUDES a whitespace

I have a substitution code that replaces all instances of XD with an XD smiley face... thing is, should a link include the string 'XD', it then breaks the link.
I want it to only replace the XD, if it is followed by a whitespace, as in 'XD ', except I can't seem to get it to work (tried &nbsp, /\s/ and as in 'XD&nbsp')
Chances are I'm getting something really obvious wrong, but I can't find any help (all of it seems to be about removing whitespace, not requiring it), so I'm hoping someone can help me.
Here's the code for reference:
function BB_CODE($content) {
$content = str_replace("XD", "<img src=\"images/smilies/icon_xd.gif\" alt=\"XD\">", $content);
}
The content is user input. Thanks for any help!
You should surround "XD" with %
$content= str_replace("%XD%", "<img src=\"images/smilies/icon_xd.gif\" alt=\"XD\">", $content);
EDIT :
Or using preg_replace
preg_replace("/XD/", "<img src=\"images/smilies/icon_xd.gif\" alt=\"XD\">", $content);
So you want to replace XD only if it's alone?
preg_replace('/\bXD\b/', '(ಠ‿ಠ)', "Then I was like XD")
Use \b to watch for word boundaries instead of \s. This means it works at the beginning and the end of string too, like in my example.
With preg_replace() there are two common gotchas:
The separator char, I used / here by convention but in my own code I prefer %. You could write the regex as '%\bXD\b%' with the same meaning.
Escaping the backslashes, I used a single quoted string so I don't have to escape the backslash in \b. If you use double qoutes, you have to escape it, like so: "/\\bXD\\b/"

Is there a better way to strip/form this string in PHP?

I am currently using what appears to be a horribly complex and unnecessary solution to form a required string.
The string could have any punctuation and will include slashes.
As an example, this string:
Test Ripple, it\'s a comic book one!
Using my current method:
str_replace(" ", "-", trim(preg_replace('/[^a-z0-9]+/i', ' ', str_replace("'", "", stripslashes($string)))))
Returns the correct result:
Test-Ripple-its-a-comic-book-one
Here is a breakdown of what my current (poor) solution is doing in order to achieve the desired output:-
Strip all slashes from the string
remove any apostrophes with str_replace
remove any remaining punctuation using preg_replace and replace it with whitespace
Trim off any extra whitespace from the beginning/end of string which may have been caused by punctuation.
Replace all whitespace with '-'
But there must be a better and more efficient way. Can anyone help?
Personally it looks fine to me however I would make one small change.
Change
preg_replace("/[^a-z0-9]+/i"
to the following
preg_replace("/[^a-zA-Z0-9\s]/"

Regex extra spaces in string not in double or single quotes - PHP

I would like to replace extra spaces (instances of consecutive whitespace characters) with one space, as long as those extra spaces are not in double or single quotes (or any other enclosures I may want to include).
I saw some similar questions, but I could not find a direct response to my needs above. Thank you!
Hope you're still looking, or come back to check! This seems to work for me:
'/\s+((["\']).*?(?=\2)\2)|\s\s+/'
...and replace with $1
EDIT
Also, if you need to allow for escaped quotes like \" or \', you could use this expression:
'/\s+((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/'
It gets a bit stickier if you want to add support for "balanced" quotes like brackets (e.g. () or {})
END EDIT
Let me know if you find problems or would like some explanation!
HOPEFULLY FINAL EDIT AND WARNINGS
Potential problem: If a quoted string starts at the beginning of the string variable (or file), it will either not count as a quoted string (and have any whitespace reduced) or it will throw off the whole thing, making anything NOT in quotes get treated as though it was in quotes and vice versa -
A potential change that might remedy this is to use the following match expression
/(?:^|\s+)((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/
this replaces \s+ with (?:^|\s+) at the beginning of the expression
this will add a space at the beginning of the variable if the string starts with a quote - just trim() or remove that whitespace to continue
I seem to have used the "line by line" approach (like sed, if I'm not mistaken) to reach my original results - if you use the "whole file" or "whole string" setting or approach, carriage-return-line-feed seems to count as two whitespace characters (can't imagine why...), thus turning any newlines into single spaces (unless they are inside quotes and "dot-matches-newline" is used, of course)
this could be resolved by replacing the . and \s shorthand character classes with the specific characters you want to match, like the following:
/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/
this does not require the dot-matches-newline switch and only replaces multiple spaces or tabs - not newlines - with a single space (and of course, only if they are not quoted)
EXAMPLE
This link shows an example of the first expression and last expression in use on sample text on http://codepad.viper-7.com
You could do it in several steps. Consider the following example:
$str = 'This is a string with "Bunch of extra spaces". Leave them "untouched !".';
$id = 0;
$buffer = array();
$str = preg_replace_callback('|".*?"|', function($m) use (&$id, &$buffer) {
$buffer[] = $m[0];
return '__' . $id++;
}, $str);
$str = preg_replace('|\s+|', ' ', $str);
$str = preg_replace_callback('|__(\d+)|', function($m) use ($buffer) {
return $buffer[$m[1]];
}, $str);
echo $str;
This will output the string:
This is a string with "Bunch of extra spaces". Leave them "untouched !".
Although this is is not the prettiest solution.

PHP trim problem

I asked earlier how can I get rid of extra hyphens and whitespace added at the end and beginning of user submitted text for example, -ruby-on-rails- should be ruby-on-rails you guys suggested trim() which worked fine by itself but when I added it to my code it did not work at all it actually did some funky things to my code.
I tried placing the trim() code every where in my code but nothing worked can someone help me to get rid of extra hyphens and whitespace added at the end and beginning of user submitted text?
Here is my PHP code.
$tags = preg_split('/,/', strip_tags($_POST['tag']), -1, PREG_SPLIT_NO_EMPTY);
$tags = str_replace(' ', '-', $tags);
Update the trim statement to the following in order to update each item in the array:
foreach($tags as $key=>$value) {
$tags[$key] = trim($value, '-');
}
That should allow you to trim each value based on a string being expected.
If you have a string you can do this to strip hyphens from the beginning and end:
$tag = trim($tag, '-');
Your problem is that preg_split returns an array, but trim takes a string. You need to do the above for every string in the array.
Regarding trimming whitespace: if you are first converting all whitespace to hyphens then it should not be necessary to trim whitespace afterwards - the whitespace will already be gone. But be careful because the terms "whitespace" and "space" have different meanings. Your question seems to muddle these two terms.
Verify that the hyphen character you're attempting to trim is the same hyphen character that is wrapping -ruby-on-rails-. For example, these are all different characters that look similar: -, –, —, ―.
Im new to StackOverflow.com so I hope the function I wrote helps you in some way. You can specify what characters you want it to trim in the second parameter, for your example I've set it to just remove whitespace and 'dashes' by default, i've tested it using 'ruby-on-rails' and a somewhat extreme example of '- -- - - ruby-on-rails - -- - - -' and both produce the result: 'ruby-on-rails'.
The regular expression might be a bit of a q&d way of going about it but I hope it helps you, just reply if you have any problems implementing it or w/e.
function customTrim($s,$c='- ')
{
preg_match('#'.($a='[^'.$c.']').'.{1,}'.$a.'#',$s,$match);
return $match[0];
}

Categories