For my project I needed to analyze sentences and work out which ones were questions, by checking whether they ended with a question mark.
I tried using explode(), but it doesn't support multiple delimiters. So I temporarily replaced all punctuation with chr(1), which let me explode the text into sentences no matter what they ended with (., !, ?, etc.).
Then I needed the last character of each sentence, but explode() had removed all of the punctuation, so I needed some way of putting it back.
It took me a long time to solve the problem, but eventually I cracked it. I am posting my solution here so that others may use it.
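A shorter alternative is preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag, which returns each captured delimiter as its own array element:
$array = preg_split('~([.!?:;])~u', $raw, -1, PREG_SPLIT_DELIM_CAPTURE);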
Here is my function, multipleExplodeKeepDelimiters(), together with an example that explodes a string into sentences and checks whether the last character of each one is a question mark:
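function multipleExplodeKeepDelimiters($delimiters, $string) {
    // Turn every delimiter into chr(1) so that a single explode() splits on all of them
    $initialArray = explode(chr(1), str_replace($delimiters, chr(1), $string));
    $finalArray = array();
    $offset = 0; // where the current piece starts in the original $string
    foreach ($initialArray as $item) {
        $end = $offset + strlen($item); // index of the delimiter that followed $item
        if (strlen($item) > 0) {
            // Re-attach the original delimiter (none if the string ends without one)
            $finalArray[] = $item . ($end < strlen($string) ? $string[$end] : '');
        }
        $offset = $end + 1; // skip over the single-character delimiter
    }
    return $finalArray;
}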
$punctuation = array(".", ";", ":", "?", "!");
$string = "I am not a question. How was your day? Thank you, very nice. Why are you asking?";
$sentences = multipleExplodeKeepDelimiters($punctuation, $string);
foreach ($sentences as $question) {
    if ($question[strlen($question) - 1] == "?") {
        print("'" . $question . "' is a question<br />");
    }
}
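For comparison, the preg_split() call with PREG_SPLIT_DELIM_CAPTURE near the top of this question can do the same job without the chr(1) trick. The sketch below assumes single-character delimiters and uses a hypothetical helper name, findQuestions(). Because the captured delimiters come back as every second array element, the loop re-pairs each sentence with the delimiter that followed it.

```php
<?php
// Hypothetical helper: return the sentences of $text that end with "?".
function findQuestions(string $text): array
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // sentence, delimiter, sentence, delimiter, ..., trailing text
    $parts = preg_split('~([.!?:;])~u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $questions = [];
    for ($i = 0; $i < count($parts); $i += 2) {
        // Re-attach the delimiter that followed this piece (none for trailing text)
        $sentence = trim($parts[$i]) . ($parts[$i + 1] ?? '');
        if ($sentence !== '' && substr($sentence, -1) === '?') {
            $questions[] = $sentence;
        }
    }
    return $questions;
}

print_r(findQuestions("I am not a question. How was your day? Thank you, very nice. Why are you asking?"));
```

Trailing text without a final delimiter is kept as its own piece but never counted as a question.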
I have an array whose rule field holds a string like one of these:
FREQ=MONTHLY;BYDAY=3FR
FREQ=MONTHLY;BYDAY=3SA
FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR
FREQ=MONTHLY;UNTIL=20170527T100000Z;BYDAY=4SA
FREQ=WEEKLY;BYDAY=SA
FREQ=WEEKLY;INTERVAL=2;BYDAY=TH
FREQ=WEEKLY;BYDAY=TH
FREQ=WEEKLY;UNTIL=20170610T085959Z;BYDAY=SA
FREQ=MONTHLY;BYDAY=2TH
Each line comes from a different array element; these samples should give an idea of what I need.
What I need is a regex that strips out everything unnecessary.
So I don't need FREQ=, ;, BYDAY= etc. I basically need the values after each =, but I want to store each one in a different variable.
Taking the third one as an example, it would be:
$frequency = WEEKLY
$until = 20170728T080000Z
$day = MO, TU, WE, TH, FR
It doesn't necessarily have to be one regex; there can be one regex per value. So far I have one for FREQ:
preg_match("/[^FREQ=][A-Z]+/", $input_line, $output_array);
Unfortunately I can't work out the rest. How can I solve this?
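One way is to capture every key=value pair with preg_match_all() and then use PHP array destructuring (this assumes all three keys are present, in this order):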
$str = "FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR";
preg_match_all('~(\w+)=([^;]+)~', $str, $matches);
[$freq, $until, $byday] = $matches[2]; // As of PHP 7.1 (otherwise use list() function)
echo $freq, " ", $until, " ", $byday;
// WEEKLY 20170728T080000Z MO,TU,WE,TH,FR
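The attempt in the question, /[^FREQ=][A-Z]+/, does not do what it looks like: [^FREQ=] is a negated character class that matches one character other than F, R, E, Q or =, rather than skipping the text FREQ=. If you prefer one regex per value, as the question suggests, a sketch along these lines anchors each key at the start of the string or right after a semicolon and captures everything up to the next ;. The helper name ruleValue() is hypothetical.

```php
<?php
// Hypothetical helper: return the value for $key in an RRULE-style string, or null.
function ruleValue(string $rule, string $key): ?string
{
    // (?:^|;) makes sure the key starts the string or follows a ";",
    // so e.g. "DAY" cannot match inside "BYDAY"; ([^;]*) captures up to the next ";"
    $pattern = '/(?:^|;)' . preg_quote($key, '/') . '=([^;]*)/';
    return preg_match($pattern, $rule, $m) ? $m[1] : null;
}

$rule = "FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR";
$frequency = ruleValue($rule, 'FREQ');   // "WEEKLY"
$until     = ruleValue($rule, 'UNTIL');  // "20170728T080000Z"
$day       = ruleValue($rule, 'BYDAY');  // "MO,TU,WE,TH,FR"
```

A key that is absent from a rule simply yields null.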
Be more general
Using the extract() function:
preg_match_all('~(\w+)=([^;]+)~', $str, $m);
$m[1] = array_map('strtolower', $m[1]);
$vars = array_combine($m[1], $m[2]);
extract($vars);
echo $freq, " ", $until, " ", $byday;
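A variation that avoids extract() (which by default creates variables from whatever keys appear in the input, so an unexpected key can overwrite an existing variable) is to keep the pairs in an associative array and use the null coalescing operator (PHP 7+) for keys that a given rule lacks, such as UNTIL in FREQ=MONTHLY;BYDAY=3FR. This is a sketch; parseRule() is a hypothetical name.

```php
<?php
// Hypothetical helper: turn "FREQ=...;BYDAY=..." into ['freq' => ..., 'byday' => ...].
function parseRule(string $rule): array
{
    preg_match_all('~(\w+)=([^;]+)~', $rule, $m);
    // $m[1] holds the keys, $m[2] the values; lower-case the keys like the extract() version
    return array_combine(array_map('strtolower', $m[1]), $m[2]);
}

$rule = parseRule("FREQ=MONTHLY;BYDAY=3FR");
$frequency = $rule['freq'] ?? null;   // "MONTHLY"
$until     = $rule['until'] ?? null;  // null, because this rule has no UNTIL
$days      = isset($rule['byday']) ? explode(',', $rule['byday']) : [];  // ["3FR"]
```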
Notice: for this problem, I recommend the general approach @revo posted; it's concise, safe and easy on the eyes. But keep in mind that regular expressions come with a performance penalty compared to fixed-string functions, so if you can use strpos/substr/explode/..., try to use them, and don't 'knee-jerk' to a preg_-based solution.
Since the separators are fixed and don't seem to occur in the values you are interested in, and you already rely on knowing the keys (FREQ=, etc.), you don't need regular expressions (as much as I like to use them wherever I can, and you can use them here). Why not simply explode and split in this case?
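$lines = explode("\n", $text);
foreach ($lines as $line) {
    $parts = explode(';', $line);
    $frequency = $until = $day = $interval = null;
    foreach ($parts as $part) {
        // limit 2: split only on the first "=" of each key=value pair
        list($key, $value) = explode('=', $part, 2);
        switch ($key) {
            case 'FREQ':
                $frequency = $value;
                break;
            case 'INTERVAL':
                $interval = $value;
                break;
            case 'UNTIL':
                $until = $value;
                break;
            case 'BYDAY':
                $day = $value;
                break;
            // and so on for any other keys you need
        }
    }
    doSomethingWithTheValues($frequency, $until, $day, $interval); // hypothetical placeholder for your own processing
}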
This may be more readable and efficient if your use-case is as simple as stated.
You can use the pattern
;?[A-Z]+=
together with preg_split():
preg_split('/;?[A-Z]+=/', $str);
Explanation
; matches a semicolon
? makes the preceding semicolon optional (zero or one)
[A-Z]+ matches one or more uppercase letters
= matches a single =
If you want each line in a separate array, do it this way:
# split the input into one array element per line
$lines = preg_split('/[\n\r]+/', $str);
# initialise the array for the results
$results = array();
# loop through the lines
foreach ($lines as $line) {
    # split each line with the regex mentioned above and
    # put the resulting array into the results array
    $results[] = preg_split('/;?[A-Z]+=/', $line);
}
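One detail worth knowing about this preg_split() approach: every line starts with a key, so the first match (FREQ=) is at position 0 and the result begins with an empty string. The PREG_SPLIT_NO_EMPTY flag drops it. Note also that only the values survive, so when a key such as UNTIL is missing you cannot tell from position alone which value is which. A minimal sketch:

```php
<?php
$line = "FREQ=WEEKLY;INTERVAL=2;BYDAY=TH";

// Default behaviour: the leading "FREQ=" produces an empty first element
$withEmpty = preg_split('/;?[A-Z]+=/', $line);
// ["", "WEEKLY", "2", "TH"]

// PREG_SPLIT_NO_EMPTY (with limit -1, meaning "no limit") removes it
$values = preg_split('/;?[A-Z]+=/', $line, -1, PREG_SPLIT_NO_EMPTY);
// ["WEEKLY", "2", "TH"]
```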
This question already has answers here:
PHP: How can I explode a string by commas, but not wheres the commas are within quotes?
(2 answers)
I'm trying to figure out how to wrap each comma-separated item of a string in double quotes.
e.g. I have a string
$string = "starbucks, KFC, McDonalds";
I would like to convert it to
$string = '"starbucks", "KFC", "McDonalds"';
by passing $string to a function. Thanks!
EDIT: For some people who don't get it...
I ran this code
$result = mysql_query('SELECT * FROM test WHERE id= 1');
$result = mysql_fetch_array($result);
echo $result['testing'];
This returns the strings I mentioned above...
First, make your string a proper string literal, since what you supplied isn't one (as Fred -ii- pointed out).
$string = 'starbucks, KFC, McDonalds';
$parts = explode(', ', $string);
As you can see, explode() fills the array $parts with one element per name, and the foreach below loops over it and wraps each name in double quotes.
$d = array();
foreach ($parts as $name) {
    $d[] = '"' . $name . '"';
}
echo implode(', ', $d);
This prints:
"starbucks", "KFC", "McDonalds"
Probably not the quickest way of doing it, but it does what you asked.
As this.lau_ pointed out, it's most definitely a duplicate.
And if you want a simple option, go with felipsmartins' answer :-)
It should work like a charm:
$parts = explode(', ', 'starbucks, KFC, McDonalds');
echo('"' . join('", "', $parts) . '"');
Note: as pointed out in the comments (thanks, nodeffect), the split() function was DEPRECATED as of PHP 5.3.0 and removed in PHP 7.0.0, so use explode() instead, as above.
Here is the basic function, without any input checks (it assumes $str is a string, so that explode() returns an array for array_map() and implode() to work on):
function get_quoted_string($str) {
    // Here you will get an array of words delimited by comma with space
    $arr = explode(', ', $str);
    // Wrapping each array element with quotes
    $arr = array_map(function($x) { return '"' . $x . '"'; }, $arr);
    // Returning string delimited by comma with space
    return implode(', ', $arr);
}
A really nasty way to do it came to mind: explode() on the comma, then for each value (looping by reference, foreach ($parts as &$value)) set $value = '"' . $value . '"';, then run implode() if you need it as a single value.
And you're sure that's not an array? Because that's weird.
But here's a way to do it, I suppose...
$string = "starbucks, KFC, McDonalds";
// Use str_replace to turn each ", " separator into "", "" (closing quote, comma, space, opening quote).
// Then add a double quote at the beginning and the end.
// Remember to escape your double quotes...
$newstring = "\"" . str_replace(", ", "\", \"", $string) . "\"";
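All of the explode(', ', ...) and str_replace(', ', ...) answers assume the items are separated by exactly a comma and one space. If the data may have irregular spacing, a sketch that splits on a comma with optional surrounding whitespace could look like this. quoteList() is a hypothetical name, and the code does not escape double quotes that may already be inside an item.

```php
<?php
// Hypothetical helper: 'a,b , c' -> '"a", "b", "c"'
function quoteList(string $list): string
{
    // Split on commas, swallowing any whitespace around them; drop empty items
    $items = preg_split('/\s*,\s*/', trim($list), -1, PREG_SPLIT_NO_EMPTY);
    // Wrap each item in double quotes, then join with ", "
    $quoted = array_map(function ($item) {
        return '"' . $item . '"';
    }, $items);
    return implode(', ', $quoted);
}

echo quoteList("starbucks,KFC ,  McDonalds");
```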
Here's the code that I came up with:
function Count($text)
{
    $WordCount = str_word_count($text);
    $TextToArray = explode(" ", $text);
    $TextToArray2 = explode(" ", $text);
    for ($i = 0; $i < $WordCount; $i++)
    {
        $count = substr_count($TextToArray2[$i], $text);
    }
    echo "Number of {$TextToArray2[$i]} is {$count}";
}
So what happens here is that the user enters a text: a sentence or a paragraph. Using substr_count(), I would like to know the number of occurrences of each word in the array. Unfortunately, the output is not what I need. Any suggestions?
I assume that you want an array with the word frequencies.
First off, convert the string to lowercase and remove all punctuation from the text. This way you won't get entries for "But", "but", and "but," but rather just "but" with 3 or more uses.
Second, use str_word_count() with a second argument of 2, as Mark Baker suggests, to get a list of the words in the text. This will probably be more efficient than my earlier suggestion of preg_split().
Then walk the array and increment the count for each word by one:
$output = array(); // start with an empty word => count map
foreach ($words as $word)
    $output[$word] = isset($output[$word]) ? $output[$word] + 1 : 1;
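Putting the steps above together, a minimal sketch might look like this. wordFrequencies() is a hypothetical name, and the punctuation-stripping regex is an assumption (it keeps ASCII letters, digits, apostrophes, hyphens and whitespace).

```php
<?php
// Hypothetical helper: count how often each word occurs in $text.
function wordFrequencies(string $text): array
{
    // 1. Lower-case, so "But" and "but" are counted as the same word
    $text = strtolower($text);
    // 2. Remove punctuation (keep letters, digits, apostrophes, hyphens, whitespace)
    $text = preg_replace("/[^a-z0-9'\-\s]/", '', $text);
    // 3. Format 2 returns the words keyed by their position in the string
    $words = str_word_count($text, 2);
    // 4. Increment a counter per word
    $output = array();
    foreach ($words as $word) {
        $output[$word] = isset($output[$word]) ? $output[$word] + 1 : 1;
    }
    return $output;
}

print_r(wordFrequencies("But but, but the cat sat. The end!"));
```

The counting loop in step 4 could also be replaced by a single array_count_values($words) call.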
If I have understood your question correctly, this should also solve your problem:
function countWordOccurrences($text) { // a name that does not clash with PHP's built-in count()
    $TextToArray = explode(" ", $text); // get all space-separated words
    foreach ($TextToArray as $needle) {
        $count = substr_count($text, $needle); // count occurrences of the word in the whole text
        echo "$needle has occurred $count times in the text\n";
    }
}
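One caveat with substr_count(): it counts substrings, not words, so "at" is also found inside "cat" or "mat", and a repeated word is reported once per occurrence in the loop. If whole-word counts are needed, a regex with \b word boundaries is one option. countWholeWord() is a hypothetical helper, and the /i flag (case-insensitive matching) is a choice you may not want.

```php
<?php
// Hypothetical helper: count whole-word, case-insensitive occurrences of $word in $text.
function countWholeWord(string $text, string $word): int
{
    // \b stops "at" from matching inside "cat"; preg_quote escapes regex metacharacters
    return preg_match_all('/\b' . preg_quote($word, '/') . '\b/i', $text);
}

$text = "The cat sat on the mat";
echo substr_count($text, "at"), "\n";      // 3: substring hits in cat, sat, mat
echo countWholeWord($text, "at"), "\n";    // 0: "at" never appears as a word
echo countWholeWord($text, "the"), "\n";   // 2: "The" and "the"
```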
$WordCounts = array_count_values(str_word_count(strtolower($text),2));
var_dump($WordCounts);
I have a small problem. I am trying to convert a string like "1 234" to a number: 1234.
I can't get there. The string is scraped from a website. Is it possible that it isn't a regular space? I've tried methods like str_replace and preg_split on the space, and nothing worked. Also, (int)$abc takes only the first digit (1).
If anyone has an idea, I'd be grateful! Thank you!
This is how I would handle it...
<?php
$string = "Here! is some text, and numbers 12 345, and symbols !£$%^&";
$new_string = preg_replace("/[^0-9]/", "", $string);
echo $new_string; // Outputs 12345
?>
intval(preg_replace('/[^0-9]/', '', $input))
Scraping websites always requires specific code: you know how you receive the input, and you write the code needed to make it usable.
That is why the first answer is still str_replace:
$iInt = (int)str_replace(array(" ", ".", ","), "", $iInt);
$str = "1 234";
$int = intval(str_replace(' ', '', $str)); //1234
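As for whether it is really a space: scraped HTML often contains a non-breaking space (&nbsp;), which after decoding is the UTF-8 byte pair C2 A0 rather than an ordinary space. str_replace(' ', ...) leaves it in place, so the cast stops at the first digit. Inspecting the string with bin2hex() reveals this. The sketch below assumes that is the cause; stripSpacesToInt() is a hypothetical name.

```php
<?php
// Hypothetical helper: remove ordinary spaces, UTF-8 non-breaking spaces (U+00A0)
// and literal "&nbsp;" entities, then cast the rest to an integer.
function stripSpacesToInt(string $s): int
{
    return (int) str_replace([' ', "\u{00A0}", '&nbsp;'], '', $s);
}

$scraped = "1\u{00A0}234";                       // "1 234" with a non-breaking space
echo bin2hex($scraped), "\n";                     // 31c2a0323334: c2a0 is the NBSP
echo (int) str_replace(' ', '', $scraped), "\n";  // 1: the NBSP was not removed
echo stripSpacesToInt($scraped), "\n";            // 1234
```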
I just ran into the same issue, but the answers provided didn't cover all the different cases I had...
So I made this function (the idea came to me thanks to Dan):
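function customCastStringToNumber($stringContainingNumbers, $decimalSeparator = ".", $thousandsSeparator = " ") {
    $numericValues = array();
    // Escape both separators so characters like "." are taken literally inside the patterns
    $decimalSeparator = preg_quote($decimalSeparator, '/');
    $thousandsSeparator = preg_quote($thousandsSeparator, '/');
    // Everything except digits and the decimal separator is stripped from each match
    $regExp = '/[^0-9' . $decimalSeparator . ']/';
    // A number: a digit, optionally more digits/thousands separators ending in a digit,
    // optionally followed by the decimal separator and more digits
    preg_match_all('/[0-9](?:[0-9' . $thousandsSeparator . ']*[0-9])?(?:' . $decimalSeparator . '[0-9]+)?/', $stringContainingNumbers, $matches);
    foreach ($matches[0] as $match) {
        // Normalise a "," decimal separator to "." before casting to float
        $numericValues[] = (float)str_replace(",", ".", preg_replace($regExp, "", $match));
    }
    // One number is returned as a float, several as an array of floats
    return count($numericValues) === 1 ? $numericValues[0] : $numericValues;
}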
So, basically, this function extracts all the numbers contained in a string, no matter how much text surrounds them, takes the given decimal separator into account, and returns the extracted number as a float (or an array of floats when there are several).
One can specify the decimal separator used in one's country with the $decimalSeparator parameter, and the digit-grouping character with $thousandsSeparator.
Use this code to remove any other characters, such as .,:"'\/, !@#$%^&*(), a-z and A-Z (note that every digit in the string ends up concatenated into one integer):
$string = "This string involves numbers like 12 3435 and 12.356 and other symbols like !@# then the output will be just an integer number!";
$output = intval(preg_replace('/[^0-9]/', '', $string));
var_dump($output);
I've got a list of names separated by commas (the names may contain other characters, and the list may be empty), but it generally looks like this:
NameA,NameB,NameC
I need to create a function that deletes a name if it's present in the list and preserves the comma-separated structure.
E.g. if NameA is to be deleted, I should end up with:
NameB,NameC
NOT
,NameB,NameC
Similarly for the rest.
This is what I came up with, is there a better solution?
$pieces = explode(",", $list);
$key = array_search($deleteuser, $pieces);
if (FALSE !== $key)
{
    unset($pieces[$key]);
}
$list = implode(",", $pieces);
That should work pretty well. You may also be interested in PHP's fgetcsv function.
Doc: http://php.net/manual/en/function.fgetcsv.php
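You could use the array_splice() function to delete from the array, with offset = array_search($deleteuser, $pieces) and length = 1. Check first that array_search() did not return false; otherwise (in PHP's default, non-strict typing mode) false is treated as offset 0 and the first name is removed.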
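A minimal sketch of the array_splice() variant with that guard in place; removeName() is a hypothetical helper name, and the strict third argument to array_search() is a choice added here.

```php
<?php
// Hypothetical helper: remove the first occurrence of $name from a comma-separated list.
function removeName(string $list, string $name): string
{
    $pieces = explode(',', $list);
    // Strict comparison, so numeric-looking strings such as "1" and "01" are not treated as equal
    $key = array_search($name, $pieces, true);
    // Guard against false: array_splice() would treat it as offset 0
    // and remove the first name instead
    if ($key !== false) {
        array_splice($pieces, $key, 1);
    }
    return implode(',', $pieces);
}

echo removeName('NameA,NameB,NameC', 'NameA'), "\n"; // NameB,NameC
```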
If your application data is so large that you are experiencing a crippling amount of lag, then you may have bigger issues than this snippet of code. It might be time to rethink your data storage and processing from the ground up.
In the meantime, here are some benchmarks for imploding a 5001-element array with commas and then using different techniques to remove the value 4999 (the timings include generating the comma-separated string, but that part is identical across all benchmarks):
explode() + array_search() + unset() + implode() (system time: ~.009 - .011s)
$pieces = explode(",", $list);
if (($key = array_search($deleteuser, $pieces)) !== false) {
    unset($pieces[$key]);
}
echo implode(",", $pieces);
explode() + array_search() + array_splice() + implode() (system time: ~.010 - .012s)
$pieces = explode(",", $list);
if (($key = array_search($deleteuser, $pieces)) !== false) {
    array_splice($pieces, $key, 1);
}
echo implode(",", $pieces);
explode() + foreach() + if() + unset() + break + implode() (system time: ~.011 - .012s)
$pieces = explode(",", $list);
foreach ($pieces as $key => $value) {
    if ($value == $deleteuser) {
        unset($pieces[$key]);
        break;
    }
}
echo implode(",", $pieces);
(note: if you remove the break, the loop can remove multiple occurrences of the needle)
explode() + array_diff() + implode() (system time: ~.010 - .011s)
$pieces = explode(",", $list);
$pieces = array_diff($pieces, [$deleteuser]);
echo implode(",", $pieces);
// or just: echo implode(',', array_diff(explode(",", $list), [$deleteuser]));
explode() + array_filter() + implode() (system time: ~.010 - .013s)
$pieces = explode(",", $list);
$pieces = array_filter($pieces, function($v) use ($deleteuser) {
    return $v != $deleteuser;
});
echo implode(",", $pieces);
preg_quote() + preg_replace() (system time: ~.007 - .010s)
$needle = preg_quote($deleteuser, '/');
// match ",needle" before a comma or the end, or "needle" at the start followed by a comma or the end
echo preg_replace('/,' . $needle . '(?=,|$)|^' . $needle . '(?:,|$)/', '', $list);
// if replacing only one value, pass 1 as preg_replace()'s 4th parameter ($limit)
Note that if you are using a delimiting character that has a special meaning to the regex engine (like +), you will need to put a backslash before it (\+) to escape it and make it literal. This would make the pattern: '/\+' . $needle . '(?=\+|$)|^' . $needle . '(?:\+|$)/'
So while the regex-based snippet proved slightly (unnoticeably) faster for my arbitrarily chosen string length, you will need to run your own benchmarks to be sure which one performs best for your application.
That said (and don't get me wrong, I love regex), the regular expression is going to be the easiest for developers to "get wrong" when it is time to modify the pattern, AND I am confident that most devs will agree it has the worst overall comprehensibility.
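To run your own benchmarks, a rough harness could look like the following. It is only a sketch: benchmark() is a hypothetical helper, it uses hrtime() (PHP 7.3+) as a monotonic nanosecond clock, and it reports average wall-clock time per call rather than system time, so its numbers are not directly comparable with the figures above. Results will depend on your machine and data.

```php
<?php
// Hypothetical helper: average seconds per call of $fn over $iterations runs.
function benchmark(callable $fn, int $iterations = 200): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9 / $iterations;
}

// Same setup as above: a 5001-element list, removing the value 4999
$list = implode(',', range(0, 5000));
$deleteuser = '4999';

$results = [
    'array_diff' => benchmark(function () use ($list, $deleteuser) {
        return implode(',', array_diff(explode(',', $list), [$deleteuser]));
    }),
    'preg_replace' => benchmark(function () use ($list, $deleteuser) {
        $needle = preg_quote($deleteuser, '/');
        return preg_replace('/,' . $needle . '(?=,|$)|^' . $needle . '(?:,|$)/', '', $list);
    }),
];

foreach ($results as $name => $seconds) {
    printf("%-12s %.6f s per call\n", $name, $seconds);
}
```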
You could also try a regular expression like this (maybe it can be optimized):
$search = 'NameB';
$list = 'NameA,NameB,NameC';
$q = preg_quote($search, '/');
$list = preg_replace('/(^' . $q . ',)|(,' . $q . ')(?=,|$)/', '', $list); // NameA,NameC