PHP - How to avoid replacing a replacement string - php

I'm working on a script that allows students to input their answers into a form and gives them instant feedback on their answers.
I start with a string ($content) that contains the complete task with the gaps in square brackets, something like this:
There's [somebody] in the room. There isn't [anybody] in the room.
Is [anybody] in the room?
Now the script recognizes the solutions (somebody, anybody, anybody) and saves them in an array. The student's answers are also in an array.
To see if the answer is correct, the script checks if $input[$i] and $solution[$i] are identical.
Now here is the problem: I want the script to replace the placeholders with an input box where the solution is wrong and the solution in green where it's correct. This updated version of $content is then shown on the next page.
But if there are two identical solutions, this results in multiple replacements as the replacement is replaced again...
I tried preg_replace with a limit of 1 but this doesn't do the trick either as it doesn't skip solutions that have already been replaced.
$i=0;
while ($solution[$i]){
//answer correct
if($solution[$i] == $input[$i]){
//replace placeholder > green solution
$content = str_replace($solution[$i], $solution_green[$i], $content);
}
//answer wrong
else{
//replace placeholder > input box to try again
$content = str_replace($solution[$i], $solution_box[$i], $content);
}
$i++;
}
print $content; //Output new form based on student's answer
Is there any way to avoid replacing replacements?
I hope I didn't ramble too much... Have been wracking my brain for ages over this problem and would appreciate any suggestions.

The way I've approached this is to split the original content into segments that correspond to the markers in the text. When you explode() the original text by ], you end up with...
Array
(
[0] => There's [somebody
[1] => in the room. There isn't [anybody
[2] => in the room.
Is [anybody
[3] => in the room?
)
As you can see, each array element now corresponds to one answer/solution number, so the replacement only has to change $parts[$i]. As a safeguard, it replaces "[" followed by the solution text, so that the same word elsewhere in the segment is left alone. There are other solutions, but this should do the job.
At the end, the code then rebuilds the original content by using implode() and using ] to add it back.
$parts = explode("]", $content);
$i=0;
while (isset($solution[$i])){
//answer correct
if($solution[$i] == $input[$i]){
//replace placeholder > green solution
$parts[$i] = str_replace("[".$solution[$i], "[".$solution_green[$i], $parts[$i]);
}
//answer wrong
else{
//replace placeholder > input box to try again
$parts[$i] = str_replace("[".$solution[$i], "[".$solution_box[$i], $parts[$i]);
}
$i++;
}
$content = implode( "]", $parts);

You may use the sprintf()/vsprintf() functions to replace positional placeholders, but first you have to prepare a sentence pattern for them. Each "solution placeholder" should be replaced with %s, so that vsprintf() can later replace each one with the corresponding string. (Any literal % already in $content would need to be escaped as %% first.)
You may do that within loop:
$i = 0;
$fields = [];
while (isset($solution[$i])) {
$fields[] = ($solution[$i] === $input[$i])
? $solution_green[$i]
: $solution_box[$i];
//doesn't matter if you replace more than one here
$content = str_replace($solution[$i], '%s', $content);
$i++;
}
print vsprintf($content, $fields);
//or for php>=5.6: sprintf($content, ...$fields);
This is a quick fix for the current state of your code. It could be refactored further (for example, doing the pattern replacement while parsing out the correct words, or replacing the green/box arrays with methods that produce the strings you need).
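Another way to avoid replacing a replacement is to visit every bracketed gap exactly once, in order, with preg_replace_callback(). The following is only a minimal sketch: the function name render_feedback(), the name="answer[n]" attribute and the HTML markup are assumptions made for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: replaces each [gap] exactly once, in order of appearance.
function render_feedback(string $content, array $input): string
{
    $i = 0; // index of the gap currently being processed
    return preg_replace_callback(
        '/\[([^\]]*)\]/', // one bracketed gap; group 1 is the expected solution
        function (array $m) use (&$i, $input): string {
            $solution = $m[1];
            $answer = $input[$i] ?? '';
            $name = 'answer[' . $i . ']'; // assumed form field name
            $i++;
            if ($answer === $solution) {
                // answer correct: show the solution in green
                return '<span style="color:green">' . htmlspecialchars($solution) . '</span>';
            }
            // answer wrong: show an input box to try again
            return '<input type="text" name="' . $name . '" value="' . htmlspecialchars($answer) . '">';
        },
        $content
    );
}

$content = "There's [somebody] in the room. There isn't [anybody] in the room. Is [anybody] in the room?";
echo render_feedback($content, ['somebody', 'nobody', 'anybody']);
```

Because the callback output is never scanned again, two identical solutions cannot interfere with each other, and the solutions do not need to be extracted into a separate array first.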

Related

How do I check the number of times each array object appears in a string, and then save that into a separate array?

Basically what I am trying to do here is get a text input (a paragraph), and then save each word into an array. Then I want to check each word in the array against the original paragraph to see how many times it occurred. By doing this I am hopefully going to be able to check what the topic is. Originally I started this as an open-ended school project, but I am more interested in finding out how to do this for my own sanity.
Here is my code (this is after I requested the text input in html code above):
$paragraph = $_POST['text'];
$paragraph = str_replace('  ',' ',$paragraph);
$paragraph = str_replace('  ',' ',$paragraph);
$paragraph = strtolower($paragraph);
$words = explode(" ",$paragraph);
$count = count($words);
for($x = 0; $x < $count; $x++) {
echo $words[$x];
echo "<br/>";
}
So far I have been able to get the words all lowercase and to replace all the extra spaces in my text, and then subsequently save that to an array. For now I am just displaying the words.
This is where I have run into some problems. I was thinking I could have a multidimensional array where it would be something along the lines of
$words[1]["word"][0]["amount"];
The word would be the actual word in the paragraph, and amount would count how many times it showed up in the paragraph. If anyone has basic concepts for doing this, or there is something I am missing here I would appreciate your help. The main thing I need help with is checking the amount of times each word shows up in the paragraph. I couldn't get this to work (it was within the prior for loop):
substr_count($words[$x],$paragraph)
To recap, I am trying to take a paragraph, save each different word into an array (I have managed to do this successfully) and then save the amount of times the word shows up in the paragraph into a different array (or a multidimensional array). Once I get this data I am going to see which words I used the most, while filtering out filler words like "the" and "a".
You would be better off using preg_replace('/\W+/', ' ', $paragraph); and simplifying the rest of your code to this:
$paragraph = preg_replace('/\W+/', ' ', $paragraph);
$filter = array('the', 'a');
$words = explode(' ',$paragraph);
$countWords = array();
foreach($words as $w)
{
if(trim($w) != "" && array_search($w, $filter) === false)
{
if(!isset($countWords[$w]))
$countWords[$w] = 0;
$countWords[$w] += 1;
}
}
This will give you how many times each word is used. And if you don't care about case, then you can use $countWords[strtolower($w)] instead. Also, with the $filter array I added, you can add whatever words that you don't want to count in there.
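The same counting can also be sketched with built-in functions only. The helper name count_words() and the sample sentence are made up for illustration; preg_split(), array_diff(), array_count_values() and arsort() are standard PHP. Note that splitting on \W+ also breaks words at apostrophes (don't becomes don and t).

```php
<?php
// Hypothetical helper: counts words, ignoring case and a list of filler words.
function count_words(string $paragraph, array $filter = ['the', 'a']): array
{
    // split on runs of non-word characters and drop empty pieces
    $words = preg_split('/\W+/', strtolower($paragraph), -1, PREG_SPLIT_NO_EMPTY);
    // remove the filler words, then count how often each remaining word occurs
    $counts = array_count_values(array_diff($words, $filter));
    arsort($counts); // most frequent words first
    return $counts;
}

print_r(count_words('The cat saw a cat. The dog saw the cat!'));
```

The result is a flat array keyed by word (here cat, saw and dog), which avoids the multidimensional structure considered in the question.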

Efficient way to parse this string into array in PHP?

Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { //the first packet has no predecessor to merge into, so skip it
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { //only merge packets that end with 0d0a; anything else is 'mostly' invalid and is discarded
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array $dataFixed. In a foreach loop of the $data array, I check whether it starts with 7878. If it doesn't, I join it with the previous array in $data. I then unset the current array in $dataFixed and reset the array elements with array_values.
But I'm not very confident about this solution.. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like it's supposed to? It will stick to the previous array element.
E.g., in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.
Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE); // -1 = no limit; passing null is deprecated since PHP 8.1
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different than above. First we split the input string based on the delimiter that matches your desired data, using the nongreedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works and perhaps provide your full dataset.
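The repair rule described above (append any fragment that ends with 0d0a but does not start with 7878 to the previous value) can also be sketched as a single loop over a plain split. The function name split_packets() is hypothetical, and the sketch assumes lowercase hex input as in the examples.

```php
<?php
// Hypothetical helper: split after every 0d0a, then glue fragments that end with
// 0d0a but do not start with 7878 back onto the previous packet.
function split_packets(string $hex): array
{
    $pieces = preg_split('/(?<=0d0a)/', $hex, -1, PREG_SPLIT_NO_EMPTY);
    $packets = [];
    foreach ($pieces as $piece) {
        $isContinuation = $packets !== []
            && strpos($piece, '7878') !== 0
            && substr($piece, -4) === '0d0a';
        if ($isContinuation) {
            $packets[count($packets) - 1] .= $piece; // repair a packet cut by an extra 0d0a
        } else {
            $packets[] = $piece; // a new packet, or junk kept as its own element
        }
    }
    return $packets;
}

print_r(split_packets('78781110d0a2220d0a78783330d0a0000'));
```

Trailing characters that do not end with 0d0a, such as the 0000 above, stay in an element of their own instead of sticking to the previous packet.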
I think you are using a delimiter, "0d0a", that can also appear as part of the content. It's not possible to avoid junk data as long as the delimiter can also be part of the content; the delimiter must be unique.
Possible solutions:
Change the delimiter to something that doesn't occur in your data (000000, #!.;).
If you know for certain the length that each array item may have, use it. Judging by your examples, that's not possible.
The solutions given in the other answers consider only the sample data you have shared. If you are confident about what the string will contain, they are good to use. Otherwise they can't give you any guarantee.
Best solution: fix the delimiter first, then use regex or explode, whichever you prefer.
Why don't you use preg_match_all instead? That avoids the zero-width assertions (lookaheads and lookbehinds) that are needed to split the string without consuming the delimiters, and simply finds the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
preg_match_all('/7878.*?0d0a(?=7878|[^(7878)]*?$)/', $string, $arr);
print_r($arr);
?>
Gives an array $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). Strips leading and trailing garbage characters (whatever doesn't start with 7878 or end with 7878 or 0d0a).
So $arr[0] would be the array of values that you are looking for.
See example on ideone
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what @CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions. See example.

How to remove commas between double quotes in PHP

Hopefully, this is an easy one. I have an array with lines that contain output from a CSV file. What I need to do is simply remove any commas that appear between double-quotes.
I'm stumbling through regular expressions and having trouble. Here's my sad-looking code:
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';
$pattern = '(?<=\")/\,/(?=\")'; //this doesn't work
$revised_input = preg_replace ( $pattern , '' , $csv_input);
echo $revised_input;
//would like revised input to echo: "herp","derp","hey get rid of these commas man",1234
?>
Thanks VERY much, everyone.
Original Answer
You can use str_getcsv() for this, as it is designed for processing CSV strings:
$out = array();
$array = str_getcsv($csv_input);
foreach($array as $item) {
$out[] = str_replace(',', '', $item);
}
$out is now an array of elements without any commas in them, which you can then just implode as the quotes will no longer be required once the commas are removed:
$revised_input = implode(',', $out);
Update for comments
If the quotes are important to you then you can just add them back in like so:
$revised_input = '"' . implode('","', $out) . '"';
Another option is to use one of the str_putcsv() (not a standard PHP function) implementations floating about out there on the web such as this one.
This is a very naive approach that will work only if 'valid' commas are those that are between quotes with nothing else but maybe whitespace between.
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';
$pattern = '/([^"])\,([^"])/'; //matches a comma that has no double quote directly on either side
$revised_input = preg_replace ( $pattern , "$1$2" , $csv_input);
echo $revised_input;
//output for this is: "herp","derp","hey get rid of these commas man",1234
It should def be tested more but it works in this case.
Cases where it might not work is where you don't have quotes in the string.
one,two,three,four -> onetwothreefour
EDIT : Corrected the issues with deleting spaces and neighboring letters.
Well, I haven't been lazy and written a small function to do exactly what you need:
function clean_csv_commas($csv){
$len = strlen($csv);
$inside_block = FALSE;
$out='';
for($i=0;$i<$len;$i++){
if($csv[$i]=='"'){
if($inside_block){
$inside_block=FALSE;
}else{
$inside_block=TRUE;
}
}
if($csv[$i]==',' && $inside_block){
// do nothing
}else{
$out.=$csv[$i];
}
}
return $out;
}
You might be coming at this from the wrong angle.
Instead of removing the commas from the text (presumably so you can then split the string on the commas to get the separate elements), how about writing something that works on the quotes?
Once you've found an opening quote, you can check the rest of the string; anything before the next quote is part of this element. You can add some checking here to look for escaped quotes, too, so things like:
"this is a \"quote\""
will still be read properly.
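As a rough illustration of working on the quotes rather than on the commas, the sketch below matches each double-quoted field as a whole and strips the commas inside it. The function name strip_quoted_commas() is hypothetical, and the pattern assumes fields contain no escaped quotes (neither "" nor \"), so it would need extending for the escaped case mentioned above.

```php
<?php
// Hypothetical helper: strips commas only inside double-quoted fields.
function strip_quoted_commas(string $line): string
{
    return preg_replace_callback(
        '/"[^"]*"/', // one quoted field without embedded quotes
        function (array $m): string {
            return str_replace(',', '', $m[0]);
        },
        $line
    );
}

echo strip_quoted_commas('"herp","derp","hey, get rid of these commas, man",1234');
```

Commas between fields are left untouched because they never fall inside a match.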
Not exactly an answer you've been looking for - But I've used it for cleaning commas in numbers in CSV.
$csv = preg_replace('%\"([^\"]*)(,)([^\"]*)\"%i','$1$3',$csv);
"3,120", 123, 345, 567 ==> 3120, 123, 345, 567

Temporarily remove labels/tags and re-insert them later on

Consider the following string
$input = "string with {LABELS} between brackets {HERE} and {HERE}";
I want to temporarily remove all labels (= whatever is between curly braces) so that an operation can be performed on the rest of the string:
$string = "string with between brackets and";
For arguments sake, the operation is concatenate every word that starts with 'b' with the word 'yes'.
function operate($string) {
$words = explode(' ', $string);
foreach ($words as $word) {
$output[] = (substr($word, 0, 1) == 'b') ? "yes$word" : $word;
}
return implode(' ', $output);
}
The output of this function would be
"string with yesbetween yesbrackets and"
Now I want to insert the temporarily deleted labels back into place:
"string with {LABELS} yesbetween yesbrackets {HERE} and {HERE}"
My question is: how can I accomplish this? Important: I am not able to alter operate(), so the solution should contain a wrapper function around operate() or something. I have been thinking about this for quite a while now, but am confused as to how to do this. Could you help me out?
Edit: it would be too much to put the actual operate() in this post. It will not really add value (except make the post longer). There is not much difference between the output of operate() here and the real one. I will be able to translate any ideas from here, to the real-world situation :-)
The answer to this depends on whether or not you are able to understand operate(), even if you can't change it.
If you have absolutely no insight into operate(), your problem is simply unsolvable: To reinsert your labels you need one of
Their offset or relative position (You can't know them, if you don't know operate())
A marker for their place (You can't have them, if you don't know how operate() will work on them)
If you have at least some insight into operate(), this becomes something between solvable and easy:
If operate($a . $b)==operate($a) . operate($b), then you just split your original input by the labels, run the non-label parts through operate(), but obviously not the labels, then reassemble
If operate() is guaranteed to let a placeholder string, that itself is guaranteed to be not part of the normal input ("\0" and friends come to mind) alone, then you extract your labels in order, replace them by the placeholder, run the result through operate() and later replace the placeholder by your saved labels (in order)
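The first case, where operate($a . $b) equals operate($a) . operate($b), could be sketched as follows. The operate() shown here is only a stand-in modelled on the example in the question, and operate_around_labels() is a hypothetical wrapper name; the real operate() would be used unchanged.

```php
<?php
// Stand-in for the real operate(): prefixes words starting with "b" with "yes".
function operate(string $string): string
{
    $output = [];
    foreach (explode(' ', $string) as $word) {
        $output[] = (substr($word, 0, 1) === 'b') ? "yes$word" : $word;
    }
    return implode(' ', $output);
}

// Hypothetical wrapper: only the text between labels is passed to operate().
function operate_around_labels(string $input): string
{
    $parts = preg_split('/(\{.*?\})/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $k => $part) {
        // with PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the captured labels
        if ($k % 2 === 0) {
            $parts[$k] = operate($part);
        }
    }
    return implode('', $parts);
}

echo operate_around_labels('string with {LABELS} between brackets {HERE} and {HERE}');
```

This only gives correct results if operate() treats each text segment independently; if it needs context across a label, the placeholder approach is the safer choice.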
Edit
After reading your comments, here are some lines of code
$input = "string with {LABELS} between brackets {HERE} and {HERE}";
//Extract labels and replace with \0
$tmp=preg_split('/(\{.*?\})/',$input,-1,PREG_SPLIT_DELIM_CAPTURE);
$labels=array();
$txt=array();
$islabel=false;
foreach ($tmp as $t) {
if ($islabel) $labels[]=$t;
else $txt[]=$t;
$islabel=!$islabel;
}
$txt=implode("\0",$txt);
//Run through operate()
$txt=operate($txt);
//Reassemble
$txt=explode("\0",$txt);
$result='';
foreach ($txt as $t)
$result.=$t.array_shift($labels);
echo $result;
Here's what I would do as a first attempt. Split your string into single words, then feed them into operate() one by one, depending on whether the word is 'braced' or not.
$input = "string with {LABELS} between brackets {HERE} and {HERE}";
$inputArray = explode(' ',$input);
foreach($inputArray as $key => $value) {
if(!preg_match('/^{.*}$/',$value)) {
$inputArray[$key] = operate($value);
}
}
$output = implode(' ',$inputArray);

Using regex to fix phone numbers in a CSV with PHP

My new phone does not recognize a phone number unless its area code matches the incoming call. Since I live in Idaho where an area code is not needed for in-state calls, many of my contacts were saved without an area code. Since I have thousands of contacts stored in my phone, it would not be practical to manually update them. I decided to write the following PHP script to handle the problem. It seems to work well, except that I'm finding duplicate area codes at the beginning of random contacts.
<?php
//the script can take a while to complete
set_time_limit(200);
function validate_area_code($number) {
//digits are taken one by one out of $number, and insert in to $numString
$numString = "";
for ($i = 0; $i < strlen($number); $i++) {
$curr = substr($number,$i,1);
//only copy from $number to $numString when the character is numeric
if (is_numeric($curr)) {
$numString = $numString . $curr;
}
}
//add area code "208" to the beginning of any phone number of length 7
if (strlen($numString) == 7) {
return "208" . $numString;
//remove country code (none of the contacts are outside the U.S.)
} else if (strlen($numString) == 11) {
return preg_replace("/^1/","",$numString);
} else {
return $numString;
}
}
//matches any phone number in the csv
$pattern = "/((1? ?\(?[2-9]\d\d\)? *)? ?\d\d\d-?\d\d\d\d)/";
$csv = file_get_contents("contacts2.CSV");
preg_match_all($pattern,$csv,$matches);
foreach ($matches[0] as $key1 => $value) {
/*create a pattern that matches the specific phone number by adding slashes before possible special characters*/
$pattern = preg_replace("/\(|\)|\-/","\\\\$0",$value);
//create the replacement phone number
$replacement = validate_area_code($value);
//add delimiters
$pattern = "/" . $pattern . "/";
$csv = preg_replace($pattern,$replacement,$csv);
}
echo $csv;
?>
Is there a better approach to modifying the CSV? Also, is there a way to minimize the number of passes over the CSV? In the script above, preg_replace is called thousands of times on a very large String.
If I understand you correctly, you just need to prepend the area code to any 7-digit phone number anywhere in this file, right? I have no idea what kind of system you're on, but if you have some decent tools, here are a couple options. And of course, the approaches they take can presumably be implemented in PHP; that's just not one of my languages.
So, how about a sed one-liner? Just look for 7-digit phone numbers, bounded by either beginning of line or comma on the left, and comma or end of line on the right.
sed -r 's/(^|,)([0-9]{3}-[0-9]{4})(,|$)/\1208-\2\3/g' contacts.csv
Or if you want to only apply it to certain fields, perl (or awk) would be easier. Suppose it's the second field:
perl -F, -ane '$"=","; $F[1]=~s/^[0-9]{3}-[0-9]{4}$/208-$&/; print "@F";' contacts.csv
The -F, indicates the field separator, the $" is the output field separator (yes, it gets assigned once per loop, oh well), the arrays are zero-indexed so second field is $F[1], there's a run-of-the-mill substitution, and you print the results.
Ah programs... sometimes a 10-min hack is better.
If it were me... I'd import the CSV into Excel, sort it by something - maybe the length of the phone number or something. Make a new col for the fixed phone number. When you have a group of similarly-fouled numbers, make a formula to fix. Same for the next group. Should be pretty quick, no? Then export to .csv again, omitting the bad col.
A little more digging on my own revealed the issues with the regex in my question. The problem is with duplicate contacts in the csv.
Example:
(208) 555-5555, 555-5555
After the first pass becomes:
2085555555, 2085555555
and after the second pass becomes
2082085555555, 2082085555555
I worked around this by changing the replacement regex to:
//add escapes for special characters
$pattern = preg_replace("/\(|\)|\-|\./","\\\\$0",$value);
//add delimiters, and optional area code
$pattern = "/(\(?[0-9]{3}\)?)? ?" . $pattern . "/";
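The repeated passes, and with them the double replacement, could also be avoided by rewriting every number in a single preg_replace_callback() call, since replaced text is never scanned again. This is only a sketch: the function names are hypothetical, the 208 area code comes from the question, and the pattern is an assumed variant of the original that also accepts a hyphen after the area code and refuses to match inside a longer run of digits.

```php
<?php
// Hypothetical helper: normalizes one matched phone number to 10 digits.
function normalize_number(string $number): string
{
    $digits = preg_replace('/\D/', '', $number); // keep digits only
    if (strlen($digits) === 7) {
        return '208' . $digits;                  // add the local area code
    }
    if (strlen($digits) === 11 && $digits[0] === '1') {
        return substr($digits, 1);               // drop the U.S. country code
    }
    return $digits;
}

// Hypothetical helper: rewrites all phone numbers in the CSV text in one pass.
function fix_numbers(string $csv): string
{
    // optional "1", optional area code, then a 7-digit number; not inside a longer digit run
    $pattern = '/(?<!\d)(?:1[ -]?)?(?:\(?[2-9]\d{2}\)?[ -]?)?\d{3}-?\d{4}(?!\d)/';
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return normalize_number($m[0]);
        },
        $csv
    );
}

echo fix_numbers('(208) 555-5555, 555-5555');
```

Like the original pattern, this will also rewrite any other field that happens to look like a phone number, so it is worth checking the output against a copy of the file.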
