CSV Import Split by Comma - what to do about quotes?

CSV Import Split by Comma - what to do about quotes? - php

I have a CSV file I'm importing but am running into an issue. The data is in the format:
TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4
I need to be able to explode by the , that are not within the quotes...

See the fgetcsv function.
If you already have a string, you can create a stream that wraps it and then use fgetcsv. See http://code.google.com/p/phpstringstream/source/browse/trunk/stringstream.php

If you really want to do this by hand, here's a rough reference implementation I wrote to explode a complete line of CSV text into an array. Be warned: This code does NOT handle multiple-line fields! With this implementation, the entire CSV row must exist on a single line with no line breaks!
<?php
//-----------------------------------------------------------------------
function csvexplode($str, $delim = ',', $qual = "\"")
// Explode a single CSV string (line) into an array.
{
$len = strlen($str); // Store the complete length of the string for easy reference.
$inside = false; // Maintain state when we're inside quoted elements.
$lastWasDelim = false; // Maintain state if we just started a new element.
$word = ''; // Accumulator for current element.
for($i = 0; $i < $len; ++$i)
{
// We're outside a quoted element, and the current char is a field delimiter.
if(!$inside && $str[$i]==$delim)
{
$out[] = $word;
$word = '';
$lastWasDelim = true;
}
// We're inside a quoted element, the current char is a qualifier, and the next char is a qualifier.
elseif($inside && $str[$i]==$qual && ($i<$len && $str[$i+1]==$qual))
{
$word .= $qual; // Add one qual into the element,
++$i; // Then skip ahead to the next non-qual char.
}
// The current char is a qualifier (so we're either entering or leaving a quoted element.)
elseif ($str[$i] == $qual)
{
$inside = !$inside;
}
// We're outside a quoted element, the current char is whitespace and the 'last' char was a delimiter.
elseif( !$inside && ($str[$i]==" ") && $lastWasDelim)
{
// Just skip the char because it's leading whitespace in front of an element.
}
// Outside a quoted element, the current char is whitespace, the "next" char is a delimiter.
elseif(!$inside && ($str[$i]==" ") )
{
// Look ahead for the next non-whitespace char.
$lookAhead = $i+1;
while(($lookAhead < $len) && ($str[$lookAhead] == " "))
{
++$lookAhead;
}
// If the next char is formatting, we're dealing with trailing whitespace.
if($str[$lookAhead] == $delim || $str[$lookAhead] == $qual)
{
$i = $lookAhead-1; // Jump the pointer ahead to right before the delimiter or qualifier.
}
// Otherwise we're still in the middle of an element, so add the whitespace to the output.
else
{
$word .= $str[$i];
}
}
// If all else fails, add the character to the current element.
else
{
$word .= $str[$i];
$lastWasDelim = false;
}
}
$out[] = $word;
return $out;
}
// Examples:
$csvInput = 'Name,Address,Phone
Alice,123 First Street,"555-555-5555"
Bob,"345 Second Place, City ST",666-666-6666
"Charlie ""Chuck"" Doe", 3rd Circle ," 777-777-7777"';
// explode() emulates file() in this context.
foreach(explode("\n", $csvInput) as $line)
{
var_dump(csvexplode($line));
}
?>
I would still recommend relying on PHP's built-in function though. That's (hopefully) going to be far more reliable long term. Artefacto and Roadmaster are right.: anything you have to do to the data is best done after you import it.

Related

Simulating leftTrim function in PHP

Here I am trying to simulate built in function "ltrim" (that strips characters from the beginning of the string) in PHP
here is the code I have written
function leftTrim(string $Sstr, mixed $char) : string
{
$i = 0;
for(; $i< strlen($Sstr); $i++): // loop on the string
for($j=0; $j <strlen($char); $j++): // loop on the characters
if($Sstr[$i] === $char[$j])
{
$Sstr[$i] = " ";
break;
}
endfor;
endfor;
return $Sstr;
}
but I found two problems with this code:
it replaces each character with a space character that means
that the length of the code remains the same.
this function trims characters from the middle of the string
where it should only trim from the beginning.

Don't replace the characters with spaces. Just find the index of the first non-matching character, then use substr() to get all the characters after that into the result.
Stop looping when you find a character that isn't in $char.
There's no need for the inner loop, you can use strpos() to check if a character is in $char.
function leftTrim(string $Sstr, mixed $char) : string
{
for($i = 0; $i< strlen($Sstr); $i++): // loop on the string
if (strpos($char, $Sstr[$i]) === false) {
break;
}
endfor;
return substr($Sstr, $i);
}

Barmar's answer is the correct one (by correcting and improving the OPs code).
One alternative would be to split the string to an array and unset the characters to be trimmed and then join the remaining characters back to a string:
function leftTrim(string $str, string $chars = ' '): string {
$splitted = str_split($str);
for ($i = 0; $i < count($splitted); $i++) {
if (str_contains($chars, $splitted[$i])) {
unset($splitted[$i]);
} else {
break;
}
}
return implode('', $splitted);
}

Reverse the position of all letters in each word of a string that may contain multibyte characters

I want to write an application that reverses all the words of input text, but all non-letter symbols should stay in the same places.
What I already have:
function reverse($string)
{
$reversedString = '';
for ($position = strlen($string); $position > 0; $position--) {
$reversedString .= $string[$position - 1]; //.= - concatenation assignment, привязывает b to a;
}
return $reversedString;
}
$name = 'ab1 ab2';
print_r(reverse($name)); //output: 2ba 1ba;
Now I want to add for this function some conclusion, that this function also will reverse text, but without affecting any special characters? It means, that all non-letter symbols should stay in the same places.
Here are some sample input strings and my desired output:
ab1 ab2 becomes ba1 ba2
qwerty uiop becomes ytrewq poiu
q1werty% uio*pl becomes y1trewq% lpo*iu
Привет, мир! becomes тевирП, рим!
Hello, dear #user_non-name, congrats100 points*#! becomes olleH, raed #eman_non-resu, stragnoc100 stniop*#!
My actual project will be using cyrillic characters, so answers need to accommodate multibyte/unicode letters.
Maybe I should use array and '''ctype_alpha''' function?

If I understood your problem correctly, then the solution below will probably be able to help you. This solution is not neat and not optimal, but it seems to work:
//mb_internal_encoding('UTF-8') && mb_regex_encoding('UTF-8'); // <-- if you need
function reverse(string $string): string
{
$reversedStrings = explode(' ', $string);
$patternRegexp = '^[a-zA-Zа-яА-Я]+$';
foreach ($reversedStrings as &$word) {
$chars = mb_str_split($word, 1);
$filteredChars = [];
foreach (array_reverse($chars) as $char) {
if (mb_ereg_match($patternRegexp, $char)) {
$filteredChars[] = $char;
}
}
foreach ($chars as &$char) {
if (!mb_ereg_match($patternRegexp, $char)) {
continue;
}
$char = array_shift($filteredChars);
}
$word = implode('', $chars);
}
return implode(' ', $reversedStrings);
}
$test1 = 'ab1 ab2 ab! ab. Hello!789World! qwe4rty5';
print_r(reverse($test1)."\n"); // => "ba1 ba2 ba! ba. dlroW!789olleH! ytr4ewq5";
$test2 = 'Привет, мир!';
print_r(reverse($test2)."\n"); // => "тевирП, рим!";
In the example, non-alphabetic characters do not change their position within the word. Example: "Hello!789World! qwe4rty5" => "dlroW!789olleH! ytr4ewq5".

In an attempt to optimize for performance by reducing total function calls and reducing the number of preg_ calls, I'll demonstrate a technique with nested loops.
Explode on spaces
Separate letters from non-letters while accommodating multi-byte characters
Iterate the matches array which contains 1 character per row (separated so that letters (movable characters) are in the [1] element and non-letters (immovable characters are in the [2] element.
While traversing from left to right along the array of characters, if an immovable character is encountered, immediately append it to the current output string; if movable, seek out the latest unused movable character from the end of the matches array.
Code: (Demo) (variation without isset() calls)
$names = [
'ab1 ab2', // becomes ba1 ba2
'qwerty uçop', // becomes ytrewq poçu
'q1werty% uio*pl', // becomes y1trewq% lpo*iu
'Привет, мир!', // becomes тевирП, рим!
'Hello, dear #user_non-name, congrats100 points*#!', // olleH, raed #eman_non-resu, stragnoc100 stniop*#!
'a' // remains a
];
function swapLetterPositions($string): string {
$result = [];
foreach (explode(' ', $string) as $index => $word) {
$result[$index] = '';
$count = preg_match_all('/(\pL)|(.)/u', $word, $m, PREG_SET_ORDER);
for ($i = 0, $j = $count; $i < $count; ++$i) {
if (isset($m[$i][2])) { // immovable
$result[$index] .= $m[$i][2]; // append to string
} else { // movable from front
while (--$j >= 0) { // decrement $j and ensure that it represents an element index
if (!isset($m[$j][2])) { // movable from back
$result[$index] .= $m[$j][1]; // append to string
break;
}
}
}
}
}
return implode(' ', $result);
}
foreach ($names as $name) {
echo "\"$name\" => \"" . swapLetterPositions($name) . "\"\n---\n";
}
Output:
"ab1 ab2" => "ba1 ba2"
---
"qwerty uçop" => "ytrewq poçu"
---
"q1werty% uio*pl" => "y1trewq% lpo*iu"
---
"Привет, мир!" => "тевирП, рим!"
---
"Hello, dear #user_non-name, congrats100 points*#!" => "olleH, raed #eman_non-resu, stargnoc100 stniop*#!"
---
"a" => "a"
---

Search through array and removing duplicates

I am trying to remove duplicates from a text field. The text field auto suggests inputs and the user is only allowed to choose from them.
The user however has the option of choosing same input field more than once. It's an input fields that states the firstname + lastname of each individual from a database.
First, this is my code to trim some of the unwated characters and then going through the array comparing it to previous inputs.
if(!empty($_POST['textarea'])){
$text = $_POST['textarea'];
$text= ltrim ($text,'[');
$text= rtrim ($text,']');
$toReplace = ('"');
$replaceWith = ('');
$output = str_replace ($toReplace,$replaceWith,$text);
$noOfCommas = substr_count($output, ",");
echo $output.'<br>';
$tempArray = (explode(",",$output));
$finalArray[0] = $tempArray[0];
$i=0;
$j=0;
$foundMatch=0;
for ($i; $i<$noOfCommas; $i++) {
$maxJ = count($finalArray);
for ($j; $j<$maxJ; $j++) {
if ($tempArray[$i] === $finalArray[$j]) {
$foundMatch ===1;
}
}
if ($foundMatch === 0) {
array_push($finalArray[$j],$tempArray[$i]);
}
}
What is it am I doing wrong ?

In this part when checking if the values are equal:
if ($tempArray[$i] === $finalArray[$j]) {
$foundMatch ===1;
}
It should be:
if ($tempArray[$i] === $finalArray[$j]) {
$foundMatch = 1;
}
That way you are setting the variable and not checking if it's equal to 1. You can also break the inner for loop when finding the first match.

I think that this should work:
if (!empty($_POST['textarea'])){
$words = explode(',',str_replace('"', '', trim($_POST['textarea'], ' \t\n\r\0\x0B[]'));
array_walk($words, 'trim');
foreach ($words as $pos=>$word){
$temp = $words;
unset($temp[$pos]);
if (in_array($word, $temp))
unset($words[$pos]);
}
}
echo implode("\n", $words);
First it reads all the words from textarea, removes '"' and then trim. After that it creates a list of words(explode) followed by a trim for every word.
Then it checks every word from the list to see if it exists in that array (except for that pos). If it exists then it will remove it (unset).

PHP: Check string for certain words

How can I check if data submitted from a form or querystring has certain words in it?
I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.
I'm converting from ASP to PHP. I used to do this using an array in ASP (keep all illegal words in a string and use ubound to check the whole string for those words), but is there a better (efficient) way to do this in PHP?
Eg: A string like this would be rejected: "The administrator dropped a blah blah" because it has admin and drop in it.
I intend using this to check usernames when creating accounts and for other things too.
Thanks

You could use stripos()
int stripos ( string $haystack , string $needle [, int $offset = 0 ] )
You could have a function like:
function checkBadWords($str, $badwords) {
foreach ($badwords as $word) {
if (stripos(" $str ", " $word ") !== false) {
return false;
}
}
return true;
}
And to use it:
if (!checkBadWords('something admin', array('admin')) {
// ...
}

strpos() will let you search for a substring within a larger string. It's quick and works well. It returns false if the string's not found, and a number (which could be zero, so you need to use === to check) if it finds the string.
stripos() is a case-insensitive version of the same.
I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.
I suspect that you are trying to filter the string so it's suitable for including in something like a database query, or something like that. If this is the case, this is probably not a good way to go about it, and you'd need to actually need to escape the string using mysql_real_escape_string() or equivalent.

$badwords = array("admin", "drop",);
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (strpos($word, $bw) === 0) {
//contains word $word that starts with bad word $bw
}
}
}
For JGB146, here is a performance comparison with regular expressions:
<?php
function has_bad_words($badwords, $string) {
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (stripos($word, $bw) === 0) {
return true;
}
}
return false;
}
}
function has_bad_words2($badwords, $string) {
$regex = array_map(function ($w) {
return "(?:\\b". preg_quote($w, "/") . ")"; }, $badwords);
$regex = "/" . implode("|", $regex) . "/";
return preg_match($regex, $string) != 0;
}
$badwords = array("abc", "def", "ghi", "jkl", "mnop");
$string = "The quick brown fox jumps over the lazy dog";
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
has_bad_words($badwords, $string);
}
echo "elapsed: ". (microtime(true) - $start);
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
has_bad_words2($badwords, $string);
}
echo "elapsed: ". (microtime(true) - $start);
Example output:
elapsed: 0.076514959335327
elapsed: 0.29999899864197
So regular expressions are much slower.

You could use regular expression like this:
preg_match("~(admin)|(drop)|(another token)|(yet another)~",$subject);
building the pattern string from array
$pattern = implode(")|(", $banned_words);
$pattern = "~(".$pattern.")~";

function check($string, $array) {
foreach($array as $item) {
if( preg_match("/($item)/", $string) )
return true;
}
return false;
}

You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.
Originally, I was thinking you could do this with a simple preg_match() call (hence the downvote), however preg_match does not support arrays. Instead, you can do a replacement via preg_replace to have all rejected strings replaced with nothing, and then check to see if the string is changed. This is simple and avoids requiring a loop iteration for each rejected string.
$rejectedStrs = array("/admin/", "/drop/", "/create/");
if($input == preg_replace($rejectedStrs, "", $input)) {
//do stuff
} else {
//reject
}
Note also that you can provide case-insensitive searches by using the i flag on the regex patterns, changing the array of patterns to $rejectedStrs = array("/admin/i", "/drop/i", "/create/i");
On Efficiency
There has been some debate about the efficiency of doing it this way vs the accepted nested loop method. I ran some tests and found the preg_replace method executed around twice as fast as the nested loop. Here is the code and output of those tests:
$input = "You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement. You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.";
$input = "Short string with no matches";
$input2 = "Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. ";
$input3 = "Short string which loop will match quickly";
$input4 = "Longer string that will eventually be matches but first has a lot of words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words and then finally the word create near the end";
$start1 = microtime(true);
$rejectedStrs = array("/loop/", "/operation/", "/create/");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
if (preg_check($rejectedStrs, $input)) $p_matches++;
if (preg_check($rejectedStrs, $input2)) $p_matches++;
if (preg_check($rejectedStrs, $input3)) $p_matches++;
if (preg_check($rejectedStrs, $input4)) $p_matches++;
}
$start2 = microtime(true);
$rejectedStrs = array("loop", "operation", "create");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
if (loop_check($rejectedStrs, $input)) $l_matches++;
if (loop_check($rejectedStrs, $input2)) $l_matches++;
if (loop_check($rejectedStrs, $input3)) $l_matches++;
if (loop_check($rejectedStrs, $input4)) $l_matches++;
}
$end = microtime(true);
echo "preg_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$end."=".($end-$start2);
function preg_check($rejectedStrs, $input) {
if($input == preg_replace($rejectedStrs, "", $input))
return true;
return false;
}
function loop_check($badwords, $string) {
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (stripos($word, $bw) === 0) {
return true;
}
}
return false;
}
}
Output:
preg_match: 1281908071.4032 1281908071.9947= 0.5915060043335
loop_match: 1281908071.9947 1281908073.006=1.0112948417664

This is actually pretty simple, use substr_count.
And example for you would be:
if (substr_count($variable_to_search, "drop"))
{
echo "error";
}
And to make things even simpler, put your keywords (ie. "drop", "create", "alter") in an array and use foreach to check them. That way you cover all your words. An example
foreach ($keywordArray as $keyword)
{
if (substr_count($variable_to_search, $keyword))
{
echo "error"; //or do whatever you want to do went you find something you don't like
}
}

How to find first non-repetitive character from a string?

I've spent half day trying to figure out this and finally I got working solution.
However, I feel like this can be done in simpler way.
I think this code is not really readable.
Problem: Find first non-repetitive character from a string.
$string = "abbcabz"
In this case, the function should output "c".
The reason I use concatenation instead of $input[index_to_remove] = ''
in order to remove character from a given string
is because if I do that, it actually just leave empty cell so that my
return value $input[0] does not not return the character I want to return.
For instance,
$str = "abc";
$str[0] = '';
echo $str;
This will output "bc"
But actually if I test,
var_dump($str);
it will give me:
string(3) "bc"
Here is my intention:
Given: input
while first char exists in substring of input {
get index_to_remove
input = chars left of index_to_remove . chars right of index_to_remove
if dupe of first char is not found from substring
remove first char from input
}
return first char of input
Code:
function find_first_non_repetitive2($input) {
while(strpos(substr($input, 1), $input[0]) !== false) {
$index_to_remove = strpos(substr($input,1), $input[0]) + 1;
$input = substr($input, 0, $index_to_remove) . substr($input, $index_to_remove + 1);
if(strpos(substr($input, 1), $input[0]) == false) {
$input = substr($input, 1);
}
}
return $input[0];
}

<?php
// In an array mapped character to frequency,
// find the first character with frequency 1.
echo array_search(1, array_count_values(str_split('abbcabz')));

Python:
def first_non_repeating(s):
for i, c in enumerate(s):
if s.find(c, i+1) < 0:
return c
return None
Same in PHP:
function find_first_non_repetitive($s)
{
for($i = 0; i < strlen($s); $i++) {
if (strpos($s, $s[i], $i+1) === FALSE)
return $s[i];
}
}

Pseudocode:
Array N;
For each letter in string
if letter not exists in array N
Add letter to array and set its count to 1
else
go to its position in array and increment its count
End for
for each position in array N
if value at potition == 1
return the letter at position and exit for loop
else
//do nothing (for clarity)
end for
Basically, you find all distinct letters in the string, and for each letter, you associate it with a count of how many of that letter exist in the string. then you return the first one that has a count of 1
The complexity of this method is O(n^2) in the worst case if using arrays. You can use an associative array to increase it's performance.

1- use a sorting algotithm like mergesort (or quicksort has better performance with small inputs)
2- then control repetetive characters
non repetetive characters will be single
repetetvives will fallow each other
Performance : sort + compare
Performance : O(n log n) + O(n) = O(n log n)
For example
$string = "abbcabz"
$string = mergesort ($string)
// $string = "aabbbcz"
Then take first char form string then compare with next one if match repetetive
move to the next different character and compare
first non-matching character is non-repetetive

This can be done in much more readable code using some standard PHP functions:
// Count number of occurrences for every character
$counts = count_chars($string);
// Keep only unique ones (yes, we use this ugly pre-PHP-5.3 syntax here, but I can live with that)
$counts = array_filter($counts, create_function('$n', 'return $n == 1;'));
// Convert to a list, then to a string containing every unique character
$chars = array_map('chr', array_keys($counts));
$chars = implode($chars);
// Get a string starting from the any of the characters found
// This "strpbrk" is probably the most cryptic part of this code
$substring = strlen($chars) ? strpbrk($string, $chars) : '';
// Get the first character from the new string
$char = strlen($substring) ? $substring[0] : '';
// PROFIT!
echo $char;

$str="abbcade";
$checked= array(); // we will store all checked characters in this array, so we do not have to check them again
for($i=0; $i<strlen($str); $i++)
{
$c=0;
if(in_array($str[$i],$checked)) continue;
$checked[]=$str[$i];
for($j=$i+1;$j<=strlen($str);$j++)
{
if($str[$i]==$str[$j])
{
$c=1;
break;
}
}
if($c!=1)
{
echo "First non repetive char is:".$str[$i];
break;
}
}

This should replace your code...
$array = str_split($string);
$array = array_count_values($array);
$array = array_filter($array, create_function('$key,$val', 'return($val == 1);'));
$first_non_repeated_letter = key(array_shift($array));
Edit: spoke too soon. Took out 'array_unique', thought it actually dropped duplicate values. But character order should be preserved to be able to find the first character.

Here's a function in Scala that would do it:
def firstUnique(chars:List[Char]):Option[Char] = chars match {
case Nil => None
case head::tail => {
val filtered = tail filter (_!=head)
if (tail.length == filtered.length) Some(head) else firstUnique(filtered)
}
}
scala> firstUnique("abbcabz".toList)
res5: Option[Char] = Some(c)
And here's the equivalent in Haskell:
firstUnique :: [Char] -> Maybe Char
firstUnique [] = Nothing
firstUnique (head:tail) = let filtered = (filter (/= head) tail) in
if (tail == filtered) then (Just head) else (firstUnique filtered)
*Main> firstUnique "abbcabz"
Just 'c'
You can solve this more generally by abstracting over lists of things that can be compared for equality:
firstUnique :: Eq a => [a] -> Maybe a
Strings are just one such list.

Can be also done using array_key_exists during building an associative array from the string. Each character will be a key and will count the number as value.
$sample = "abbcabz";
$check = [];
for($i=0; $i<strlen($sample); $i++)
{
if(!array_key_exists($sample[$i], $check))
{
$check[$sample[$i]] = 1;
}
else
{
$check[$sample[$i]] += 1;
}
}
echo array_search(1, $check);

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

CSV Import Split by Comma - what to do about quotes? - php

I have a CSV file I'm importing but am running into an issue. The data is in the format: TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4 I need to be able to explode by the , that are not within the quotes...

See the fgetcsv function. If you already have a string, you can create a stream that wraps it and then use fgetcsv. See http://code.google.com/p/phpstringstream/source/browse/trunk/stringstream.php

Related

Simulating leftTrim function in PHP

Reverse the position of all letters in each word of a string that may contain multibyte characters

Search through array and removing duplicates

PHP: Check string for certain words

How to find first non-repetitive character from a string?

Categories

Resources