I'm creating a word-replacement script. I've run into a roadblock with ignoring text between quotes and haven't been able to find a decent solution here that doesn't involve regex.
I have a working snippet that walks through every character in the string, tracks whether the most recent quote (single or double) was an opening or a closing quote, and ignores escaped quotes. The problem is that to stay 100% accurate, it has to run every time the string changes (because of how the replacement works, that can be well over 60K times in a single function call), and because the strings can be long, the code takes too long even on a fairly short script.
Is there a fast way to determine whether a position in a string is between opening and closing quotes (single or double), ignoring escaped " and ' characters? Or do you have suggestions for optimizing the snippet so it runs significantly faster? With this function removed, the process runs at almost the preferred speed (instant).
As an exercise, paste the snippet into the script along with a variable containing text, for example $thisIsAQuote = "This is a quote.";. From that point, everything should be replaced correctly, except that $thisIsAQuote should keep its exact text.
But here's the issue: other solutions I've found treat everything between "This is a quote." and ... $this->formatted[$i - 1] != " ... as if it's still between quotes, because as far as those solutions are concerned, the last quote in "This is a quote." and the first quote in the if-check are an opening and closing pair. Another obvious issue is that some strings contain words with apostrophes. Apostrophes shouldn't be treated as single quotes, but in every solution I've found, they are.
In other words, they're "unaware" solutions.
$quoteClosed = true;        // true while no double-quoted string is open
$singleQuoteClosed = true;  // true while no single-quoted string is open
// Use the byte length so the loop bound matches the byte indexing of $this->formatted[$i]
$codeLength = strlen($this->formatted);
if ($codeLength === 0)
    return;
for ($i = 0; $i < $codeLength; $i++)
{
    $char = $this->formatted[$i];
    // A quote directly preceded by a backslash is escaped (guard $i > 0 to avoid index -1)
    $escaped = ($i > 0 && $this->formatted[$i - 1] == "\\");
    if (!$quoteClosed)
    {
        // Inside "...": only an unescaped double quote closes it
        if ($char == '"' && !$escaped)
            $quoteClosed = true;
    }
    else if (!$singleQuoteClosed)
    {
        // Inside '...': only an unescaped single quote closes it
        if ($char == "'" && !$escaped)
            $singleQuoteClosed = true;
    }
    else if ($char == '"' && !$escaped)
        $quoteClosed = false;        // opens a double-quoted string
    else if ($char == "'" && !$escaped)
        $singleQuoteClosed = false;  // opens a single-quoted string
    // 1 = byte is inside quotes (including the opening quote), 0 = outside
    $this->quoted[$i] = ($quoteClosed && $singleQuoteClosed) ? 0 : 1;
}
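A possible single-pass variant, sketched under two assumptions that are not part of the original snippet: a quote preceded by an odd number of backslashes is escaped (so `\\"` still closes a string), and a `'` with an ASCII letter or digit on both sides is an apostrophe ("don't") rather than a delimiter. `buildQuotedMask` is a hypothetical name; it returns the mask instead of writing to `$this->quoted`, so it can be computed once and reused.

```php
<?php
/**
 * Hypothetical helper: byte-indexed mask (1 = inside quotes, 0 = outside), one pass.
 * Assumed rules, not taken from the question:
 *  - a quote preceded by an odd number of backslashes is escaped;
 *  - a ' with an ASCII letter or digit on both sides is an apostrophe ("don't").
 */
function buildQuotedMask(string $text): array
{
    $mask = [];
    $open = null;        // quote character that opened the current string, or null
    $backslashes = 0;    // number of consecutive backslashes right before $i
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];
        $isQuote = ($char === '"' || $char === "'") && $backslashes % 2 === 0;

        // Apostrophe rule: alphanumeric on both sides means "not a delimiter".
        if ($isQuote && $char === "'" && $i > 0 && $i + 1 < $length
            && ctype_alnum($text[$i - 1]) && ctype_alnum($text[$i + 1])) {
            $isQuote = false;
        }

        if ($open === null) {
            if ($isQuote) {
                $open = $char;               // the opening quote belongs to the string
            }
            $mask[$i] = ($open === null) ? 0 : 1;
        } else {
            $mask[$i] = 1;                   // inside, including the closing quote
            if ($isQuote && $char === $open) {
                $open = null;                // only the matching quote type closes
            }
        }

        $backslashes = ($char === '\\') ? $backslashes + 1 : 0;
    }
    return $mask;
}
```

With the mask built once, a match starting at byte offset `$pos` is protected when `$mask[$pos] === 1`.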
If there isn't a way to make the quote-detection snippet more efficient, is there a non-regex way to quickly replace every occurrence of each substring in one array with the corresponding substring in a second array, without missing any across the entire string?
substr_replace and str_replace only seem to replace "some" of the occurrences in the overall string, which is why the iteration limit is in place. The code runs a while loop until either strpos reports that the key no longer exists (which it never seems to do; I may be using it wrong) or it has looped 10K times, whichever comes first.
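For comparison, a small sketch of the two built-in array forms. str_replace() with array arguments does replace every occurrence of every key, but it applies the pairs one after another, so text produced by one pair can be replaced again by a later pair. strtr() with a single array argument works in one pass: it tries the longest keys first and never searches text it has already substituted. The key/answer values here are made up for illustration.

```php
<?php
// Hypothetical key/answer pairs standing in for $this->keys / $this->answers.
$keys    = ['cat', 'dog', 'category'];
$answers = ['dog', 'wolf', 'group'];
$text = 'cat dog category';

// Pairs applied in sequence: "cat" becomes "dog", which the next pair turns into "wolf".
$sequential = str_replace($keys, $answers, $text);

// Single pass: longest matching key wins, replaced text is never rescanned.
$singlePass = strtr($text, array_combine($keys, $answers));

echo $sequential, PHP_EOL; // wolf wolf wolfegory
echo $singlePass, PHP_EOL; // dog wolf group
```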
Running the quote-detection snippet only once per round would solve the speed issue, but that still leaves the "full-replacement" issue and, of course, the need to avoid replacing anything within quotes.
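One way to get "once per round" is sketched below, assuming the mask is indexed by byte offset like `$this->quoted`: collect every unquoted match of one key against a single mask, then apply the replacements from last to first, so each earlier offset still points at the same text. `replaceUnquoted` and the `$isQuoted` callback (standing in for `initializeQuoted()`) are hypothetical names, and the usage mask deliberately handles only double quotes.

```php
<?php
/**
 * Hypothetical helper: replaces every occurrence of $key that starts outside quotes.
 * $isQuoted returns a byte-indexed mask (1 = quoted) and is called only once.
 */
function replaceUnquoted(string $text, string $key, string $answer, callable $isQuoted): string
{
    if ($key === '') {
        return $text;
    }
    $mask = $isQuoted($text);          // built once, for the unmodified string
    $positions = [];
    $offset = 0;
    while (($pos = strpos($text, $key, $offset)) !== false) {
        if (($mask[$pos] ?? 0) === 0) {
            $positions[] = $pos;       // keep only matches that start outside quotes
        }
        $offset = $pos + strlen($key); // continue after this match
    }
    // Right-to-left, so replacing one match never shifts the offsets still to do.
    foreach (array_reverse($positions) as $pos) {
        $text = substr_replace($text, $answer, $pos, strlen($key));
    }
    return $text;
}

// Usage with a deliberately simple mask: double quotes only, no escapes.
$simpleMask = function (string $s): array {
    $mask = [];
    $inside = false;
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        if ($s[$i] === '"') {
            $inside = !$inside;
            $mask[$i] = 1;             // the quote characters count as quoted
        } else {
            $mask[$i] = $inside ? 1 : 0;
        }
    }
    return $mask;
};

echo replaceUnquoted('test "test" test', 'test', 'banana', $simpleMask), PHP_EOL;
// banana "test" banana
```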
for ($a = 0; $a < count($this->keys); $a++)
{
    $escape = 0; // safety counter for the while loop
    if ($a > count($this->keys) - 5)
        $this->formatted = $this->decodeHTML($this->formatted);
    // Replace the first remaining occurrence of the key until none is left
    while (strpos($this->formatted, $this->keys[$a]) !== false)
    {
        $valid = strpos($this->formatted, $this->keys[$a]);
        // Stops at the first occurrence that lies inside quotes
        if ($valid === false || $this->quoted[$valid] === 1)
            break;
        // strpos returns a byte offset, so the replaced length must be in bytes too
        $this->formatted = substr_replace($this->formatted, $this->answers[$a], $valid, strlen($this->keys[$a]));
        $this->initializeQuoted(); // rescans the whole string after every replacement
        $escape++;
        if ($escape >= 10000)
            break;
    }
    if ($a > count($this->keys) - 5)
        $this->formatted = html_entity_decode($this->formatted);
}
$this->quoted = array();
$this->initializeQuoted();
return $this->formatted;
'keys' and 'answers' are arrays containing words of various lengths. 'formatted' is the new string with the changed information. 'initializeQuoted' is the quote-detection snippet above. I use htmlentities and html_entity_decode to help get rid of whitespace in the key/answer replacements.
Ignore the magic numbers (5s and 10K).
If I understand you correctly, you can do this:
$string = 'Test "test" test'; // example input: the text to process
$replacements = [
    "test" => "banana",
    "Test" => "Banana"
];
$brackets = [[0]];          // [start, end] byte ranges, alternating unquoted/quoted
$lastOpenedQuote = null;
for ($i = 0; $i < strlen($string); $i++) {
    if ($string[$i] == "\\") { $i++; continue; } // Skip escaped chars
    if ($string[$i] === $lastOpenedQuote) {
        // Closing quote: end the quoted range here, start a new unquoted one
        $lastOpenedQuote = null;
        $brackets[count($brackets)-1][] = $i;
        $brackets[] = [ $i+1 ];
    } elseif ($lastOpenedQuote === null && ($string[$i] == "\"" || $string[$i] == "'")) {
        // Opening quote: end the unquoted range before it, start a quoted one
        $lastOpenedQuote = $string[$i];
        $brackets[count($brackets)-1][] = $i-1;
        $brackets[] = [ $i ];
    }
}
$brackets[count($brackets)-1][] = strlen($string)-1;
$bits = [];
foreach ($brackets as $index => $pair) {
    $bits[$index] = substr($string,$pair[0],$pair[1]-$pair[0]+1);
    // Replace only in non-empty ranges that don't start with a quote
    if ($bits[$index] !== "" && $bits[$index][0] != "\"" && $bits[$index][0] != "'") {
        $bits[$index] = str_replace(array_keys($replacements),array_values($replacements), $bits[$index]);
    }
}
$result = implode("", $bits); // reassemble the string
Check it out at: http://sandbox.onlinephpfunctions.com/code/0453cb7941f1dcad636043fceff30dc0965541ee
If performance is still an issue, keep in mind that this goes through each character of the string only once and does the minimum number of checks at each step, so it will be really hard to reduce the work further. If you need something faster, you may have to revise your approach from the bottom up, e.g. by doing some of the splitting progressively on the client side instead of processing the whole string on the server side.
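Building on the answer above, here is a sketch that wraps the same single scan in a function, rebuilds the string as it goes, and runs one strtr() per unquoted segment instead of str_replace(), so all keys are handled in one pass per segment. `replaceOutsideQuotes` is a hypothetical name; like the answer, it skips the byte after a backslash and still treats every ' as a quote (apostrophes are not special-cased).

```php
<?php
/**
 * Hypothetical helper: replaces keys only outside '...' and "..." segments.
 * A backslash keeps the following byte verbatim; an unterminated quote
 * leaves the rest of the text untouched.
 */
function replaceOutsideQuotes(string $text, array $replacements): string
{
    $result = '';
    $segment = '';        // text collected since the last quote boundary
    $open = null;         // quote character of the current quoted segment, or null
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];
        if ($char === '\\' && $i + 1 < $length) {
            $segment .= $char . $text[++$i];           // keep the escape pair as-is
            continue;
        }
        if ($open === null && ($char === '"' || $char === "'")) {
            $result .= strtr($segment, $replacements); // flush unquoted text
            $segment = $char;
            $open = $char;
        } elseif ($open !== null && $char === $open) {
            $result .= $segment . $char;               // flush quoted text untouched
            $segment = '';
            $open = null;
        } else {
            $segment .= $char;
        }
    }
    $result .= ($open === null) ? strtr($segment, $replacements) : $segment;
    return $result;
}

echo replaceOutsideQuotes('$a = "test"; // test', ['test' => 'banana']), PHP_EOL;
// $a = "test"; // banana
```

Nothing is recomputed after a replacement, so the cost is one pass over the input plus the strtr() calls.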
I was just working on this; I hope it gives you some additional ideas.
MATCH: ["]([\w\s\(\)\.\d\_\-\[\]\{\}]+|\s*)["]
REPLACE: ""
<?xml version="1.0" encoding="UTF-8"?>
<NotepadPlus>
    <ScintillaContextMenu>
        <!--
        NOTES: BLAH
        -->
        <!--
        [WEBSITE]
        https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/PowerEditor/installer/nativeLang/english.xml
        -->
        <Item MenuId="Tools" MenuItemName="Generate..."/>
        <Item MenuEntryName="Edit" FolderName="Remove Lines" MenuItemName="Remove Empty Lines" ItemNameAs="Empty Lines"/>
        <Item MenuEntryName="Plugins" FolderName="Remove Lines" MenuItemName="Remove duplicate lines" ItemNameAs="Duplicate Lines (Plugin)"/>
        <Item MenuEntryName="Edit" FolderName="Remove Lines" MenuItemName="Remove Consecutive Duplicate Lines" ItemNameAs="Duplicate Lines"/>
        <Item MenuEntryName="Search" FolderName="Add Style Tokens" MenuItemName="Using 1st Style" ItemNameAs="1"/>
        <Item id="45003" FolderName="Convert" ItemNameAs="Macintosh (CR)"/>
        <Item id="0" FolderName="XML Tools"/>
        <Item MenuEntryName="Plugins" FolderName="XML Tools" MenuItemName="Options..." ItemNameAs="Options"/>
    </ScintillaContextMenu>
</NotepadPlus>
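For reference, the same MATCH/REPLACE pair can be run from PHP with preg_replace(); here it is applied to one `<Item>` line from the sample. It only matches double-quoted values made of the characters listed in the class, so it is a cleanup pattern rather than a full quote-aware parser.

```php
<?php
// The MATCH pattern from above, as a PCRE literal in a single-quoted PHP string.
$pattern = '/["]([\w\s\(\)\.\d\_\-\[\]\{\}]+|\s*)["]/';
$line = '<Item MenuId="Tools" MenuItemName="Generate..."/>';

// REPLACE with "" empties every matching double-quoted value.
$cleared = preg_replace($pattern, '""', $line);

echo $cleared, PHP_EOL; // <Item MenuId="" MenuItemName=""/>
```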
Let me know if you come up with anything else.
I seem to be having a problem with my fgetcsv call not splitting on commas: each row comes back as one really long string inside an array. I need the fields as separate array elements for implode to work properly when inserting into a MySQL database.
Currently the MySQL insert looks like this:
INSERT INTO metars VALUES('PAJZ 011132Z AUTO 1 3/4SM BR FEW009 BKN019 OVC026 08/07 A2959 RMK AO2 PWINO TSNO P0001,PAJZ,2013-07-01T11:32:00Z,59.73,-157.27,8.0,7.0,,,,1.75,29.589567,,,TRUE,TRUE,,,TRUE,,TRUE,BR,FEW,900,BKN,1900,OVC,2600,,,IFR,,,,,,0.01,,,,,,SPECI,82.0"
This should have single quotes around each element, but since the whole row is one string, the implode isn't working =\
<?php
require_once("../config/dbmetar.php");
$file = "metars.csv";
$db = new mysqli(DB_HOST_METAR, DB_USER_METAR, DB_PASS_METAR, DB_NAME_METAR);
$r = 0;
if (($handle = fopen($file, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 3000, ",", '"')) !== FALSE) {
        if ($r >= 6) { //skips the header
            foreach($data as $i => $content) {
                $data[$i] = $db->real_escape_string($content);
            }
            //echo "INSERT INTO metars VALUES('" . implode("','", $data) . '"' ;
            //echo var_dump($data);
            $db->query("INSERT INTO metars VALUES('" . implode("','", $data) . "');");
        }
        $r++;
    }
    fclose($handle);
}
?>
Here's an example of the CSV file:
No errors
Max results hit (1000): not all possible results returned;
59 ms
data source=metars
948 results
raw_text,station_id,observation_time,latitude,longitude,temp_c,dewpoint_c,wind_dir_degrees,wind_speed_kt,wind_gust_kt,visibility_statute_mi,altim_in_hg,sea_level_pressure_mb,corrected,auto,auto_station,maintenance_indicator_on,no_signal,lightning_sensor_off,freezing_rain_sensor_off,present_weather_sensor_off,wx_string,sky_cover,cloud_base_ft_agl,sky_cover,cloud_base_ft_agl,sky_cover,cloud_base_ft_agl,sky_cover,cloud_base_ft_agl,flight_category,three_hr_pressure_tendency_mb,maxT_c,minT_c,maxT24hr_c,minT24hr_c,precip_in,pcp3hr_in,pcp6hr_in,pcp24hr_in,snow_in,vert_vis_ft,metar_type,elevation_m"
"PAJZ 011132Z AUTO 1 3/4SM BR FEW009 BKN019 OVC026 08/07 A2959 RMK AO2 PWINO TSNO P0001,PAJZ,2013-07-01T11:32:00Z,59.73,-157.27,8.0,7.0,,,,1.75,29.589567,,,TRUE,TRUE,,,TRUE,,TRUE,BR,FEW,900,BKN,1900,OVC,2600,,,IFR,,,,,,0.01,,,,,,SPECI,82.0"
"CYOD 011131Z 18002KT 1/2SM FG FEW250 RMK CI0,CYOD,2013-07-01T11:31:00Z,54.4,-110.28,,,180,2,,0.5,,,,,,,,,,,FG,FEW,25000,,,,,,,LIFR,,,,,,,,,,,,SPECI,544.0"
"CYYD 011131Z AUTO VRB06KT 5SM -RA BR FEW007 BKN070 OVC084 15/14 A3005 RMK PRESRR PCPN 1.0MM PAST HR SLP173 DENSITY ALT 1900FT,CYYD,2013-07-01T11:31:00Z,54.82,-127.18,15.0,14.0,0,6,,5.0,30.050198,1017.3,,TRUE,,,,,,,-RA BR,FEW,700,BKN,7000,OVC,8400,,,MVFR,,,,,,,,,,,,SPECI,523.0"
"KNSE 011131Z 00000KT 10SM -RA FEW025 BKN070 BKN120 BKN250 23/19 A2980 RMK AO2 WSHFT 1045 RAB29 P0000 $ ,KNSE,2013-07-01T11:31:00Z,30.72,-87.02,23.0,19.0,0,0,,10.0,29.799213,,,,TRUE,TRUE,,,,,-RA,FEW,2500,BKN,7000,BKN,12000,BKN,25000,VFR,,,,,,0.0050,,,,,,SPECI,61.0"
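A minimal reproduction of the symptom, using a shortened version of the first data row (assumption: every data line is wrapped in one pair of double quotes, as in the sample above). With `"` as the enclosure, the outer quotes make the whole line a single field; the escape argument is passed explicitly only to avoid relying on its default.

```php
<?php
// Shortened sample row, wrapped in quotes like the lines in metars.csv.
$row = '"PAJZ 011132Z AUTO,PAJZ,2013-07-01T11:32:00Z,59.73"';

// The enclosure (") swallows the commas: one field containing the whole line.
$asIs = str_getcsv($row, ',', '"', '\\');

// Trimming the outer quotes first lets the commas split the fields.
$fields = str_getcsv(trim($row, '"'), ',', '"', '\\');

var_dump(count($asIs));   // int(1)
var_dump(count($fields)); // int(4)
```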
I've been trying to solve this for about 6 hours with no luck. I've tried different CSV parsing methods such as str_getcsv and fgetcsv, with both foreach and for loops. The simplified version above is what I have, and it is the easiest way to update header information.
Purpose: this PHP script will run about every 60 to 150 seconds from a cron scheduler. If you have any other suggestions regarding this, I would appreciate them as well.
I do appreciate any help in this matter,
Thanks,
-Mikael
You might want to consider using file() for this type of operation so you can drop the first few lines and trim the quotes off both ends of each CSV line. The enclosure parameter you passed to fgetcsv is meant for each column being wrapped in quotes, not the entire line; since every line here is wrapped in a single pair of quotes, fgetcsv returns the whole line as one field.
From there, you can use str_getcsv() on each line.
Example (untested):
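<?php
// FILE_IGNORE_NEW_LINES drops the line endings, so the closing quote is the last character
$file = file("metars.csv", FILE_IGNORE_NEW_LINES);
$file = array_slice($file, 6); // skip the 6 header lines
foreach($file as $line) {
    // Strip the quotes wrapped around the whole line (and a stray \r from Windows line endings)
    $line = ltrim($line, '"');
    $line = rtrim($line, "\"\r");
    $csv = str_getcsv($line);
    /** do query, preferably a prepared statement **/
}
?>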
Of course, if your file is massive, this may not be the best solution.
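For large files, a streaming sketch that reads one line at a time with fgets() instead of loading everything with file(). `readMetarRows` is a hypothetical name, and the default of 6 skipped header lines mirrors the question's `$r >= 6` check; the database insert is left out.

```php
<?php
/**
 * Hypothetical generator: yields one parsed row per data line, skipping
 * $skip header lines and the quotes wrapped around each line.
 */
function readMetarRows(string $path, int $skip = 6): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $lineNo = 0;
        while (($line = fgets($handle)) !== false) {
            if ($lineNo++ < $skip) {
                continue;                          // header lines
            }
            $line = trim($line, "\"\r\n");         // outer quotes and line ending
            if ($line === '') {
                continue;                          // ignore blank lines
            }
            yield str_getcsv($line, ',', '"', '\\');
        }
    } finally {
        fclose($handle);                           // runs when iteration ends
    }
}
```

Only one line is held in memory at a time, e.g. `foreach (readMetarRows('metars.csv') as $row) { /* insert $row */ }`.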