PHP simplest-case regex replacement, but backreferences not working

Hacking up what I thought was the second simplest type of regex (extract a matching string from some strings, and use it) in php, but regex grouping seems to be tripping me up.
Objective
Take an ls listing of files and output the commands to format/copy the files so they have the correct naming format.
Resize copies of the files to create thumbnails (not even dealing with that step yet).
Failure
My code fails at the regex step. I just want to filter out everything except a single regex group, but the result always contains the group that I want and the group before it, even though I in no way requested the first capture group.
Here is a fully functioning, runnable version of the code on the online ide:
http://ideone.com/2RiqN
And here is the code (with a cut down initial dataset, although I don't expect that to matter at all):
<?php
// Long list of image names.
$file_data = <<<HEREDOC
07184_A.jpg
Adrian-Chelsea-C08752_A.jpg
Air-Adams-Cap-Toe-Oxford-C09167_A.jpg
Air-Adams-Split-Toe-Oxford-C09161_A.jpg
Air-Adams-Venetian-C09165_A.jpg
Air-Aiden-Casual-Camp-Moc-C09347_A.jpg
C05820_A.jpg
C06588_A.jpg
Air-Aiden-Classic-Bit-C09007_A.jpg
Work-Moc-Toe-Boot-C09095_A.jpg
HEREDOC;
if($file_data){
$files = preg_split("/[\s,]+/", $file_data);
// Split up the files based on the newlines.
}
$rename_candidates = array();
$i = 0;
foreach($files as $file){
$string = $file;
$pattern = '#(\w)(\d+)_A\.jpg$#i';
// Use the second regex group for the results.
$replacement = '$2';
// This should return only group 2 (any number of digits), but instead group 1 is somehow always in there.
$new_file_part = preg_replace($pattern, $replacement, $string);
// Example good end result: <img src="images/ch/ch-07184fs.jpg" width="350" border="0">
// Save the rename results for further processing later.
$rename_candidates[$i]=array('file'=>$file, 'new_file'=>$new_file_part);
// Rename the images into a standard format.
echo "cp ".$file." ./ch/ch-".$new_file_part."fs.jpg;";
// Echo out some commands for later.
echo "<br>";
$i++;
if($i>10){break;} // Just deal with the first 10 for now.
}
?>
Intended result for the regex: 788750
Intended result for the code output (multiple lines of): cp air-something-something-C485850_A.jpg ./ch/ch-485850.jpg;
What's wrong with my regex? Suggestions for simpler matching code would be appreciated as well.

Just a guess:
$pattern = '#^.*?(\w)(\d+)_A\.jpg$#i';
This includes the whole filename in the match. Otherwise preg_replace() will really only substitute the end of each string - it only applies the $replacement expression on the part that was actually matched.
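If only the digits are needed, capturing them directly with preg_match() may be simpler than replacing everything else away. The sketch below assumes the number is always the run of digits immediately before _A.jpg; extract_image_number() is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: pull the digits that precede "_A.jpg" out of a file name.
function extract_image_number(string $file): ?string
{
    // A single capture group is enough; any letter prefix such as "C" is
    // simply left outside the match.
    if (preg_match('/(\d+)_A\.jpg$/i', $file, $m)) {
        return $m[1];
    }
    return null; // The name does not follow the expected format.
}

$files = ['07184_A.jpg', 'Air-Adams-Venetian-C09165_A.jpg', 'C05820_A.jpg'];
foreach ($files as $file) {
    $number = extract_image_number($file);
    if ($number !== null) {
        // Same command format as in the question.
        echo "cp $file ./ch/ch-{$number}fs.jpg;\n";
    }
}
```

Because the group is (\d+) on its own, a name that starts with a digit, such as 07184_A.jpg, keeps its leading 0 instead of losing it to (\w).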

Scan Dir and Explode
You know what? A simpler way to do it in PHP is to use a scandir and explode combo:
$dir = scandir('/path/to/directory');
foreach($dir as $file)
{
$ext = pathinfo($file,PATHINFO_EXTENSION);
if($ext!='jpg') continue;
$a = explode('-',$file); //grab the end of the string after the -
$newfilename = end($a); //if there is no dash just take the whole string
$newlocation = './ch/ch-'.str_replace(array('C','_A'),'', basename($newfilename,'.jpg')).'fs.jpg';
echo "#copy($file, $newlocation)\n";
}
#and you are done :)
explode: basically a filename like blah-2.jpg is turned into array('blah', '2.jpg'), and taking the end() of that gets the last element. It's almost the same as array_pop(), except that end() leaves the array unchanged.
Working Example
Here's my ideone code: http://ideone.com/gLSxA

Related

Ajax search POST to php

I need some help with refining my current search.
I have folder with images that are named as:
20171116-category_title.jpg (where first number is date yyyymmdd)
My current search looks like this:
<?php
// string to search in a filename.
if(isset($_POST['question'])){
$searchString = $_POST['question'];
}
// image files in my/dir
$imagesDir = '';
$files = glob($imagesDir . '*.{jpg,jpeg,png,gif}', GLOB_BRACE);
// array populated with files found
// containing the search string.
$filesFound = array();
// iterate through the files and determine
// if the filename contains the search string.
foreach($files as $file) {
$name = pathinfo($file, PATHINFO_FILENAME);
// determines if the search string is in the filename.
if(strpos(strtolower($name), strtolower($searchString))) {
$filesFound[] = $file;
}
}
// output the results.
echo json_encode($filesFound, JSON_UNESCAPED_UNICODE);
?>
And this works just fine but...
I would like to limit the search to the part of the .jpg name that follows the underscore "_" (the "title"). After that, if possible, I would like to expand the search as follows:
If the AJAX POST sends a query in the format abc+xyz, the delimiter "+" practically means two queries.
The first part (abc) targets the "category" that sits between the hyphen and the underscore. The second part (xyz), which is basically my first question, is searched only among the files already matched by category.
Your tips are more than welcome!
Thank you!
For the first part of your question, the exact pattern you use depends on the format of your category strings. If you will never have underscores _ in the category, here's one solution:
foreach($files as $file) {
// $name = "20171116-category_title"
$name = pathinfo($file, PATHINFO_FILENAME);
// $title = "title", assuming your categories will never have "_".
// The regular expression matches 8 digits, followed by a hyphen,
// followed by anything except an underscore, followed by an
// underscore, followed by anything
$title = preg_filter('/\d{8}-[^_]+_(.+)/', '$1', $name);
// Now search based on your $title, not $name
// *NOTE* this test is not safe, see update below.
if(strpos(strtolower($title), strtolower($searchString))) {
If your categories can or will have underscores, you'll need to adjust the regular expression based on some format you can be sure of.
For your 2nd question, you need to first separate your query into addressable parts. Note though that + is typically how spaces are encoded in URLs, so using it as a delimiter means you will never be able to use search terms with spaces. Maybe that's not a problem for you, but if it is you should try another delimiter, or maybe simpler would be to use separate search fields, e.g. 2 inputs on your search form.
Anyway, using +:
if(isset($_POST['question'])){
// $query will be an array with 0 => category term, and 1 => title term
$query = explode('+', $_POST['question']);
}
Now in your loop you need to identify not just the $title part of the filename, but also the $category:
$category = preg_filter('/\d{8}-([^_]+)_.+/', '$1', $name);
$title = preg_filter('/\d{8}-[^_]+_(.+)/', '$1', $name);
Once you have those, you can use them in your final test for a match:
if( strpos(strtolower($category), strtolower($query[0])) && strpos(strtolower($title), strtolower($query[1])) ) {
UPDATE
I just noticed your match test has a problem. strpos can return 0 if a match is found starting at position 0. 0 is a falsey result, which means your test will fail even though there's a match. You need to explicitly test on FALSE, as described in the docs:
if( strpos(strtolower($category), strtolower($query[0])) !== FALSE
&& strpos(strtolower($title), strtolower($query[1])) !== FALSE ) {
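Putting the pieces together, a minimal sketch might look like the following. It assumes names always follow the yyyymmdd-category_title pattern with no underscore in the category, treats a query without + as a title-only search, and swaps in stripos() for the case-insensitive test. parse_image_name(), contains_term() and search_images() are hypothetical names, and the file list is made up.

```php
<?php
// Hypothetical helper: split "20171116-category_title" into its two parts.
function parse_image_name(string $name): ?array
{
    if (preg_match('/^\d{8}-([^_]+)_(.+)$/', $name, $m)) {
        return ['category' => $m[1], 'title' => $m[2]];
    }
    return null; // Name does not follow the assumed format.
}

// Hypothetical helper: true when $needle is empty or occurs in $haystack.
function contains_term(string $haystack, string $needle): bool
{
    // Compare against false explicitly: a match at position 0 is still a match.
    return $needle === '' || stripos($haystack, $needle) !== false;
}

// Hypothetical helper: filter $files by a "category+title" or "title" query.
function search_images(array $files, string $question): array
{
    $parts = explode('+', $question, 2);
    // "abc+xyz" => category "abc", title "xyz"; a lone "xyz" searches titles only.
    $categoryTerm = count($parts) === 2 ? $parts[0] : '';
    $titleTerm    = count($parts) === 2 ? $parts[1] : $parts[0];

    $found = [];
    foreach ($files as $file) {
        $parsed = parse_image_name(pathinfo($file, PATHINFO_FILENAME));
        if ($parsed !== null
            && contains_term($parsed['category'], $categoryTerm)
            && contains_term($parsed['title'], $titleTerm)) {
            $found[] = $file;
        }
    }
    return $found;
}

// Made-up file names for illustration.
$files = [
    '20171116-shoes_red-boot.jpg',
    '20171117-hats_red-cap.jpg',
    '20171118-shoes_blue-boot.png',
];
echo json_encode(search_images($files, 'shoes+red')), "\n";
```

Running both captures in one preg_match() call avoids matching the same name twice, and the explicit comparison with false keeps matches at position 0.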

very large php string magically turns into array

I am getting an "Array to string conversion" error in PHP.
I am using the "variable" (that should be a string) as the third parameter to str_replace. So in summary (a very simplified version of what's going on):
$str = "very long string";
str_replace("tag", $some_other_array, $str);
$str is throwing the error, and I have been trying to fix it all day. The things I have tried are:
if(is_array($str)) die("its somehow an array");
serialize($str); //inserted this before str_replace call.
I have spent all day on it, and no, it's not something stupid like variables the wrong way around; it is something bizarre. I have even dumped it to a file and it's a string.
My hypothesis:
The string is too long, PHP can't deal with it, and it turns into an array.
The $str value in this case is nested and called recursively, the general flow could be explained like this:
--code
//pass by reference
function the_function ($something, &$OFFENDING_VAR, $something_else) {
while(preg_match($something, $OFFENDING_VAR)) {
$OFFENDING_VAR = str_replace($x, $y, $OFFENDING_VAR); // this is the error
}
}
So it may be something strange due to str_replace, but that would mean that at some point str_replace would have to return an array.
Please help me work this out; it's very confusing and I have wasted a day on it.
---- ORIGINAL FUNCTION CODE -----
//This function gets called with multiple different "Target Variables" Target is the subject
//line, from and body of the email filled with << tags >> so the str_replace function knows
//where to replace them
function perform_replacements($replacements, &$target, $clean = TRUE,
$start_tag = '<<', $end_tag = '>>', $max_substitutions = 5) {
# Construct separate tag and replacement value arrays for use in the substitution loop.
$tags = array();
$replacement_values = array();
foreach ($replacements as $tag_text => $replacement_value) {
$tags[] = $start_tag . $tag_text . $end_tag;
$replacement_values[] = $replacement_value;
}
# TODO: this badly needs refactoring
# TODO: auto upgrade <<foo>> to <<foo_html>> if foo_html exists and acting on html template
# Construct a regular expression for use in scanning for tags.
$tag_match = '/' . preg_quote($start_tag) . '\w+' . preg_quote($end_tag) . '/';
# Perform the substitution until all valid tags are replaced, or the maximum substitutions
# limit is reached.
$substitution_count = 0;
while (preg_match ($tag_match, $target) && ($substitution_count++ < $max_substitutions)) {
$target = serialize($target);
$temp = str_replace($tags,
$replacement_values,
$target); //This is the line that is failing.
unset($target);
$target = $temp;
}
if ($clean) {
# Clean up any unused search values.
$target = preg_replace($tag_match, '', $target);
}
}
How do you know $str is the problem and not $some_other_array?
From the manual:
If search and replace are arrays, then str_replace() takes a value
from each array and uses them to search and replace on subject. If
replace has fewer values than search, then an empty string is used for
the rest of replacement values. If search is an array and replace is a
string, then this replacement string is used for every value of
search. The converse would not make sense, though.
The second parameter can only be an array if the first one is as well.
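One way to test this is to inspect the replacement values before calling str_replace(). The sketch below assumes $replacements is a tag => value map like the one passed to perform_replacements(). find_non_scalar_replacements() is a hypothetical helper, the sample data is made up, and the check is deliberately conservative (it also flags objects, including ones that define __toString()).

```php
<?php
// Hypothetical helper: list the tags whose replacement value is not a plain
// scalar (or null) and would need converting before str_replace() sees it.
function find_non_scalar_replacements(array $replacements): array
{
    $bad = [];
    foreach ($replacements as $tag => $value) {
        if ($value !== null && !is_scalar($value)) {
            $bad[] = $tag;
        }
    }
    return $bad;
}

// Sample data (made up for illustration): "items" holds a nested array.
$replacements = ['name' => 'Alice', 'items' => ['a', 'b'], 'count' => 3];
$bad = find_non_scalar_replacements($replacements);

echo $bad
    ? 'Non-scalar values for: ' . implode(', ', $bad)
    : 'All values are scalar', "\n";
```

If this reports any tags for the real data, those values rather than the subject string would be the first place to look.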

Remove equal part of two strings

In PHP, I have two paths on a server that both have a matching part. I'd like to join them, but delete the part that is equal.
EXAMPLE:
Path #1:
/home7/username/public_html/dir/anotherdir/wp-content/uploads
Path #2:
/dir/anotherdir/wp-content/uploads/2011/09/image.jpg
You see the part /dir/anotherdir/wp-content/uploads is the same in both strings, but when I simply join them I would have some directories twice.
The output I need is this:
/home7/username/public_html/dir/anotherdir/wp-content/uploads/2011/09/image.jpg
Since the dirs can change on different servers I need a dynamic solution that detects the matching part from #2 and removes it on #1 so I can trail #2 right after #1 :)
$path1 = "/home7/username/public_html/dir/anotherdir/wp-content/uploads";
$path2 = "/dir/anotherdir/wp-content/uploads/2011/09/image.jpg";
echo $path1 . substr($path2, strpos($path2, basename($path1)) + strlen(basename($path1)));
The problem is not as generic as it looks. Rather than treating it as matching equal parts of two strings, treat it as matching equal directory structure.
That means you need to concentrate on the segments between '/' characters.
So basically you need to match directory names. Moreover, in your problem the last part of the first path's directory structure may equal the beginning of the second path.
So I suggest reading the first path from the end, jumping from '/' to '/', until you find a directory name that matches the first directory name in the second path. If a match happens, the rest of the first path from that point to its end should also appear at the start of the second path. If this condition fails, repeat the process with the next candidate position in the first path.
This code can help you:
$str1 = $argv[1];
$str2 = $argv[2];
//clean
$str1 = trim(str_replace("//", "/", $str1), "/");
$str2 = trim(str_replace("//", "/", $str2), "/");
$paths1 = explode("/", $str1);
$paths2 = explode("/", $str2);
$j = 0;
$found = false;
$output = '';
for ($i=0; $i<count($paths1); $i++) {
$item1 = $paths1[$i];
$item2 = $paths2[$j];
if ($item1 == $item2) {
if (!$found)
$found = $i; //first point
$j++;
} else if ($found) {
//we have found a subdir so remove
$output = "/".implode("/", array_slice($paths1, 0, $i))
."/".implode("/", array_slice($paths2, $j));
$found = false;
break;
}
}
//final checking
if ($found) {
$output = "/".implode("/", $paths1)
."/".implode("/", array_slice($paths2, $j));
}
print "FOUND?: ".(!empty($output)?$output:'No')."\n";
This detects the equal segments, cuts the first string at that point, and appends the remaining part of the second string.
The code also accepts two paths that share only a "partial" run of segments, such as:
/path1/path2/path3
/path2/other/file.png
which will output:
/path1/path2/other/file.png
This removes "path3", but with a few changes the code can be made stricter.
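A stricter variant could look like the sketch below. It is not the code above: it only merges when the segments at the end of the first path equal the segments at the start of the second, and otherwise just concatenates. join_overlapping_paths() is a hypothetical name, and the result is assumed to be an absolute path.

```php
<?php
// Hypothetical helper: join two paths, dropping the directories that both
// end $base and begin $relative.
function join_overlapping_paths(string $base, string $relative): string
{
    // Split into segments and discard empty ones caused by leading,
    // trailing, or doubled slashes.
    $a = array_values(array_filter(explode('/', $base), 'strlen'));
    $b = array_values(array_filter(explode('/', $relative), 'strlen'));

    // Try the longest possible overlap first, then shrink.
    for ($len = min(count($a), count($b)); $len > 0; $len--) {
        if (array_slice($a, -$len) === array_slice($b, 0, $len)) {
            $b = array_slice($b, $len); // Drop the shared segments once.
            break;
        }
    }

    return '/' . implode('/', array_merge($a, $b));
}

echo join_overlapping_paths(
    '/home7/username/public_html/dir/anotherdir/wp-content/uploads',
    '/dir/anotherdir/wp-content/uploads/2011/09/image.jpg'
), "\n";
```

With the "partial" example (/path1/path2/path3 and /path2/other/file.png) this version finds no overlap and keeps both paths in full.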
How about using similar_text()? Note that it returns the number of matching characters between two strings rather than the matching text itself, so you would still need to locate the common part, remove it from the first string, and append the second.

PHP replacing entire string if it contains integer

My script lists out files in the directory. I am able to use preg_match and regex to find files whose filenames contain integers.
However, this is what I am unable to do: I want an entire string to be omitted if it contains an integer.
Despite trying several methods, I am only able to replace the integer itself and not the entire line. Any help would be appreciated.
if (preg_match('/\d/', $string))
$string = "";
This will turn a string into an empty one if it has any number in it.
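For a whole list of names, preg_grep() with the PREG_GREP_INVERT flag can drop every entry that contains a digit in one call. A small sketch with made-up file names:

```php
<?php
// Made-up sample listing, standing in for the directory contents.
$names = ['readme.txt', 'photo1.jpg', 'notes.md', 'v2', 'index.php'];

// PREG_GREP_INVERT returns the entries that do NOT match the pattern;
// array_values() renumbers the keys that preg_grep() preserves.
$withoutDigits = array_values(preg_grep('/\d/', $names, PREG_GREP_INVERT));

print_r($withoutDigits);
```

This omits the matching entries entirely instead of leaving empty strings behind.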
According to your description, this should be something like:
$files = array();
$dirname = 'C://Temp';
$dh = opendir($dirname) or die();
while( ($fn=readdir($dh)) !== false )
if( !preg_match('/\d+|^\.\.?$/', $fn) )
$files[] = $fn;
closedir($dh);
var_dump($files);
... which reads all file names and stores them (except those containing digits, and the . and .. entries) in the array $files, which is dumped at the end of the snippet above. If that doesn't fit your requirement, you should give a more detailed explanation of what you are trying to do.
Regards
rbo

Using regex to fix phone numbers in a CSV with PHP

My new phone does not recognize a phone number unless its area code matches the incoming call. Since I live in Idaho where an area code is not needed for in-state calls, many of my contacts were saved without an area code. Since I have thousands of contacts stored in my phone, it would not be practical to manually update them. I decided to write the following PHP script to handle the problem. It seems to work well, except that I'm finding duplicate area codes at the beginning of random contacts.
<?php
//the script can take a while to complete
set_time_limit(200);
function validate_area_code($number) {
//digits are taken one by one out of $number, and insert in to $numString
$numString = "";
for ($i = 0; $i < strlen($number); $i++) {
$curr = substr($number,$i,1);
//only copy from $number to $numString when the character is numeric
if (is_numeric($curr)) {
$numString = $numString . $curr;
}
}
//add area code "208" to the beginning of any phone number of length 7
if (strlen($numString) == 7) {
return "208" . $numString;
//remove country code (none of the contacts are outside the U.S.)
} else if (strlen($numString) == 11) {
return preg_replace("/^1/","",$numString);
} else {
return $numString;
}
}
//matches any phone number in the csv
$pattern = "/((1? ?\(?[2-9]\d\d\)? *)? ?\d\d\d-?\d\d\d\d)/";
$csv = file_get_contents("contacts2.CSV");
preg_match_all($pattern,$csv,$matches);
foreach ($matches[0] as $key1 => $value) {
/*create a pattern that matches the specific phone number by adding slashes before possible special characters*/
$pattern = preg_replace("/\(|\)|\-/","\\\\$0",$value);
//create the replacement phone number
$replacement = validate_area_code($value);
//add delimiters
$pattern = "/" . $pattern . "/";
$csv = preg_replace($pattern,$replacement,$csv);
}
echo $csv;
?>
Is there a better approach to modifying the CSV? Also, is there a way to minimize the number of passes over the CSV? In the script above, preg_replace is called thousands of times on a very large String.
If I understand you correctly, you just need to prepend the area code to any 7-digit phone number anywhere in this file, right? I have no idea what kind of system you're on, but if you have some decent tools, here are a couple options. And of course, the approaches they take can presumably be implemented in PHP; that's just not one of my languages.
So, how about a sed one-liner? Just look for 7-digit phone numbers, bounded by either beginning of line or comma on the left, and comma or end of line on the right.
sed -r 's/(^|,)([0-9]{3}-[0-9]{4})(,|$)/\1208-\2\3/g' contacts.csv
Or if you want to only apply it to certain fields, perl (or awk) would be easier. Suppose it's the second field:
perl -F, -ane '$"=","; $F[1]=~s/^[0-9]{3}-[0-9]{4}$/208-$&/; print "@F";' contacts.csv
The -F, sets the field separator; $" is the list separator used when an array is interpolated into a string (yes, it gets assigned once per loop, oh well); the arrays are zero-indexed, so the second field is $F[1]; there's a run-of-the-mill substitution; and you print the results.
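The same single-pass idea could look like this in PHP, using preg_replace_callback() so that each matched number is rewritten exactly once. This is a sketch under assumptions: normalize_phone() is a hypothetical helper that mirrors the rules of validate_area_code() from the question, the pattern is the question's without its outer group, and the two-line CSV is made up.

```php
<?php
// Hypothetical helper, mirroring the question's validate_area_code() rules.
function normalize_phone(string $raw, string $areaCode = '208'): string
{
    $digits = preg_replace('/\D/', '', $raw);   // Keep digits only.
    if (strlen($digits) === 7) {
        return $areaCode . $digits;             // Add the missing area code.
    }
    if (strlen($digits) === 11 && $digits[0] === '1') {
        return substr($digits, 1);              // Drop the US country code.
    }
    return $digits;
}

// The question's pattern (outer group removed, since $m[0] is the full match).
$pattern = '/(1? ?\(?[2-9]\d\d\)? *)? ?\d\d\d-?\d\d\d\d/';

// Made-up sample data standing in for contacts2.CSV.
$csv = "Ann,(208) 555-5555\nBob,555-5555\n";

// One pass over the text: every match goes through the callback once, so an
// already-fixed number is never matched and prefixed a second time.
$fixed = preg_replace_callback($pattern, function (array $m): string {
    return normalize_phone($m[0]);
}, $csv);

echo $fixed;
```

Because the replacement happens inside the same scan that finds the numbers, there is no per-number preg_replace() call over the whole file.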
Ah programs... sometimes a 10-min hack is better.
If it were me... I'd import the CSV into Excel, sort it by something - maybe the length of the phone number or something. Make a new col for the fixed phone number. When you have a group of similarly-fouled numbers, make a formula to fix. Same for the next group. Should be pretty quick, no? Then export to .csv again, omitting the bad col.
A little more digging on my own revealed the issues with the regex in my question. The problem is with duplicate contacts in the csv.
Example:
(208) 555-5555, 555-5555
After the first pass this becomes:
2085555555, 208555555
and after the second pass it becomes:
2082085555555, 2082085555555
I worked around this by changing the replacement regex to:
//add escapes for special characters
$pattern = preg_replace("/\(|\)|\-|\./","\\\\$0",$value);
//add delimiters, and optional area code
$pattern = "/(\(?[0-9]{3}\)?)? ?" . $pattern . "/";
