I need some help with refining my current search.
I have a folder with images that are named as:
20171116-category_title.jpg (where first number is date yyyymmdd)
My current search looks like this:
<?php
// string to search in a filename.
if(isset($_POST['question'])){
$searchString = $_POST['question'];
}
// image files in my/dir
$imagesDir = '';
$files = glob($imagesDir . '*.{jpg,jpeg,png,gif}', GLOB_BRACE);
// array populated with files found
// containing the search string.
$filesFound = array();
// iterate through the files and determine
// if the filename contains the search string.
foreach($files as $file) {
$name = pathinfo($file, PATHINFO_FILENAME);
// determines if the search string is in the filename.
if(strpos(strtolower($name), strtolower($searchString))) {
$filesFound[] = $file;
}
}
// output the results.
echo json_encode($filesFound, JSON_UNESCAPED_UNICODE);
?>
And this works just fine but...
I would like to limit the search to the part of the .jpg name that contains "title", i.e. the part after the underscore "_". After that (if possible) I would like to expand the search:
Run a double search if the AJAX POST sends the format abc+xyz, where the delimiter "+" practically means 2 queries.
The first part (abc) targets the "category" that stands between the hyphen and the underscore, and the second part (xyz) (which is basically my first question) searches only among the results already matched by category.
Your tips are more than welcome!
Thank you!
For the first part of your question, the exact pattern you use depends on the format of your category strings. If you will never have underscores _ in the category, here's one solution:
foreach($files as $file) {
// $name = "20171116-category_title"
$name = pathinfo($file, PATHINFO_FILENAME);
// $title = "title", assuming your categories will never have "_".
// The regular expression matches 8 digits, followed by a hyphen,
// followed by anything except an underscore, followed by an
// underscore, followed by anything
$title = preg_filter('/\d{8}-[^_]+_(.+)/', '$1', $name);
// Now search based on your $title, not $name
// *NOTE* this test is not safe, see update below.
if(strpos(strtolower($title), strtolower($searchString))) {
If your categories can or will have underscores, you'll need to adjust the regular expression based on some format you can be sure of.
For your 2nd question, you need to first separate your query into addressable parts. Note though that + is typically how spaces are encoded in URLs, so using it as a delimiter means you will never be able to use search terms with spaces. Maybe that's not a problem for you, but if it is you should try another delimiter, or maybe simpler would be to use separate search fields, e.g. 2 inputs on your search form.
Anyway, using +:
if(isset($_POST['question'])){
// $query will be an array with 0 => category term, and 1 => title term
$query = explode('+', $_POST['question']);
}
Now in your loop you need to identify not just the $title part of the filename, but also the $category:
$category = preg_filter('/\d{8}-([^_]+)_.+/', '$1', $name);
$title = preg_filter('/\d{8}-[^_]+_(.+)/', '$1', $name);
Once you have those, you can use them in your final test for a match:
if( strpos(strtolower($category), strtolower($query[0])) && strpos(strtolower($title), strtolower($query[1])) ) {
UPDATE
I just noticed your match test has a problem. strpos can return 0 if a match is found starting at position 0. 0 is a falsey result, which means your test will fail even though there's a match. You need to explicitly test against FALSE, as described in the docs:
if( strpos(strtolower($category), strtolower($query[0])) !== FALSE
&& strpos(strtolower($title), strtolower($query[1])) !== FALSE ) {
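Putting the pieces above together, here is a minimal sketch of the whole search. The helper names parseImageName and searchImages and the sample file names are made up for illustration, and it still assumes a category never contains an underscore. It uses stripos, which is case-insensitive, so the strtolower calls are not needed.

```php
<?php
// Hypothetical helper: split "20171116-category_title" into its parts.
// Assumes the category never contains an underscore.
function parseImageName(string $name): ?array
{
    if (preg_match('/^(\d{8})-([^_]+)_(.+)$/', $name, $m) !== 1) {
        return null; // filename does not follow the expected pattern
    }
    return ['date' => $m[1], 'category' => $m[2], 'title' => $m[3]];
}

// Hypothetical helper: keep the files whose category and title contain
// the given terms. An empty term matches everything.
function searchImages(array $files, string $categoryTerm, string $titleTerm): array
{
    $found = [];
    foreach ($files as $file) {
        $parts = parseImageName(pathinfo($file, PATHINFO_FILENAME));
        if ($parts === null) {
            continue;
        }
        // Compare against false explicitly: a match at position 0 is still a match.
        $categoryOk = $categoryTerm === '' || stripos($parts['category'], $categoryTerm) !== false;
        $titleOk = $titleTerm === '' || stripos($parts['title'], $titleTerm) !== false;
        if ($categoryOk && $titleOk) {
            $found[] = $file;
        }
    }
    return $found;
}

// Sample file names, made up for illustration.
$files = ['20171116-nature_sunset.jpg', '20171117-nature_forest.jpg', '20171118-city_sunset.jpg'];

// "nature+sun" => category term "nature", title term "sun".
// array_pad() supplies an empty title term when no "+" was sent.
[$categoryTerm, $titleTerm] = array_pad(explode('+', 'nature+sun', 2), 2, '');

echo json_encode(searchImages($files, $categoryTerm, $titleTerm));
// prints ["20171116-nature_sunset.jpg"]
```

Treating an empty term as "match everything" is a design choice here, so that a query without "+" still filters by category alone.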
Related
I have an S3 bucket full of images whose naming follows a simple pattern. The first 6 digits group images by listing number, the trailing digit(s) are non-sequential, but follow a reliable pattern (0 thru 99) I'm capturing the six digits that start the filename in a variable $ln.
/*
https://s3.amazonaws.com/stroupenwmls2/602665_10.jpg
https://s3.amazonaws.com/stroupenwmls2/602665_12.jpg
https://s3.amazonaws.com/stroupenwmls2/602665_13.jpg
https://s3.amazonaws.com/stroupenwmls2/602665_15.jpg
*/
What I want to do is populate a 'listing' img src attribute with the url to an image, if one exists for that listing (if not, I provide a no-image.jpg). And I'm looping thru many different listings to create my web page.
I'm struggling with the logic to grab the first image that matches the $listing variable. Here is what I've tried, with no luck (just produces a 0):
$bucket = 'https://s3.amazonaws.com/stroupenwmls2/';
$ln = '602665';
$string = $bucket . $ln . '_';
// match the pattern '_xx.jpg', with 1 or 2 numbers
$image = preg_match('/^_[0-9]{1,2}\.(jpg|jpeg|png|gif)/i', $string);
Then in my web app:
<img src="<?php echo $image ?>">
I'm an idiot when it comes to using preg_match, what I really need is some sort of wildcard parameter. I'm sure I'm making this way too complicated.
The problem is that you're not matching against the image paths, you're matching against what I assume you intend to be part of your regular expression. See below:
$bucket = 'https://s3.amazonaws.com/stroupenwmls2/';
$ln = '602665';
// preg_quote() escapes the regex-special characters in the URL prefix
// (such as "." and the "/" delimiter). Note that PHP concatenates with ".", not "+".
$re = preg_quote($bucket . $ln . '_', '/') . '[0-9]{1,2}\.(jpg|jpeg|png|gif)';
// let's say you have an array called $img_list;
// loop through each path in the list, searching strings
// that match the regular expression constructed in $re.
// if you find a match, return it.
// you'd probably want to define a function to do this for you,
// and call it with the $listing and array as parameters
// (the return below assumes this loop lives inside that function).
foreach ($img_list as $img) {
// this returns either 0 or 1 depending on match.
// return the first one, and we're done.
if (preg_match('/^' . $re . '/i', $img)) {
return $img;
}
}
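As a rough sketch of the function suggested in the comments above: the name firstListingImage, its parameters, and the 602664_03.jpg entry are assumptions for illustration, and the list of image URLs is assumed to have been fetched from the bucket beforehand.

```php
<?php
// Hypothetical helper: return the first URL in $imgList that belongs to
// listing $ln, or $fallback when the listing has no image.
function firstListingImage(array $imgList, string $bucket, string $ln, string $fallback): string
{
    // preg_quote() escapes regex-special characters in the literal URL
    // prefix, including the "/" used as the pattern delimiter.
    $re = '/^' . preg_quote($bucket . $ln . '_', '/') . '[0-9]{1,2}\.(jpg|jpeg|png|gif)$/i';
    foreach ($imgList as $img) {
        if (preg_match($re, $img) === 1) {
            return $img; // first match wins
        }
    }
    return $fallback;
}

$bucket = 'https://s3.amazonaws.com/stroupenwmls2/';
// Sample list; the first entry is made up to show a non-matching listing.
$imgList = [
    $bucket . '602664_03.jpg',
    $bucket . '602665_10.jpg',
    $bucket . '602665_12.jpg',
];

echo firstListingImage($imgList, $bucket, '602665', 'no-image.jpg'), "\n";
// prints https://s3.amazonaws.com/stroupenwmls2/602665_10.jpg
echo firstListingImage($imgList, $bucket, '999999', 'no-image.jpg'), "\n";
// prints no-image.jpg
```

The returned string can then be echoed straight into the img src attribute.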
I have the following code in my project:
$title = "In this title we have the word GUN";
$needed_words = array('War', 'Gun', 'Shooting');
foreach($needed_words as $needed_word) {
if (preg_match("/\b$needed_word\b/", $title)) {
$the_word = "ECHO THE WORD THATS FIND INSIDE TITLE";
}
}
I want to check if $title contains one of 15 predefined words,
for example, let's say:
if $title contains one of the words "War, Gun, Shooting", then I want to assign the word that is found to $the_word
Thanks in advance for your time!
Try this:
$makearray=array('war','gun','shooting');
$title='gun';
if(in_array($title,$makearray))
{
$if_included='the value you want to give';
echo $if_included;
}
Note: this will work only if $title is exactly equal to one of the values in the array; otherwise it will not.
The best approach would be to use regular expressions, as they are the most flexible and give you more control over the words you want to match. To be sure that the string contains words like gun (but also guns), shoot (but also shooting) you can do the following:
$words = array(
'war',
'gun',
'shoot'
);
$pattern = '/(' . implode(')|(', $words) . ')/i';
$if_included = (bool) preg_match($pattern, "Some text - here");
var_dump($if_included);
This matches more than it should. For example, it will also return true if the string contains "warning" (because it starts with "war"). You can improve this by introducing additional constraints to certain patterns. For example:
$words = array(
'war(?![a-z])', // now it will match "war", but not "warning"
'gun',
'shoot'
);
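To get the matched word itself (which is what the question asks for), one option is to put all the words in a single capturing group with word boundaries. This is only a sketch: the function name findNeededWord is made up, and it matches whole words only, so "guns" would not match "gun".

```php
<?php
// Hypothetical helper: return the first whole-word match from $neededWords
// found in $title (case-insensitive), or null when none is present.
function findNeededWord(string $title, array $neededWords): ?string
{
    // Escape each word so it is treated literally inside the pattern.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $neededWords);
    // \b on both sides keeps "war" from matching inside "warning".
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    if (preg_match($pattern, $title, $m) === 1) {
        return $m[1]; // the word exactly as it is written in the title
    }
    return null;
}

$title = 'In this title we have the word GUN';
$the_word = findNeededWord($title, ['War', 'Gun', 'Shooting']);
var_dump($the_word);                                        // string(3) "GUN"
var_dump(findNeededWord('A warning sign', ['War', 'Gun'])); // NULL
```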
Here's the deal: I am handling an OCR text document and grabbing UPC information from it with RegEx. That part I've figured out. Then I query a database and if I don't have record of that UPC I need to go back to the text document and get the description of the product.
The format on the receipt is:
NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456
So, when I go back the second time to find the name of the item I am at a complete loss. I know how to get to the line where the UPC is, but how can I use something like regex to get the name that precedes the UPC? Or some other method. I was thinking of somehow storing the entire line and then parsing it with PHP, but not sure how to get the line either.
Using PHP.
Get all of the names of the items indexed by their UPCs with a regex and preg_match_all():
$str = 'NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456';
preg_match_all( '/^(.*?)\s+(\d+)/m', $str, $matches);
$items = array();
foreach( $matches[2] as $k => $upc) {
if( !isset( $items[$upc])) {
$items[$upc] = array( 'name' => $matches[1][$k], 'count' => 0);
}
$items[$upc]['count']++;
}
This forms $items so it looks like:
Array (
[123456789012] => Array ( [name] => NAME OF ITEM [count] => 1 )
[987654321098] => Array ( [name] => OTHER NAME [count] => 1 )
[567890123456] => Array ( [name] => NAME [count] => 1 )
)
Now, you can look up any item name you want in O(1) time:
echo $items['987654321098']['name']; // OTHER NAME
You can find the string preceding a value you know with the following regex:
$receipt = "NAME OF ITEM 123456789012\n" .
"OTHER NAME 987654321098\n" .
"NAME 567890123456";
$upc = '987654321098';
if (preg_match("/^(.*?) *{$upc}/m", $receipt, $matches)) {
$name = $matches[1];
var_dump($name);
}
The /m flag on the regex makes the ^ work properly with multi-line input.
The ? in (.*?) makes that part non-greedy, so it doesn't grab all the spaces.
It would be simpler if you grabbed both the name and the number at the same time during the initial pass. Then, when you check the database to see if the number is present, you already have the name if you need to use it. Consider:
preg_match_all('/^([A-Za-z ]+) (\d+)$/m', $document, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$name = $match[1];
$number = $match[2];
if (!order_number_in_database($number)) {
save_new_order($number, $name);
}
}
You can use lookahead assertions to match string preceding the UPC.
http://php.net/manual/en/regexp.reference.assertions.php
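By something like this: ^.*?(?=\s*123456789012) used with the m (multi-line) flag, substituting the UPC with the UPC of the item you want to find. (Use .*? rather than \S* for the name, since names such as NAME OF ITEM contain spaces.)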
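A minimal sketch of the lookahead approach in PHP follows. The function name nameBeforeUpc is made up for illustration, and the sample receipt is the one from the question.

```php
<?php
// Hypothetical helper: return the item name that precedes $upc on its
// line of the receipt, or null when the UPC does not appear.
function nameBeforeUpc(string $receipt, string $upc): ?string
{
    // ^ with the m flag anchors at the start of each line. The lookahead
    // (?=...) requires whitespace plus the UPC to follow, but leaves them
    // out of the match, so $m[0] is the name only.
    $pattern = '/^.+?(?=\s+' . preg_quote($upc, '/') . '\b)/m';
    if (preg_match($pattern, $receipt, $m) === 1) {
        return $m[0];
    }
    return null;
}

$receipt = "NAME OF ITEM 123456789012\n" .
           "OTHER NAME 987654321098\n" .
           "NAME 567890123456";

var_dump(nameBeforeUpc($receipt, '987654321098')); // string(10) "OTHER NAME"
var_dump(nameBeforeUpc($receipt, '000000000000')); // NULL
```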
I'm lazy, so I would just use one regex that gets both parts in one shot using matching groups. Then, I would call it every time and put each capture group into name and upc variables. For cases in which you need the name, just reference it.
Use this type of regex:
/([a-zA-Z ]+)\s*(\d*)/
Then you will have the name in the $1 matching group and the UPC in the $2 matching group. Sorry, it's been a while since I've used PHP, so I can't give you an exact code snippet.
Note: the suggested regex assumes you'll only have letters or spaces in your "names" if that's not the case, you'll have to expand the character class.
Hacking up what I thought was the second simplest type of regex (extract a matching string from some strings, and use it) in php, but regex grouping seems to be tripping me up.
Objective
take a ls of files, output the commands to format/copy the files to have the correct naming format.
Resize copies of the files to create thumbnails. (not even dealing with that step yet)
Failure
My code fails at the regex step, because although I just want to filter out everything except a single regex group, when I get the results, it's always returning the group that I want -and- the group before it, even though I in no way requested the first backtrace group.
Here is a fully functioning, runnable version of the code on the online ide:
http://ideone.com/2RiqN
And here is the code (with a cut down initial dataset, although I don't expect that to matter at all):
<?php
// Long list of image names.
$file_data = <<<HEREDOC
07184_A.jpg
Adrian-Chelsea-C08752_A.jpg
Air-Adams-Cap-Toe-Oxford-C09167_A.jpg
Air-Adams-Split-Toe-Oxford-C09161_A.jpg
Air-Adams-Venetian-C09165_A.jpg
Air-Aiden-Casual-Camp-Moc-C09347_A.jpg
C05820_A.jpg
C06588_A.jpg
Air-Aiden-Classic-Bit-C09007_A.jpg
Work-Moc-Toe-Boot-C09095_A.jpg
HEREDOC;
if($file_data){
$files = preg_split("/[\s,]+/", $file_data);
// Split up the files based on the newlines.
}
$rename_candidates = array();
$i = 0;
foreach($files as $file){
$string = $file;
$pattern = '#(\w)(\d+)_A\.jpg$#i';
// Use the second regex group for the results.
$replacement = '$2';
// This should return only group 2 (any number of digits), but instead group 1 is somehow always in there.
$new_file_part = preg_replace($pattern, $replacement, $string);
// Example good end result: <img src="images/ch/ch-07184fs.jpg" width="350" border="0">
// Save the rename results for further processing later.
$rename_candidates[$i]=array('file'=>$file, 'new_file'=>$new_file_part);
// Rename the images into a standard format.
echo "cp ".$file." ./ch/ch-".$new_file_part."fs.jpg;";
// Echo out some commands for later.
echo "<br>";
$i++;
if($i>10){break;} // Just deal with the first 10 for now.
}
?>
Intended result for the regex: 788750
Intended result for the code output (multiple lines of): cp air-something-something-C485850_A.jpg ./ch/ch-485850.jpg;
What's wrong with my regex? Suggestions for simpler matching code would be appreciated as well.
Just a guess:
$pattern = '#^.*?(\w)(\d+)_A\.jpg$#i';
This includes the whole filename in the match. Otherwise preg_replace() will really only substitute the end of each string - it only applies the $replacement expression on the part that was actually matched.
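Since the goal is to extract one group rather than to rewrite the string, preg_match() with a capture group may be simpler than preg_replace(). The sketch below makes assumptions: the function name styleNumber is made up, and the pattern treats the letter before the digits as optional so that a leading zero (as in 07184_A.jpg) is kept.

```php
<?php
// Hypothetical helper: extract the digits from names such as
// "Air-Adams-Venetian-C09165_A.jpg" or "07184_A.jpg".
function styleNumber(string $file): ?string
{
    // Start of string or a hyphen, an optional letter, the digits we
    // want (group 1), then the fixed "_A.jpg" suffix.
    if (preg_match('/(?:^|-)[A-Za-z]?(\d+)_A\.jpg$/i', $file, $m) === 1) {
        return $m[1];
    }
    return null; // name does not follow the expected format
}

$files = ['07184_A.jpg', 'Adrian-Chelsea-C08752_A.jpg', 'C05820_A.jpg'];
foreach ($files as $file) {
    $number = styleNumber($file);
    if ($number !== null) {
        echo "cp $file ./ch/ch-{$number}fs.jpg;\n";
    }
}
// prints:
// cp 07184_A.jpg ./ch/ch-07184fs.jpg;
// cp Adrian-Chelsea-C08752_A.jpg ./ch/ch-08752fs.jpg;
// cp C05820_A.jpg ./ch/ch-05820fs.jpg;
```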
Scan Dir and Explode
You know what? A simpler way to do it in php is to use scandir and explode combo
$dir = scandir('/path/to/directory');
foreach($dir as $file)
{
$ext = pathinfo($file,PATHINFO_EXTENSION);
if($ext!='jpg') continue;
$a = explode('-',$file); //grab the end of the string after the -
$newfilename = end($a); //if there is no dash just take the whole string
$newlocation = './ch/ch-'.str_replace(array('C','_A'),'', basename($newfilename,'.jpg')).'fs.jpg';
echo "#copy($file, $newlocation)\n";
}
#and you are done :)
explode: basically a filename like blah-2.jpg is turned into an array('blah','2.jpg'); and then taking the end() of that gets the last element. It's almost the same as array_pop(), except that end() does not remove the element.
Working Example
Here's my ideone code http://ideone.com/gLSxA
My script lists out files in the directory. I am able to use preg_match and regex to find files whose filenames contain integers.
However, this is what I am unable to do: I want an entire string to be omitted if it contains an integer.
Despite trying several methods, I am only able to replace the integer itself and not the entire line. Any help would be appreciated.
if (preg_match('/\d/', $string))
$string = "";
This will turn a string into an empty one if it has any number in it.
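If the file names are already in an array, the same test can drop the unwanted entries entirely instead of blanking them. A small sketch, where the function name withoutDigits and the sample names are made up:

```php
<?php
// Hypothetical helper: keep only the names that contain no digit at all.
function withoutDigits(array $names): array
{
    $kept = array_filter($names, function ($name) {
        // preg_match() returns 1 when a digit is found, so keep the rest.
        return preg_match('/\d/', $name) !== 1;
    });
    return array_values($kept); // reindex from 0
}

// Sample names, made up for illustration.
print_r(withoutDigits(['readme.txt', 'photo1.jpg', 'notes.md', '2017-report.pdf']));
// prints an array containing readme.txt and notes.md
```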
According to your description, this should be something like:
$files = array();
$dirname = 'C://Temp';
$dh = opendir($dirname) or die();
while( ($fn=readdir($dh)) !== false )
if( !preg_match('/\d+|^\.\.?$/', $fn) )
$files[] = $fn;
closedir($dh);
var_dump($files);
... which reads all file names and stores them (except those containing numbers, and the . and .. entries) in an array $files, which is then displayed at the end of the snippet above. If that doesn't fit your requirement, you should give a more detailed explanation of what you are trying to do.
Regards
rbo