PHP "glob" pattern help

I have a directory of sitemaps whose filenames all end with a number.
I have an automatic sitemap generator script that needs to count all sitemaps in a folder using glob.
Currently I am stuck here.
What I need is to count all sitemap files that have a number in their names, so I don't count the ones without any numbers.
For instance, in my root I have a sitemap.xml file, then I also have sitemap1.xml, sitemap2.xml, sitemap3.xml etc...
I need to use glob to only return true when the filename contains a number like "sitemap1.xml".
Is this possible?
$nr_of_sitemaps = count(glob(Sitemaps which contains numbers in their filenames));
Thanks

To glob for files ending in <digit>.xml you can use a pattern like:
*[0-9].xml
So to count the matches, the PHP might look like:
$count = count(glob('/path/to/files/*[0-9].xml'));
If you want finer control over the matching files than glob can give, you could use a general pattern and then use preg_grep to filter the resulting array down to precisely what you want.
$count = count(
    preg_grep(
        '#(?:^|/)sitemap\d{1,3}\.xml$#',
        glob('/path/to/files/sitemap*.xml')
    )
);
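As a minimal sketch, the two steps above can be wrapped in a helper. The function name countNumberedSitemaps and the directory argument are hypothetical, not part of the original answer; the pattern is the one shown above.

```php
<?php
/**
 * Count files named sitemap<1-3 digits>.xml in $dir.
 * Hypothetical helper: the name and signature are illustrative only.
 */
function countNumberedSitemaps(string $dir): int
{
    // Broad glob first; glob() can return false on error, so fall back to [].
    $candidates = glob(rtrim($dir, '/') . '/sitemap*.xml') ?: [];

    // Keep only names where "sitemap" is followed by 1 to 3 digits.
    $numbered = preg_grep('#(?:^|/)sitemap\d{1,3}\.xml$#', $candidates);

    return count($numbered);
}
```

Note that the simpler pattern *[0-9].xml also matches files that are not sitemaps (any name ending in a digit before .xml), which is one reason to filter with preg_grep.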
See also: http://cowburn.info/2010/04/30/glob-patterns/

Related

Recursively mapping file paths in one folder to another folder

Let's say I have a folder (folder_1) with the following structure:
/folder_1
    /dir_1
        - file_1_1.txt
        - file_1_2.txt
    /dir_2
        - file_2_1.txt
        /dir_2_1
            - file_2_1_1.txt
    - file_1.txt
Now, let's say I have another folder (folder_2) with the following structure:
/folder_2
    /dir_1
        - file_1_1.txt
        - default.txt
    /dir_2
        - file_2_1.txt
        - default.txt
    - default.txt
I need to map every file in folder_1 to a file in folder_2 such that:
/folder_1/dir_1/file_1_1.txt maps to /folder_2/dir_1/file_1_1.txt
/folder_1/dir_1/file_1_2.txt maps to /folder_2/dir_1/default.txt
/folder_1/dir_2/file_2_1.txt maps to /folder_2/dir_2/file_2_1.txt
/folder_1/dir_2/dir_2_1/file_2_1_1.txt maps to /folder_2/dir_2/default.txt
/folder_1/file_1.txt maps to /folder_2/default.txt
I am not the best communicator, so hopefully, the above pattern makes sense to you guys. The question is language agnostic really, but an answer in PHP and/or Javascript would be really great.
So far, I was able to accomplish this in PHP using FileIterator, RecursiveDirectoryIterator, and a bunch of custom classes that extract and then map the path to the files one by one.
This makes me wonder if I am missing an easier way to do this simple mapping. Maybe using regex named groups or something?
Edit:
Is it possible that for each file (file path) in folder_1, we use a regex pattern to find (reduce) the best match out of a map of all file paths in folder_2?
Further edit:
This is for mapping data files in folder_1 to template files in folder_2. If for a file in folder_1, an exact matching file path (including filename) in folder_2 is not found, we look for default.txt. If default.txt is not found, then we move up a directory and use that parent directory's default.txt. This way, we keep moving up directory levels till we find the first default.txt.
First, use your recursive directory scanner to scan the whole folder_2 directory tree. Build a hash table keyed by file path, with the folder_2 prefix stripped. So your hash table would contain:
/dir_1/file_1_1.txt
/dir_1/default.txt
/dir_2/file_2_1.txt
/dir_2/default.txt
/default.txt
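One possible way to build that table in PHP is sketched below. The function name buildTemplateIndex is hypothetical; it uses PHP's RecursiveDirectoryIterator and stores each relative file path as an array key so lookups can use isset().

```php
<?php
/**
 * Index every file under $root by its path relative to $root.
 * Hypothetical helper: returns e.g. ['/dir_1/default.txt' => true, ...].
 */
function buildTemplateIndex(string $root): array
{
    $root  = rtrim($root, '/\\');
    $index = [];

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        if (!$file->isFile()) {
            continue; // directories are not lookup targets
        }
        // Strip the root prefix and normalise separators to '/'.
        $relative = substr($file->getPathname(), strlen($root));
        $index[str_replace('\\', '/', $relative)] = true;
    }

    return $index;
}
```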
Now, start scanning folder_1. When you get a file, strip folder_1 from the front, and look for the resulting string in the hash table. If it's there, then you have a match.
If the file is not there, replace the last segment with "default.txt", and try again. So, when you begin scanning folder_1, you get:
/folder_1/dir_1/file_1_1.txt
You look up /dir_1/file_1_1.txt in the hash table and find it. You have a match.
Next, you get /folder_1/dir_1/file_1_2.txt. You look up /dir_1/file_1_2.txt in the hash table and don't find it. So you replace file_1_2.txt with default.txt, giving you /dir_1/default.txt. You look that up in the hash table, find it, and you have a match.
Now, if /dir_1/default.txt did not exist, then you would again adjust the file name to remove the last directory. That is, you'd remove /dir_1, and you'd look up /default.txt in the hash table.
In pseudo code it looks like this:
for each file in folder_1
    name = strip `/folder_1` from the name
    if name in hash table then
        match found
        continue (next file)
    end if
    replace file name (everything after the last '/') with "default.txt"
    loop
        if name in hash table then
            match found
            continue (next file)
        end if
        if name is "/default.txt" then
            exit loop (no directory levels left)
        end if
        remove the last slash, and everything between it and the previous slash
        (so "/dir_1/default.txt" becomes "/default.txt")
    end loop
    // if you get here, no match was found
end for
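A minimal PHP sketch of the lookup described above, assuming the hash table is an array keyed by relative path (for example ['/dir_1/default.txt' => true]). The function name resolveTemplate is hypothetical.

```php
<?php
/**
 * Map a data file path (relative to folder_1, starting with '/') to a
 * template path relative to folder_2, or null if nothing matches.
 * Hypothetical helper: $templates is a set keyed by relative path.
 */
function resolveTemplate(string $name, array $templates): ?string
{
    // 1. Exact match on the full relative path.
    if (isset($templates[$name])) {
        return $name;
    }

    // 2. Directory part of the path; '' means the root of the tree.
    $dir = substr($name, 0, (int) strrpos($name, '/'));

    // 3. Walk upwards until a default.txt is found or the root is passed.
    while (true) {
        $candidate = $dir . '/default.txt';
        if (isset($templates[$candidate])) {
            return $candidate;
        }
        if ($dir === '') {
            return null; // no default.txt anywhere up the tree
        }
        $dir = substr($dir, 0, (int) strrpos($dir, '/'));
    }
}
```

Using isset() on array keys keeps each lookup constant-time, so the cost per file is proportional to its directory depth.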

Using regex pattern match in xhprof ignore function

I am trying to profile a CodeIgniter application with xhprof. I am getting a report like the following...
Now I am trying to ignore some functions during xhprof report generation. What I did is the following:
$ignore = array(
    '???_op',
    '???_op#1',
    '???_op#2',
    '???_op#3',
    '???_op#4',
    '???_op#5'
);
xhprof_enable(XHPROF_FLAGS_NO_BUILTINS | XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY, array('ignored_functions' => $ignore));
Now if I want to ignore all the CI-related functions (i.e. the functions starting with CI_), it seems I have to insert them into the array one by one.
Is there any way where I can pattern match with regex and ignore functions according to my requirement?
Unfortunately, PHP's xhprof_enable() does not support regex patterns in the ignored_functions element of the options parameter.
I reckon the simplest way to manually generate the blacklist would be to copy-paste the rendered output from the function into your favorite IDE.
Once the text is in your IDE use the regex find/replace functionality to isolate your desired function names such as:
^(?:\?{3}_op|CI_)\S*
Then just copy the matches into your blacklist array.
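If you would rather script that step than do it in an IDE, the same regex can be applied in PHP. This is only a sketch: the helper name buildIgnoreList is hypothetical, and it assumes you have the report text in a string with one function name at the start of each line, which may not match your actual report layout.

```php
<?php
/**
 * Extract function names starting with "???_op" or "CI_" from report text.
 * Hypothetical helper; assumes one function name at the start of each line.
 */
function buildIgnoreList(string $reportText): array
{
    // Same pattern as above, with the "m" flag so ^ matches at every line start.
    preg_match_all('/^(?:\?{3}_op|CI_)\S*/m', $reportText, $matches);

    // Drop duplicates and reindex so the result is a plain list.
    return array_values(array_unique($matches[0]));
}
```

The returned array could then be passed as the ignored_functions option, as in the question.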

Comparing User IP with .txt file content

I have a .txt file which contains about 100,000 blacklisted IPs. I want to check whether the current user's IP is present in that .txt file; if it is, script execution should stop.
What would be the most efficient way to do this without using .htaccess?
$ip = $_SERVER['REMOTE_ADDR']; // current user's IP
$file = file_get_contents( "your_text_file.txt" );
if( preg_match( "/$ip/", $file ) ) {
    // block
}
If you're going to block using preg_match, you may want to add the newline to the search string and escape the period characters, because otherwise they will match any single character (however unlikely, this may block normal users). .htaccess is much better suited for this, or even a database query.
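The escaping and whole-line matching suggested here can be sketched as a small helper. The name isBlacklisted is hypothetical, and the sketch assumes the file holds one IP per line.

```php
<?php
/**
 * Check whether $ip appears as a complete line in $listContents.
 * Hypothetical helper; assumes one IP address per line.
 */
function isBlacklisted(string $ip, string $listContents): bool
{
    // preg_quote() escapes the dots; ^...$ with the "m" flag anchors the
    // match to a whole line, and \r? tolerates Windows line endings.
    $pattern = '/^' . preg_quote($ip, '/') . '\r?$/m';

    return preg_match($pattern, $listContents) === 1;
}
```

Without the anchors, 1.2.3.4 would also match a listed 11.2.3.45; without the escaping, each dot would match any character.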
I think the way you store the data will determine how fast you can look it up. Keeping the data sorted and then doing a binary search will make the lookup faster. I am just suggesting the theory :)
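A rough sketch of that idea, assuming the IPs have been loaded into an array and sorted as plain strings (for example with sort($ips, SORT_STRING)). The function name binarySearchIp is hypothetical.

```php
<?php
/**
 * Binary search for $ip in an array sorted with sort($ips, SORT_STRING).
 * Hypothetical helper; both the sort and the search compare with strcmp order.
 */
function binarySearchIp(array $sortedIps, string $ip): bool
{
    $low  = 0;
    $high = count($sortedIps) - 1;

    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        $cmp = strcmp($sortedIps[$mid], $ip);

        if ($cmp === 0) {
            return true;      // found
        }
        if ($cmp < 0) {
            $low = $mid + 1;  // target is in the upper half
        } else {
            $high = $mid - 1; // target is in the lower half
        }
    }

    return false;
}
```

For about 100,000 entries this needs at most around 17 comparisons per lookup, but the list still has to be loaded and sorted first, so it pays off mainly when the sorted data is stored or reused.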

Extract keywords from referrer URL

It seems Google's URLs are structured differently these days. So it is harder to extract the referring keyword from them. Here is an example:
http://www.google.co.uk/search?q=jquery+post+output+46&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a#pq=jquery+post+output+46&hl=en&cp=30&gs_id=1v&xhr=t&q=jquery+post+output+php+not+running&pf=p&sclient=psy-ab&client=firefox-a&hs=8N5&rls=org.mozilla:en-US%3Aofficial&source=hp&pbx=1&oq=jquery+post+output+php+not+run&aq=0w&aqi=q-w1&aql=&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=bdeb326aa44b07c5&biw=1280&bih=875
The search I performed was actually "jquery post output php not running", so the first 'q=' does not contain the full search. The second one does. I'd like to write a script that always extracts the last 'q=', but I'm not sure if Google's URLs always have the full search last. Has anyone had any experience with this?
You can accomplish this using parse_url() and parse_str(), where $str is the referrer string. parse_str() already URL-decodes the values, so a separate urldecode() call is not needed:
$fragment = (string) parse_url($str, PHP_URL_FRAGMENT);
parse_str($fragment, $arr);
$query = $arr['q'] ?? null; // jquery post output php not running
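Since not every referrer carries a fragment, a slightly more defensive sketch could try the fragment first and fall back to the query string. The function name extractSearchTerms is hypothetical, and preferring the fragment is an assumption based on the single example URL above, not a documented Google behaviour.

```php
<?php
/**
 * Return the search terms from a referrer URL, or null if none are found.
 * Hypothetical helper: prefers the "q" in the fragment, then the query string.
 */
function extractSearchTerms(string $referrer): ?string
{
    foreach ([PHP_URL_FRAGMENT, PHP_URL_QUERY] as $component) {
        $part = parse_url($referrer, $component);
        if (!is_string($part)) {
            continue; // component missing or URL malformed
        }

        // parse_str() URL-decodes values, so "+" becomes a space.
        parse_str($part, $params);
        if (isset($params['q']) && is_string($params['q']) && $params['q'] !== '') {
            return $params['q'];
        }
    }

    return null;
}
```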

How to add wildcard names to directory search in php

I've got a small PHP script that gathers all files in a directory. Furthermore, I'm going through this array of names to skip over the ones I don't want:
$dirname = "./_images/border/";
$border_images = scandir($dirname);
$ignore = Array(".", "..");
foreach($border_images as $border){
    if(!in_array($border, $ignore)) echo "TEST".$border;
}
This directory would contain images that I want to find. Amongst these images, there will be a thumbnail version and a full-size version of each image. I'm planning to have each image either labeled *-thumbnail or *-full to more easily sort through.
What I'm trying to find is a way, preferably with the $ignore array, to add a wildcard string that will be recognized by a check condition. For example, adding *-full to my $ignore array would mean that files with this tag anywhere in their filenames are ignored. I'm pretty sure in_array won't accept this. If this isn't possible, would using regular expressions be possible? If so, what would my expression be?
Thanks.
You're probably looking for PHP's glob() function:
$files_full = glob('*-full.*');
There is a better way to do this known as glob().
Take a look at glob function.
glob — Find pathnames matching a pattern
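If you want to keep the scandir() loop and the $ignore array from the question, one alternative (not what the answers above propose) is PHP's fnmatch(), which tests a name against a shell-style wildcard. A minimal sketch, with the hypothetical helper name filterIgnored:

```php
<?php
/**
 * Return the names that match none of the shell-style ignore patterns.
 * Hypothetical helper; patterns use fnmatch() wildcards such as "*-full*".
 */
function filterIgnored(array $names, array $ignorePatterns): array
{
    $kept = array_filter($names, function (string $name) use ($ignorePatterns): bool {
        foreach ($ignorePatterns as $pattern) {
            if (fnmatch($pattern, $name)) {
                return false; // name matches an ignore pattern, drop it
            }
        }
        return true;
    });

    return array_values($kept);
}

// Usage with the question's variables (assumed):
// $thumbnails = filterIgnored(scandir($dirname), ['.', '..', '*-full*']);
```

Plain entries such as "." and ".." still work because a pattern without wildcards only matches itself.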
