Generate database table from text file with php - php

I host a bunch of WordPress sites, and I want to write a script that checks their versions every day and notifies me if any of them doesn't match the latest WordPress release.
Right now the easiest way I found to check their versions is looking into the wp-includes/version.php file, so I made this:
grep '$wp_version =' /home/*/public_html/wp-includes/version.php > ~/wordpress/versions.txt
The line above will be put on the cron to be executed once a day.
Now I want to make a php script that will organize each line of the above script into a table, like this:
Folder (in which the wordpress site is installed) ; Version
Then, I want the script to check whether the above "Version" matches the variable $wp_latest_version that I will set. If they match, the "Version" cell will turn green; if they don't, it will turn red, so my peers and I can see which sites we need to update.
I have sort of an idea of how I will do the last part, but right now I'm stuck in the "translate .txt file to table" process.
Any suggestions?
Also, should I use MySQL or not?
This is part of the output text that I want to translate into a table:
/home/alton/public_html/wp-includes/version.php:$wp_version = '3.5.2';
/home/boutaces/public_html/wp-includes/version.php:$wp_version = '3.4';
/home/byseacom/public_html/wp-includes/version.php:$wp_version = '3.5.2';
/home/capricho/public_html/wp-includes/version.php:$wp_version = '3.5.1';
/home/carlosva/public_html/wp-includes/version.php:$wp_version = '3.6.1';
/home/cerimoni/public_html/wp-includes/version.php:$wp_version = '3.5.1';
/home/cotaksis/public_html/wp-includes/version.php:$wp_version = '3.6.1';
/home/crisblog/public_html/wp-includes/version.php:$wp_version = '3.5';
/home/customup/public_html/wp-includes/version.php:$wp_version = '3.5.2';
/home/dentist/public_html/wp-includes/version.php:$wp_version = '3.4.2';

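// Read the grep output into an array, one entry per line
$lines = file('versions.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach($lines as $line)
{
    // Capture the account folder and the quoted version string
    if(preg_match('|/home/(.+?)/public_html/wp-includes/version\.php:\$wp_version = \'(.+?)\';|', $line, $matches))
    {
        $folder = $matches[1];
        $version = $matches[2];
    }
}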
Once you have the values separated from the string, write them either into a db of your choice or into a file.
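For the table itself a database is not strictly needed. The following is a minimal sketch, assuming the grep output has exactly the format shown in the question; parseVersions() and renderTable() are hypothetical helper names, and the inline cell styling is just one way to colour the cells.

```php
<?php
// Hypothetical helper: turn grep output lines into [folder => version].
function parseVersions(array $lines): array
{
    $sites = [];
    foreach ($lines as $line) {
        if (preg_match('|^/home/([^/]+)/public_html/wp-includes/version\.php:\$wp_version = \'([^\']+)\';|', $line, $m)) {
            $sites[$m[1]] = $m[2];
        }
    }
    return $sites;
}

// Hypothetical helper: build the HTML table, green when the version matches, red otherwise.
function renderTable(array $sites, string $latest): string
{
    $html = "<table>\n<tr><th>Folder</th><th>Version</th></tr>\n";
    foreach ($sites as $folder => $version) {
        $color = ($version === $latest) ? 'green' : 'red';
        $html .= sprintf(
            "<tr><td>%s</td><td style=\"background-color: %s\">%s</td></tr>\n",
            htmlspecialchars($folder),
            $color,
            htmlspecialchars($version)
        );
    }
    return $html . "</table>\n";
}

// Demo with two lines taken from the question; in practice use file('versions.txt', ...).
$wp_latest_version = '3.5.2';
$sample = [
    "/home/alton/public_html/wp-includes/version.php:\$wp_version = '3.5.2';",
    "/home/boutaces/public_html/wp-includes/version.php:\$wp_version = '3.4';",
];
echo renderTable(parseVersions($sample), $wp_latest_version);
```

Because the text file is regenerated by cron every day, reading it on each page view keeps the table current without any extra storage.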

Related

file_exists with partial name... is it possible?

I need to check if a file exists in two domains.
However, the filename that is written to the db doesn't match the one recorded in my directories, due to a delay of a few seconds. (Examples below.)
Filename that actually exists
https://www.dominio01.com.br/sistema/modulos/consulta/consultas/consulta_87314134987_02102017135619.pdf
Result of my filename
https://www.dominio01.com.br/sistema/modulos/consulta/consultas/consulta_87314134987_02102017135613.pdf
As you can see, there is a difference in the last two chars (representing the seconds) before the file extension.
$dir01 = "https://dominio01.com.br/sistema/modulos/consulta/consultas/";
$dir02 = "https://dominio02.com.br/sistema/modulos/consulta/consultas/";
$documento = preg_replace("/[^0-9]/", "", $item['retCNPJCPF']);
$dataDoc = new DateTime($item['retDataHora']);
$filename = "consulta_".$documento."_".$dataDoc->format('dmYHis').".pdf";
if(file_exists($dir01.$filename)){
$lnkConsultas = "Available at dominio 01";
}
elseif(file_exists($dir02.$filename)){
$lnkConsultas = "Available at domínio 02";
}
I would like to know if it's possible to look up the files without specifying the seconds, and get back every match. Maybe by changing the filename with some regex, but I have no idea how to do that.
PS: I can't use the "glob" functions. They return blank results because the files are on other domains.
file_exists can not do any kind of "searching". I would create simple API scripts on the 2 domains using "glob" to return found files as json based on your search pattern. You could actually invoke them straight from the front-end returning jsonp.
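A rough sketch of what such an API script could do on each domain, assuming it runs on the server that actually holds the PDFs. secondsWildcardPattern() and findConsultas() are hypothetical names, and the JSON output is only indicated in a comment.

```php
<?php
// Hypothetical helper: build a glob pattern that ignores the seconds.
function secondsWildcardPattern(string $documento, DateTime $when): string
{
    // 'dmYHi' stops at the minute; '??' stands for any two characters (the seconds).
    return 'consulta_' . $documento . '_' . $when->format('dmYHi') . '??.pdf';
}

// Hypothetical helper: return the basenames in a local directory that match the pattern.
function findConsultas(string $dir, string $pattern): array
{
    $matches = glob(rtrim($dir, '/') . '/' . $pattern);
    // glob() returns false on error, so fall back to an empty list
    return array_map('basename', $matches ?: []);
}

// The endpoint on each domain could then answer with something like:
// header('Content-Type: application/json');
// echo json_encode(findConsultas(__DIR__, $pattern));
```

Note that this only covers delays within the same minute; a file written at second 58 and recorded at second 02 of the next minute would need a second lookup for the neighbouring minute.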

Filtering filenames in PHP

I'm trying to group a bunch of files together based on RecipeID and StepID. Instead of storing all of the filenames in a table I've decided to just use glob to get the images for the requested recipe. I feel like this will be more efficient and less data handling. Keeping in mind the directory will eventually contain many thousands of images. If I'm wrong about this then the below question is not necessary lol
So let's say I have RecipeID #5 (nachos, mmmm) and it has 3 preparation steps. The naming convention I've decided on would be as such:
5_1_getchips.jpg
5_2_laycheese.jpg
5_2_laytomatos.jpg
5_2_laysalsa.jpg
5_3_bake.jpg
5_finishednachos.jpg
5_morefinishedproduct.jpg
The files may be generated by a camera, so DSC###.jpg...or the person may have actually named each picture as I have above. Multiple images can exist per step. I'm not sure how I'll handle dupe filenames, but I feel that's out of scope.
I want to get all of the "5_" images...but filter them by all the ones that DON'T have any step # (grouped in one DIV), and then get all the ones that DO have a step (grouped in their respective DIVs).
I'm thinking of something like
foreach ( glob( $IMAGES_RECIPE . $RecipeID . "_*.*") as $image)
and then using a substr to filter out the step# but I'm concerned about getting the logic right because what if the original filename already has _#_ in it for some reason. Maybe I need to have a strict naming convention that always includes _0_ if it doesn't belong to a step.
Thoughts?
Globbing through thousands of files will never be faster than having those files indexed in a database (of whatever type) and running a database query for them. That's what databases are meant for.
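Whichever store is used, the recipe and step still have to be read out of each filename once, either when indexing or when listing. A minimal sketch, assuming the RecipeID_StepID_name convention from the question; groupByStep() is a hypothetical helper, and putting step-less images under key 0 is an arbitrary choice.

```php
<?php
// Hypothetical helper: split one recipe's image names into step groups.
// Key 0 holds the images that have no step number.
function groupByStep(array $filenames, int $recipeId): array
{
    $groups = [];
    foreach ($filenames as $name) {
        if (preg_match('/^' . $recipeId . '_(\d+)_/', $name, $m)) {
            // "5_2_laycheese.jpg" goes into step 2
            $groups[(int) $m[1]][] = $name;
        } elseif (strpos($name, $recipeId . '_') === 0) {
            // "5_finishednachos.jpg" has a recipe prefix but no step
            $groups[0][] = $name;
        }
    }
    ksort($groups);
    return $groups;
}
```

A step-less image whose original name happens to begin with digits and an underscore would still be misread as a step, which is an argument for the strict _0_ convention mentioned in the question.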
I had a similar issue with 15,000 mp3 songs.
From the Windows command line I ran dir:
dir *.mp3 /b /s > mp3.bat
Then I used a regex search and replace in Notepad++ that converted the file names, prefixing and appending text to turn each line into a rename statement, and ran mp3.bat.
Something like this might work for you in PHP:
Use preg_replace with a regex to extract the digits
Create logic table(s) that map the digits to the words for the new file names
Create the new filename with rename()
Here is some simplified and UNTESTED Example code to show what I am suggesting.
Example Logic Table:
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
$folder = '/home/user/public_html/recipies/';
$files = array();
$dir = opendir($folder);
while (false !== ($found = readdir($dir))) {
    // PATHINFO_EXTENSION returns the extension without the leading dot
    if (pathinfo($found, PATHINFO_EXTENSION) == 'jpg')
    {
        $files[] = $found;
    }
}
closedir($dir);
foreach ($files as $key => $filename) {
    // Pull each of the three digits out of names like DSC123.jpg
    $digit1 = preg_replace('/DSC(\d)\d\d\.jpg/', "$1", $filename);
    $digit2 = preg_replace('/DSC\d(\d)\d\.jpg/', "$1", $filename);
    $digit3 = preg_replace('/DSC\d\d(\d)\.jpg/', "$1", $filename);
    $newName = $translation[$digit1][$digit2][$digit3];
    rename($folder . $filename, $folder . $newName);
}

Scrape only x amount of characters - how?

BACKGROUND
I own a website that indexes all psychologists of Denmark.
My site provides contact information for all the clinics as well as user ratings.
I'm currently listing 12.000 Psychologists, of which about 6.000 have a website. About 1000 of the Psychologists have visited my website, and filled out their profile with additional "Descriptive" info (such as opening hours, prices, etc.)
I'm attempting to automatically scrape (with PHP and RegEx) the sites of those who haven't provided details to my community, for informative reasons.
I went through a random sample of about 150 of the websites and concluded that more than 85% of them have valuable text following the word 'Velkommen' (= welcome, in Danish). PRECIOUS!
THE QUESTIONS
#1
How do I specify in my script that I'd only like to grab approx. 360 characters, and nothing more? Of course, this should be the text following (and including) the word Velkommen. Also, the script shouldn't be case sensitive (though Velkommen is usually spelled with a capital V, it can pop up in another sentence).
Also, it should be the last occurring 'velkommen' on the whole front page, since it sometimes occurs as a Menu/Navigation option, which would suck, since I'd then grab the navigation options.
#2
Currently - my script saves info in arrays, and then in the database.
Not sure how I should even go about this. What would be optimal for SEO;
Save the scraped text in a MySQL and display that every time.
Render the same 360-characters-text every time [that follows 'Velkommen']
Render random 360-characters-text from the sites, each time someone views a specific Psychologist on my site.
An example site:
$web = "http://www.psykologdorthelau.dk/";
$website = file_get_contents ($web);
preg_match_all("/velkommen.+?/sim", $website, $information);
//THIS SHOULD SPECIFICY THE VERY LAST 'VELKOMMEN' - it doesn't, I know :(
for($i = 0; $i < count($information[0]); $i++){
preg_match_all("/Velkommen (.+?)\"/sim", $information[0][$i], $text, PREG_SET_ORDER);
$psychologist[$i]['text'] = mysql_real_escape_string($text[0][1]);
}
Thank you to anyone who can solve this puzzle, from the wonderful country of Denmark.
When you want to fetch only a certain amount of data you can use a filestream.
It would look something like this:
$handle = fopen("http://www.example.com/", "r"); // open a filestream
// Fetch for example only 10 bytes each time we check
$chunkSize = 10;
$contents = "";
while ( !feof( $handle ) && strlen($contents) < 360) {
$buffer = fread( $handle, $chunkSize );
$contents .= $buffer;
}
$status = fclose( $handle );
//your data is stored in $contents
"the scraped data should be preceeding the word 'velkommen'":
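// Match 'velkommen' (any case) plus up to 360 characters after it, newlines included
preg_replace_callback('/velkommen(.{0,360})/is',
    function($matched) {
        // Use $matched[1] to perform further testing
        return $matched[0]; // the callback must return the replacement text
    },
    $contents
);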
It's hacky, but it will get you started. The anonymous function requires PHP 5.3 or later.
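For the "last occurrence" requirement in the question, a regex is not strictly necessary. Here is a minimal sketch, assuming the page is UTF-8 and the mbstring extension is available; lastWelcomeSnippet() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return up to $length characters starting at the LAST
// case-insensitive occurrence of $word in the visible text, or null if absent.
function lastWelcomeSnippet(string $html, int $length = 360, string $word = 'velkommen'): ?string
{
    // Drop the markup and collapse runs of whitespace
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
    // mb_strripos() finds the last occurrence, ignoring case
    $pos = mb_strripos($text, $word, 0, 'UTF-8');
    if ($pos === false) {
        return null;
    }
    // mb_substr() counts characters rather than bytes, so æ, ø and å are not cut in half
    return mb_substr($text, $pos, $length, 'UTF-8');
}
```

The whole page has to be fetched for this (for example with file_get_contents), because the last occurrence cannot be known from the first few hundred bytes of the stream.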

Getting the file name from a text file after string matching - PHP

I have a log file (log.txt) in the form:
=========================================
March 01 2050 13:05:00 log v.2.6
General Option: [default] log_options.xml
=========================================
Loaded options from xml file: '/the/path/of/log_options.xml'
printPDF started
PDF export
PDF file created:'/path/of/file.1.pdf'
postProcessingDocument started
INDD file removed:'/path/of/file.1.indd'
Error opening document: '/path/of/some/filesomething.indd':Error: file doesnt exist or no permissions
=========================================
March 01 2050 14:15:00 log v.2.6
General Option: [default] log_options.xml
=========================================
Loaded options from xml file: '/the/path/of/log_options.xml'
extendedprintPDF started
extendedprintPDF: Error: Unsaved documents have no full name: line xyz
Note: Each file name is of the format: 3lettersdatesomename_LO.pdf/indd. Example: MNM011112ThisFile_LO.pdf. Also, on a given day and time, the entry could either have just errors, just the message about the file created or both, like I have shown here.
The file continues this way. And, I have a db in the form:
id itemName status
1 file NULL
And so on...
Now, I am expected to go through the log file and, for each file that is created or for which there is an error, update the last column of the DB with the appropriate message: File created or Error. I thought of searching for the string "PDF file created/Error" and then grabbing the file name.
I have tried various things like pathinfo() and strpos. But, I can't seem to understand how I am going to get it done.
Can someone please provide me some inputs on how I can solve this? The txt file and db are pretty huge.
NOTE: I provided the 2nd entry of the log file to be clear that the format in which errors appear IS NOT consistent. I would like to know if I can still achieve what I am supposed to with an inconsistent format for errors.
Can somebody please help after reading the whole question again? There have been plenty of changes from the first time I posted this.
You can use PHP's explode function to break each line of your file into words.
If the fields in your text file are tab separated, use explode("\t", $string) (the delimiter comes first, and "\t" must be in double quotes to mean a tab); if they are space separated, explode on a space instead.
Then a simple substr(word,start_index,length) on each word can give you the name of the file (here start_index should be 0).
To update the MySQL database, the old mysql_connect functions were removed in PHP 7, so use PDO (PHP Data Objects), which also makes your code more reliable and flexible.
Another way out would be to use the preg_match method and specify a regular expression matching your error msg and parse for the file name.
You can refer to php.net manual for help any time.
Are all of the files PDFs? If so you can do a regex search on files with the .pdf extension. However, if the filename is also contained in the error string, you will need to exclude that somehow.
// Assume filenames contain only upper/lowercase letters, 0-9, underscores, periods, dashes, and forward slashes
preg_match_all('~([a-zA-Z0-9_./-]+\.pdf)~', $log_file_contents, $matches);
// $matches should be an array containing each filename.
// You can do array_unique() to exclude duplicates.
Edit: Keep in mind, $matches will be a multi-dimensional array as described http://php.net/manual/en/function.preg-match-all.php and http://php.net/manual/en/function.preg-match.php
To test a regex expression, you can use http://regexpal.com/
Okay, so the main issue here is that you either don't have a consistent delimiter for "entries"..or else you are not providing enough info. So based on what you have provided, here is my suggestion. The main caveat here is that without a solid delimiter for "entries," there's no way to know for sure if the error matches up with the file name. The only way to fix this is to format your file better. Also you have to fill in some blanks, like your db info and how you actually perform the query.
$files = array();
$errors = array();
$handle = fopen("log.txt", "rb");
// fgets() returns one line per call; fread() would return arbitrary 8 KB chunks
while (($row = fgets($handle)) !== false) {
    // get file names (the path sits between single quotes)
    preg_match("~^PDF file created:\s*'(.*?)'~", $row, $match);
    if ( isset($match[1]) ) {
        $files[] = $match[1];
    }
    // get errors (the "Error:" marker is not always at the start of the line)
    preg_match('~Error:\s*(.*)$~', $row, $match);
    if ( isset($match[1]) ) {
        $errors[] = trim($match[1]);
    }
}
fclose($handle);
// connect to db
foreach ($files as $k => $file) {
    // assumes your table just has basename of file
    $file = basename($file);
    $error = ( isset($errors[$k]) ) ? $errors[$k] : null;
    // escape or bind $error and $file before running this against a real db
    $sql = "update tablename set status='$error' where itemName='$file'";
    // execute query
}
EDIT: Actually going back to your post, it looks like you want to update a table not insert, so you will want to change the query to be an update. And you may need to further work with $file in that foreach for your where clause, depending on how you store your filenames in your db (for example, if you just store the basename, you will likely want to do $file = basename($file); in the foreach). Code updated to reflect this.
So hopefully this will point you in the right direction.
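An alternative that avoids pairing files with errors by position is to map each file name straight to a status. This is only a sketch, assuming that just the "PDF file created:" and "Error opening document:" lines carry a quoted path, as in the sample log; extractStatuses() is a hypothetical helper, and error lines without a file name are skipped.

```php
<?php
// Hypothetical helper: map basename => status from an iterable of log lines.
function extractStatuses(iterable $lines): array
{
    $statuses = [];
    foreach ($lines as $line) {
        if (preg_match("~^PDF file created:\s*'([^']+)'~", $line, $m)) {
            $statuses[basename($m[1])] = 'File created';
        } elseif (preg_match("~^Error opening document:\s*'([^']+)'~", $line, $m)) {
            $statuses[basename($m[1])] = 'Error';
        }
    }
    return $statuses;
}

// Usage idea: SplFileObject yields one line at a time, so a huge log is never
// loaded into memory at once.
// $statuses = extractStatuses(new SplFileObject('log.txt'));
// $stmt = $pdo->prepare('UPDATE tablename SET status = ? WHERE itemName = ?');
// foreach ($statuses as $item => $status) { $stmt->execute([$status, $item]); }
```

Whether itemName holds the full basename or the name without its extension is not clear from the question, so the key may need adjusting before the update.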

How can I make a simple path generator?

I have a link that I have to repeat 50 times for each folder, and I have 15 folders. The link that I have to repeat looks like this:
now, the jpg files are named car 1- car 50. and I would really like to be able to generate this script so that I can input the path "update/images/Cars/" the picture title (car) and the input the number of times that I need this link, and then have it spit out something that looks like this:
and then it keeps repeating, I'm assuming this can be done with a counter, but I'm not sure. Thanks!
You can do it with a for loop:
$path = "update/images/Cars/";
$title = "car";
$times = 50;
for($i = 1; $i <= $times; $i++)
echo "\n";
I used $title for the lightbox argument since you didn't specify
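The loop idea can be sketched end to end as follows. The exact link markup from the question is not available here, so the $template below (a lightbox-style anchor) and the "car 1.jpg" file naming are assumptions for illustration only; buildLinks() is a hypothetical helper.

```php
<?php
// Hypothetical helper: build $times links for files named "<title> <n>.jpg".
function buildLinks(string $path, string $title, int $times, string $template): array
{
    $links = [];
    for ($i = 1; $i <= $times; $i++) {
        $file = $path . $title . ' ' . $i . '.jpg';
        // %1$s = file path, %2$s = title, %3$d = counter
        $links[] = sprintf($template, htmlspecialchars($file), htmlspecialchars($title), $i);
    }
    return $links;
}

// Assumed markup; replace it with the real link from your page.
$template = '<a href="%1$s" rel="lightbox[%2$s]">%2$s %3$d</a>';
echo implode("\n", buildLinks('update/images/Cars/', 'car', 50, $template)), "\n";
```

Calling buildLinks() once per folder, with a different path and title each time, would cover all 15 folders.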
Use a powerful text editor. ;-)
For example, in Vim, I can use the following sequence of keystrokes to create your required text:
i
Esc
qa (start recording macro into register a)
Y (yank (= copy) whole line)
p (paste into the following line)
/ ( Return (search for opening brace)
Space (advance cursor one character so it now sits on the number)
Ctrl+a (increment the number)
q (stop recording the macro)
49@a (invoke the macro 49 times)
If you're going to add or remove images from the folder, then you might get better results using the DirectoryIterator object from the Standard PHP Library. Using it would require PHP5, but there's an old-school way of handling it, too. This snippet assumes that all of the files in the directory are the images you want to list:
$link = '%s';
$dir = new DirectoryIterator("/path/to/update/images/Cars");
foreach($dir as $file) if(!$file->isDot()) echo sprintf($link, $file, $file);
Notice that I put the information about the anchor-element into the $link variable and then used sprintf to print those anchors to the screen. If you don't have PHP5 available to you, you'd want to do it this way:
$link = '%s';
$dir = opendir("/path/to/update/images/Cars");
while(($file = readdir($dir)) !== false) if($file != "." && $file != "..") echo sprintf($link, $file, $file);
closedir($dir);
These would only be necessary if you're adding more car photos into the library and don't want to update the page that produces all the links. Both of these snippets should automatically search through the directory of car images and create the links you need.
You can also alter these snippets to search through sub-directories, so you could slam out the links to the images in all 15 folders all with a little bit more code. Let me know if you want to see that code, too.
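For the sub-directory case, one possible shape is sketched below using the SPL recursive iterators. listImages() is a hypothetical helper, and filtering on the .jpg extension is an assumption based on the question.

```php
<?php
// Hypothetical helper: list every .jpg under $root, as paths relative to $root.
function listImages(string $root): array
{
    $found = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if ($file->isFile() && strtolower($file->getExtension()) === 'jpg') {
            // getSubPathname() is forwarded to the inner directory iterator
            $found[] = $it->getSubPathname();
        }
    }
    sort($found);
    return $found;
}

// Usage idea:
// foreach (listImages('/path/to/update/images') as $relative) { echo $relative, "\n"; }
```

Each returned path includes its folder name, so the same link template can be filled in for all folders in a single pass.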
