finding "orphan" files in website structure


I have a site that I've built that is connected to a mysql database. I have items (listed in the database) that correspond to pictures. Database structure is:
Item_num Description Price Available
Item_num is a unique alphanumeric (A001).
In my images folder, I have several photos of A001, labeled:
A001_full.jpg
A001_thumb.jpg
A001_model.jpg
This is fairly consistent. Some pictures don't have the _model version, but all have the _thumb and _full versions. Unfortunately, I added a bunch of pictures and then abandoned the site. Now that I'm bringing it back online, a lot of those pictures do not have an SQL entry to match them. What I would like to do is this:
Import the directory listing of the images (../images)
grab the first part of the file name (before the '_'), thinking strtok for that
use that token to query against the Item_num key in the database
if that key is found, move to the next file
if that key is not found, output the token to the page
I'm really unsure about the directory listing and how to handle it, and how to run multiple repeated queries against the database. I'm used to running one query and then using the results from that. Any help on this would be highly appreciated.

You can actually do this with a single query, and then loop through the results. It will be faster than running one query per entry. Something like this:
$query = "SELECT Item_num FROM table";
Then we'll assume you've dumped the result into an indexed array, $result, where each element is the value of Item_num for one row (the specifics of creating this array may depend on how you're interacting with the database):
// Loop through all images
foreach ( glob( '../images/*' ) as $image ) {
    // Get the portion of the file name before the underscore
    // (basename() strips the '../images/' directory prefix first)
    list( $image_item ) = explode( '_', basename( $image ) );
    // Compare this to all of the return values
    if ( !in_array( $image_item, $result )) {
        // The image is not in the database
    }
}
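The loop above can also be wrapped in a small function that returns each orphaned item number once. This is only a sketch: the function name findOrphanItems and the sample paths are made up for illustration, and it assumes file names always follow the Item_num + '_' + suffix pattern described in the question.

```php
<?php
/**
 * Return the item numbers that appear in file names but not in the database.
 * (findOrphanItems is a hypothetical helper name, not part of any library.)
 *
 * @param string[] $imagePaths e.g. the result of glob('../images/*')
 * @param string[] $knownItems Item_num values fetched from the database
 * @return string[] unique orphan item numbers, in first-seen order
 */
function findOrphanItems(array $imagePaths, array $knownItems): array
{
    // Flip once so each lookup is a hash check instead of a linear in_array() scan
    $known = array_flip($knownItems);
    $orphans = [];
    foreach ($imagePaths as $path) {
        // "../images/A001_full.jpg" -> "A001"
        $item = explode('_', basename($path))[0];
        if (!isset($known[$item])) {
            // Keyed by item number so each orphan is reported only once
            $orphans[$item] = true;
        }
    }
    // PHP turns numeric-looking keys into integers, so cast back to strings
    return array_map('strval', array_keys($orphans));
}

// Illustrative data only: A001 is in the database, B002 is not
$paths = [
    '../images/A001_full.jpg',
    '../images/A001_thumb.jpg',
    '../images/B002_full.jpg',
    '../images/B002_thumb.jpg',
];
$orphans = findOrphanItems($paths, ['A001']);
print_r($orphans);
```

With PDO, the list of known items could be fetched in one go with something like $pdo->query('SELECT Item_num FROM table')->fetchAll(PDO::FETCH_COLUMN), where "table" is the same placeholder name used in the query above.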

Related

Is it worth saving the keyword <-> link relation in a "hashtable"-like structure in MySQL?

I'm working on a PHP + MySQL application which will crawl an HDD/shared drive and index all files and directories into a database, to provide "fulltext" search over it. So far it is going well, but I'm stuck on the question of whether I chose a good way to store the data in the database.
In the picture below, you can see part of my database schema. The idea is that I save a domain (which represents the part of the disk I want to index), then there are links (which represent files and folders, with content, file path, etc.), and then I have a table to store the unique keywords which I find in a file/folder name or content.
And finally, I have 16 linkkeyword tables to store the relations between links and keywords. I have 16 of them because I thought it might be good to make something like a hashtable, since I'm expecting a high number of link <-> keyword relations (so far, for 15k links and 400k keywords, I have about 2.5 million linkkeyword records). So, to avoid storing so much data in one table (and later searching through it), I thought this hashtable could be faster. It works like this: when I want to search for a word, I compute its md5 and look at the first character of the md5, and then I know which linkkeyword table I should use. So there are only about 150~200k records in each linkkeyword table (instead of 2.5 million).
So I'm curious whether this approach is of any use, or whether it would be better to store all linkkeyword information in a single table and let MySQL take care of it (and up to how many link <-> keyword relations would that work?).
So far this was a great solution for me, but I hit a wall when I tried to implement regular-expression search, so that the user can use e.g. "tem*", which can match temp, temporary, temple, etc. In a normal search for a word, I compute its md5 hash and then I know which linkkeyword table I need to look at. But for a regular expression I need to get all keywords from the keyword table (those which match the regular expression) and then process them one by one.
I'm also attaching part of the code for the normal keyword search:
private function searchKeywords($selectedDomains) {
    $searchValues = $this->searchValue;
    $this->resultData = array();
    foreach (explode(" ", $searchValues) as $keywordName) {
        $keywordName = strtolower($keywordName);
        $keywordMd5 = md5($keywordName);
        $selection = $this->database->table('link');
        $results = $selection->where('domain.id', $selectedDomains)->where('domain.searchable = ?', '1')->where(':linkkeyword' . $keywordMd5[0] . '.keyword.keyword LIKE ?', $keywordName)
            ->select('link.*,:linkkeyword' . $keywordMd5[0] . '.weight,:linkkeyword' . $keywordMd5[0] . '.keyword.keyword');
        foreach ($results as $result) {
            $keyExists = array_key_exists($result->linkId, $this->resultData);
            if ($keyExists) {
                $this->resultData[$result->linkId]->updateWeight($result->weight);
                $this->resultData[$result->linkId]->addKeyword($result->keyword);
            } else {
                $domain = $result->ref('domain');
                $linkClass = new search\linkClass($result, $domain);
                $linkClass->updateWeight($result->weight);
                $linkClass->addKeyword($result->keyword);
                $this->resultData[$result->linkId] = $linkClass;
            }
        }
    }
}
and the regular expression search function:
private function searchRegexp($selectedDomains) {
    // get the stored search value
    $searchValues = $this->searchValue;
    // replace asterisk and exclamation mark (the wildcard characters of the search syntax) with their MySQL equivalents
    $searchValues = str_replace("*", "%", $searchValues);
    $searchValues = str_replace("!", "_", $searchValues);
    // empty the result array so previous results do not interfere
    $this->resultData = array();
    // the searched phrase can contain multiple keywords, so split it by space and get results for each keyword
    foreach (explode(" ", $searchValues) as $keywordName) {
        // set the default link result weight to -1 (default value)
        $weight = -1;
        // select all keywords which match the searched keyword (or its regular expression)
        $keywords = $this->database->table('keyword')->where('keyword LIKE ?', $keywordName);
        foreach ($keywords as $keyword) {
            // compute the keyword's md5 sum to determine which table should be used to match its links
            $md5 = md5($keyword->keyword);
            // get all link ids from the linkkeyword relation table
            $keywordJoinLink = $keyword->related('linkkeyword' . $md5[0])->where('link.domain.searchable','1');
            // loop over the found links
            foreach ($keywordJoinLink as $link) {
                // store the link weight for later result sorting
                $weight = $link->weight;
                // get the link ID
                $linkId = $link->linkId;
                // check whether the link already exists in the results, to prevent duplicates
                $keyExists = array_key_exists($linkId, $this->resultData);
                // if the link already exists in the result set, just update its weight and add the matching keyword for later keyword tag specification
                if ($keyExists) {
                    $this->resultData[$linkId]->updateWeight($weight);
                    $this->resultData[$linkId]->addKeyword($keyword->keyword);
                // if the link isn't in the result set yet, insert it
                } else {
                    // get the link reference
                    $linkData = $link->ref('link', 'linkId');
                    // get information about the domain to which the link belongs (location, flagPath, ...)
                    $domainData = $linkData->ref('domain', 'domainId');
                    // if the domain is searchable and was selected before the search, add the link to the result set; otherwise ignore it
                    if ($domainData->searchable == 1 && in_array($domainData->id, $selectedDomains)) {
                        // create a new link instance
                        $linkClass = new search\linkClass($linkData, $domainData);
                        // add the matching keyword to the link's keyword set
                        $linkClass->addKeyword($keyword->keyword);
                        // set the link's weight
                        $linkClass->updateWeight($weight);
                        // insert the link into the result set
                        $this->resultData[$linkId] = $linkClass;
                    }
                }
            }
        }
    }
}

Your question is mostly one of opinion, so you may want to include the criteria that allow us to answer "worth it" more objectively.
It appears you've re-invented the concept of database sharding (though without distributing your data across multiple servers).
I assume you are trying to optimize search time; if that's the case, I'd suggest that 2.5 million records on modern hardware is not a particularly big performance challenge, as long as your queries can use an index. If you can't use an index (e.g. because you're doing a regular expression search), sharding will probably not help at all.
My general recommendation with database performance tuning is to start with the simplest possible relational solution, keep tuning that until it breaks your performance goals, then add more hardware, and only once you've done that should you go for "exotic" solutions like sharding.
This doesn't mean using prayer as a strategy. For performance-critical applications, I typically build a test database where I can experiment with solutions. In your case, I'd build a database with your schema without the "sharding" tables, and then populate it with test data (either write your own population routines, or use a tool like DBMonster). Typically, I'd go for at least double the size I expect in production. You can then run and tune queries to prove, one way or another, whether your schema is good enough. It sounds like a lot of work, but it's much less work than your sharding solution is likely to bring along.
There are (as @danFromGermany comments) solutions that are optimized for text search, and you could use MySQL's full-text search features rather than regular expressions.
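To make the index point concrete: a LIKE pattern with a fixed prefix, such as 'tem%', can generally use an ordinary index on the keyword column, while a pattern that begins with a wildcard cannot. Below is a minimal sketch of translating the question's wildcard syntax ("*" and "!") into a LIKE pattern for a single-table query. The function names are hypothetical, and the escaping assumes MySQL's default LIKE escape character (backslash).

```php
<?php
/**
 * Convert the user's wildcard syntax ("*" = any run of characters,
 * "!" = exactly one character, as in the question) into a LIKE pattern.
 * Literal "%", "_" and "\" in the input are escaped so they match themselves.
 * (wildcardToLike is a hypothetical helper name.)
 */
function wildcardToLike(string $term): string
{
    // strtr() with an array works in a single pass, so the "%" produced
    // for "*" is never escaped again by the "%" => "\%" rule.
    return strtr($term, [
        '\\' => '\\\\',
        '%'  => '\\%',
        '_'  => '\\_',
        '*'  => '%',
        '!'  => '_',
    ]);
}

/**
 * True when the pattern starts with a literal character, i.e. when the
 * database has a fixed prefix to look up in an index.
 */
function hasLiteralPrefix(string $likePattern): bool
{
    return $likePattern !== ''
        && $likePattern[0] !== '%'
        && $likePattern[0] !== '_';
}

$pattern = wildcardToLike('tem*');
echo $pattern, "\n";
```

The resulting pattern would then be bound as a parameter, as the question's code already does with where('keyword LIKE ?', ...), rather than concatenated into the SQL string.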

How To Import Movielens Data To Mysql

How can I import UTF-8 data from MovieLens into MySQL?
I got the data from http://grouplens.org/datasets/movielens/ and, for my recommender system thesis, I only want the 100K and Tag Genome data.
I've been searching on Google and in this forum and I haven't found anything about importing these files into MySQL. I'm currently using phpMyAdmin to manage MySQL, so I'd like to know if anybody knows how to easily import those files into MySQL.
I'm fine if you recommend iterating over it line by line using PHP, but please explain the code to me.

You'll need to write some custom code to import all of their data into MySQL. Dumbest answer on Stack Overflow ever, right?
So they provide a set of flat files, each described in the README.
README
allbut.pl
mku.sh
u.data
u.genre
u.info
u.item
u.occupation
u.user
u1.base
u1.test
u2.base
u2.test
u3.base
u3.test
u4.base
u4.test
u5.base
u5.test
ua.base
ua.test
ub.base
ub.test
In a nutshell:
Make your own database and tables in MySQL.
Programmatically open a file and parse each line to SQL.
Import the SQL into MySQL.
???
Profit!
Yeah, I know I still haven't really told you anything, so let's do one and you can hopefully do the others.
I'll do u.genre, because I'm lazy and it is easy.
Make a new table, I'll assume you know how to make tables and such.
u.genre has two things: a genre and an id.
unknown|0
Action|1
...etc...
So your table should have two fields.
You'll use two data types: https://dev.mysql.com/doc/refman/5.7/en/data-types.html
id - unsigned TINYINT
TINYINT unsigned is 0 to 255
genre - VARCHAR(20)
VARCHAR 20 is up to 20 characters, their longest is "Documentary" so that'll give you a bit of extra room if they add a new one.
Open the file and get the contents: https://secure.php.net/manual/en/function.file-get-contents.php
$filecontents = file_get_contents("u.genre");
Now let's split up the file by line: https://secure.php.net/manual/en/function.explode.php
$genres = explode("\n", $filecontents);
Now we'll loop through the $genres using foreach and explode again: https://secure.php.net/manual/en/control-structures.foreach.php
foreach ($genres as $row) {
    // Skip blank lines (e.g. after a trailing newline)
    if (trim($row) === "") {
        continue;
    }
    list($genre, $id) = explode("|", trim($row));
    # more here later
}
Now let's just output SQL, skipping the row if either of the fields is empty. The genre is a string, so it has to be quoted.
if ($genre != "" && $id !== "") {
    print "INSERT INTO genre (genre,id) VALUES ('" . addslashes($genre) . "'," . (int) $id . ");\n";
}
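Put it all together...
<?php
$filecontents = file_get_contents("u.genre");
$genres = explode("\n", $filecontents);
foreach ($genres as $row) {
    // Skip blank lines (e.g. after a trailing newline)
    if (trim($row) === "") {
        continue;
    }
    list($genre, $id) = explode("|", trim($row));
    if ($genre != "" && $id !== "") {
        // Quote the string value; addslashes() copes with an apostrophe in a genre name
        $sql = "INSERT INTO genre (genre,id) VALUES ('" . addslashes($genre) . "'," . (int) $id . ");\n";
        print $sql;
        # Insert each into your DB here.
    }
}
?>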
Save it and run it from the commandline or put it in a browser for no good reason.
There are too many resources out there showing how to insert data into MySQL, so I'll leave it at this. Everyone's database setup is a bit different, so writing it up for my particular setup won't help you.
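For what it's worth, here is one possible shape for the "insert each into your DB" step, as a sketch rather than the one true way. It assumes PDO and a genre table with the two columns described above; the function names, DSN, database name, and credentials are all placeholders. Prepared statements take care of quoting, so apostrophes in the data need no special handling.

```php
<?php
/**
 * Parse the contents of a pipe-delimited file such as u.genre into a list
 * of [genre, id] pairs, skipping blank or malformed lines.
 * (parseGenreLines is a hypothetical helper name.)
 */
function parseGenreLines(string $contents): array
{
    $rows = [];
    // \R matches any line ending (\n, \r\n or \r)
    foreach (preg_split('/\R/', $contents) as $line) {
        $parts = explode('|', trim($line));
        if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
            continue;
        }
        $rows[] = [$parts[0], (int) $parts[1]];
    }
    return $rows;
}

/** Insert the parsed rows with one prepared statement, inside a transaction. */
function insertGenres(PDO $pdo, array $rows): void
{
    $stmt = $pdo->prepare('INSERT INTO genre (genre, id) VALUES (?, ?)');
    $pdo->beginTransaction();
    foreach ($rows as [$genre, $id]) {
        $stmt->execute([$genre, $id]);
    }
    $pdo->commit();
}

// The two sample lines shown earlier, plus a trailing blank line
$rows = parseGenreLines("unknown|0\nAction|1\n\n");
print_r($rows);

// Hypothetical usage; the DSN, user and password are placeholders:
// $pdo = new PDO('mysql:host=localhost;dbname=movielens;charset=utf8mb4', 'user', 'password');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// insertGenres($pdo, parseGenreLines(file_get_contents('u.genre')));
```

The charset=utf8mb4 part of the DSN is there because the question mentions UTF-8 data; adjust it if your tables use a different character set.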

Select a text field from mysql in php

I'm interested in how to select a text field from my MySQL database. I have a table named users with a text field called "profile_fields" where additional user info is stored. How do I access it in PHP and delete rows based on it? I want to delete unvalidated people.
PHP code
<?php
//Working connection made before assigned as $connection
$time = time();
$query_unactive_users = "DELETE FROM needed WHERE profile_fields['valid_until'] < $time"; //deletes user if the current time value is higher then the expiring date to validate
mysqli_query($connection , $query_unactive_users);
mysqli_close($connection);
?>
In phpMyAdmin the field shows (chosen from a random user row):
a:1:{s:11:"valid_until";i:1370695666;}
Is " ... WHERE profile_fields['valid_until'] ..." the correct way?

Anyway, here's a very fragile solution using your knowledge of the string structure and a bit of SUBSTRING madness:
DELETE FROM needed WHERE SUBSTRING(
profile_fields,
LOCATE('"valid_until";i:', profile_fields) + 16,
LOCATE(';}', profile_fields) - LOCATE('"valid_until";i:', profile_fields) - 16
) < UNIX_TIMESTAMP();
But notice that if you add another "virtual field" after 'valid_until', that will break...

You can't do it in a SQL command in a simple and clean way. However, the string 'a:1:{s:11:"valid_until";i:1370695666;}' is simply a serialized PHP array.
Do this test:
print_r(unserialize('a:1:{s:11:"valid_until";i:1370695666;}'));
The output will be:
Array ( [valid_until] => 1370695666 )
So, if you do the following, you can retrieve your valid_until value:
$arrayProfileData = unserialize('a:1:{s:11:"valid_until";i:1370695666;}');
$validUntil = $arrayProfileData['valid_until'];
So, a solution would be to select ALL items in the table, do a foreach loop, unserialize each "profile_fields" field as above, check the timestamp, and store the primary key of each record to be deleted in a separate array. At the end of the loop, do a single DELETE operation on all primary keys you stored in the loop. To do that, use implode(',', $arrayPKs).
It's not a very direct route, and depending on the number of records, it may be slow, but it's reliable.
Consider rixo's comment: if you can, put the "valid_until" in a separate column. Serializing data can be good for storage of non-regular data, but never use it to store data which you may need to apply SQL filters later.
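The select, unserialize, and delete steps described above could look roughly like this. It is a sketch under assumptions: the primary key column is assumed to be called id, the function name findExpiredIds is made up, and the rows are shown as a plain array instead of a real query result. The table name needed comes from the question.

```php
<?php
/**
 * Given rows with a primary key and a serialized profile_fields string,
 * return the primary keys whose valid_until timestamp is in the past.
 *
 * @param array $rows each row is ['id' => int, 'profile_fields' => string]
 * @param int   $now  the current Unix timestamp, e.g. time()
 * @return int[]
 */
function findExpiredIds(array $rows, int $now): array
{
    $expired = [];
    foreach ($rows as $row) {
        // allowed_classes => false: never instantiate objects from stored data.
        // @ silences the notice/warning that unserialize() raises on bad input.
        $fields = @unserialize($row['profile_fields'], ['allowed_classes' => false]);
        if (is_array($fields)
            && isset($fields['valid_until'])
            && $fields['valid_until'] < $now
        ) {
            $expired[] = (int) $row['id'];
        }
    }
    return $expired;
}

// Illustrative rows; in practice these come from "SELECT id, profile_fields FROM needed"
$rows = [
    ['id' => 1, 'profile_fields' => 'a:1:{s:11:"valid_until";i:1370695666;}'],
    ['id' => 2, 'profile_fields' => 'a:1:{s:11:"valid_until";i:1370699999;}'],
    ['id' => 3, 'profile_fields' => 'not serialized data'],
];
$ids = findExpiredIds($rows, 1370696000);
print_r($ids);

// The ids were cast to int above, so joining them into the statement is safe
$sql = $ids ? 'DELETE FROM needed WHERE id IN (' . implode(',', $ids) . ')' : null;
```

Rows whose profile_fields cannot be unserialized are simply left alone here; whether they should be deleted instead is a decision for the application.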

MySQL Database & PHP - Match row with files and append extension

I've got the following problem:
I've taken over an MS-SQL database from my superior, who developed it. Sadly, the database is in really bad shape for development.
I have already "converted" the database to MySQL and imported the data. The problem is that there's a table "hotels" which had columns named "image1, image2, image3", etc., up to image24. I removed them from the table and created a new table called hotel_images, where the images are assigned to a hotel. Now to describe my problem:
The imported data contained strings for each image such as "007593-20110809-145433-01" but the extension was missing. All the images were placed in the same directory (there are about 4000) and only the string has been saved.
I already wrote a workaround function myself, used when pulling the data into the website, where I check file_exists and then return the matching extension (.BMP, .GIF, .JPG, etc.), but I don't like this solution.
Is there any possibility for me to check all strings in a single table against the image folder and add the proper extension to the table if the string matches? It must be something like
SELECT image from hotel_images (search for value in /images/) IF MATCH ALTER TABLE hotel_images set image = this + .extension
I would appreciate any advice!
Edit: it just came to my attention that I could write a directory listing of the folder to a text file, match it against every string in the table, and replace the string on a match. Is that a possible solution?

You have to SELECT all rows (without extension).
Then, in PHP, loop over all images to check whether they exist in the folder, and get their extensions.
Then UPDATE each row with the existing filename, adding the extension...
With an example:
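$images_query = mysql_query("SELECT id, image_name from hotel_images");
while ($image = mysql_fetch_array($images_query)) {
    // Look for a file with this base name and any extension
    // (the "images/" directory is an assumption; point it at your image folder)
    $matches = glob("images/" . $image["image_name"] . ".*");
    if ($matches) {
        // Get extension
        $ext = "." . pathinfo($matches[0], PATHINFO_EXTENSION);
        // Then update row with new name
        mysql_query("UPDATE hotel_images SET image_name = '" . $image["image_name"] . $ext . "' WHERE id = " . $image["id"]);
    }
}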
Are you searching for something like that?
It's an untested script; I wrote it directly in the SO textarea ;)
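The directory-listing idea from the question's edit also works and touches the file system only once. Here is a sketch of the matching part: it builds a lookup from base name to full file name. The function name is hypothetical, the sample extension is illustrative, and if two files share a base name the last one listed wins.

```php
<?php
/**
 * Build a lookup from base name (file name without extension) to the
 * full file name. (mapBaseNamesToFiles is a hypothetical helper name.)
 *
 * @param string[] $fileNames e.g. the result of scandir() on the image folder
 * @return array base name => file name including extension
 */
function mapBaseNamesToFiles(array $fileNames): array
{
    $map = [];
    foreach ($fileNames as $fileName) {
        $info = pathinfo($fileName);
        // Skip entries without an extension or base name (".", "..", dotfiles, ...)
        if (empty($info['extension']) || $info['filename'] === '') {
            continue;
        }
        $map[$info['filename']] = $fileName;
    }
    return $map;
}

// Illustrative listing; in practice use scandir() on your image folder
$listing = ['.', '..', '007593-20110809-145433-01.JPG', 'notes'];
$map = mapBaseNamesToFiles($listing);
print_r($map);
```

Each image string from hotel_images can then be looked up with isset($map[$name]) and the row updated to $map[$name] when a match exists.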

How do I create an array of files based on rows in a database?

In PHP, I have an array of photos:
$file = array('p1.jpg','p2.jpg','p3.jpg');
Now I want to make this dynamic based on the rows in the database and the uploaded/deleted files. So for example if the user uploaded 10 photos and deleted the 2nd, 3rd, 5th, 9th one, the array would now be:
$file = array('p1.jpg','p4.jpg','p6.jpg','p7.jpg','p8.jpg','p10.jpg');
When the user uploads the file, the php script changes the filename to be "p" + whatever-row-the-database-is-at + ".jpg".
I'm assuming I would be using a loop to test whether that row exists in the database, okay I'm good up to there, but how do I make that output the proper array that I need?

I guess you are new to PHP?
$file = array();
while(.....) { //your loop goes here
$file[] = ....; //add elements here
}

If you're looping over the entire array to check if an entry still exists in the DB, it might just be quicker to actually delete the entire array and rebuild it using the results of a new DB query (as the DB query will be faster at finding the rows of interest, than you iteratively calling the DB to check if a row is still of interest).
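As a sketch of that rebuild: fetch the ids of the rows that still exist and derive the file names from them, following the "p" + row + ".jpg" naming rule from the question. The function name buildFileList is made up, and the query mentioned in the comment uses assumed table and column names.

```php
<?php
/**
 * Build the file list from the ids of the rows that still exist.
 * (buildFileList is a hypothetical helper name.)
 *
 * @param int[] $ids row ids, e.g. from "SELECT id FROM photos ORDER BY id"
 *                   (table and column names are assumptions)
 * @return string[]
 */
function buildFileList(array $ids): array
{
    $file = [];
    foreach ($ids as $id) {
        // Same naming rule as the upload script: "p" + row id + ".jpg"
        $file[] = 'p' . (int) $id . '.jpg';
    }
    return $file;
}

// Ten uploads with the 2nd, 3rd, 5th and 9th deleted, as in the question
$file = buildFileList([1, 4, 6, 7, 8, 10]);
print_r($file);
```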
