I have an entertaining "stuck file" issue

I'm in the process of creating a simple image admin tool that allows users to upload and delete images from a server. So far it's very basic stuff. I discovered that uploading files whose file name includes a blank space causes a problem - everything in the name up to the space is included in the uploaded file name, but nothing past that. The problem with this isn't so much broken image links (I can deal with those), but the images can't be deleted using my delete tool or the simplest "unlink" script. Once I discovered the problem and the cause, I contacted our support people and they cleared out the two problem files. Unfortunately, because of system constraints here at work, I don't have FTP access.
Now you'd think it's all resolved, but no. I added very clear warning messages - in bold red text - onto this prototype admin tool, and I let my boss and a few others know of the tool by email, explaining CLEARLY that file names with spaces don't work. So... what does my boss do? She uploaded two files with blanks in the file name.
So until I have a way to add an idiot filter into the script, can anyone suggest how I might be able to delete these two files? I'd even be willing to delete the populated directory and recreate it (none of the images there mean anything, just random stuff).
And if there's no advice about getting these things unstuck, is there any advice as to how I might prevent my boss from acting like a moron, short of cutting off her fingers?

My best answer would be, as you said: move all the other files to another folder, delete the folder, and recreate it.
After that, I'm thinking you need urlencode in your PHP to deal with weird characters and spaces in filenames.
One question that might help others: are you on Linux or Windows?
As a test, until it's working, try writing to a drive that you have access to so you can play until it works. Then switch to the live system. This way you don't have to call support to delete your tests.
Also, you could just replace all spaces with underscores as a quick fix.
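The underscore quick fix could look something like the sketch below. The function name sanitizeUploadName() is made up for illustration, and the whitelist of allowed characters is an assumption on my part, not something from the original tool.

```php
<?php
// Hypothetical helper (not from the original post): turn a client-supplied
// file name into one that contains no spaces or unusual characters.
function sanitizeUploadName(string $originalName): string
{
    // basename() drops any directory part the client may have sent.
    $name = basename($originalName);
    // Collapse each run of whitespace into a single underscore.
    $name = preg_replace('/\s+/', '_', $name);
    // Replace anything outside a conservative whitelist (assumed policy).
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    return $name;
}

echo sanitizeUploadName('my holiday photo.jpg'), PHP_EOL; // my_holiday_photo.jpg
```

You would run the uploaded name (for example $_FILES['image']['name'], where 'image' is an assumed form field name) through this before passing it to move_uploaded_file(), so the name stored on disk never contains a space in the first place.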

When deleting files whose names contain spaces, escape the spaces with "\" if the deletion goes through a shell command; this will only work on Linux. unlink() itself takes the file name literally, so no escaping is needed there.

Related

Moodle: Files uploaded via File API get corrupted when viewed

So I am developing a new course-format, in which a picture is associated with each activity in a course, and presented visually. I created the course format, overrode the renderer etc. That worked all fine. However, the images are supposed to be custom generated and since it has to work for all existing and future, I put some additional code into the general course module form, enabling an image upload.
After admittedly some struggle on my part to get the File API working, it now all works fine. Only in my course format, there is an additional heading, under which you can upload a single image. This gets saved to the database fine, it is not in draft, and it is perfectly viewable in my dataroot's filedir if I follow the contenthash in the database. It even gets loaded into the form as a default fine. However, if I try to work with the image, all tests run fine (.is_valid_img() etc.) and I even get offered a file to download. However, when I download it, it is corrupted and my file viewer says: "Critical Error: Not a png file". Needless to say, it is not displayed on my actual course site.
When I look at the file in filedir, it very clearly is a png. Please, I would be thankful for any help, since I have tried a lot and am at my wits' end.

It sounds to me like you are getting some sort of output on the page before the PNG file is sent - that would be added to the start of the file and cause it not to work as a PNG file.
I would suggest you open the file in a hex editor and check the start of the file - it should look like https://en.wikipedia.org/wiki/Portable_Network_Graphics#File_header, so look for extra characters before that.
As for where the extra characters come from: they may be an obvious warning or error message (which should be easy to track down and fix). Alternatively, you may have some stray 'echo' statements (again, fairly easy to track down). The hardest problems to find are extra characters (even whitespace or a blank line) before the opening '<?php' tag of a file somewhere in your install, or after the closing '?>' tag at the end of a file (which is why you should never use closing PHP tags in pure-PHP files). Finding these will come down to searching through all your customised code files to locate them.
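If you don't have a hex editor handy, the same check can be done from PHP. This is only a sketch: pngSignatureOffset() is a name I made up, and it simply reports where the standard 8-byte PNG signature first appears in a string of bytes. An offset of 0 means the data starts like a PNG; anything greater means that many stray bytes were sent first.

```php
<?php
// The 8-byte signature every PNG file starts with.
const PNG_SIGNATURE = "\x89PNG\r\n\x1a\n";

// Hypothetical helper: returns the byte offset of the PNG signature,
// or false if the signature does not occur at all.
function pngSignatureOffset(string $bytes)
{
    return strpos($bytes, PNG_SIGNATURE);
}

// Simulated downloads: one clean, one with a notice printed before the image.
$clean  = PNG_SIGNATURE . 'rest-of-file';
$broken = "\n<b>Notice</b>: something" . PNG_SIGNATURE . 'rest-of-file';

var_dump(pngSignatureOffset($clean));      // int(0)
var_dump(pngSignatureOffset($broken) > 0); // bool(true): junk precedes the PNG
```

To run it against the corrupted download, read the first bytes of the file with file_get_contents($path, false, null, 0, 64) and print everything before the reported offset; that text usually tells you which file produced it.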

use file_put_contents('test/xxx.jpg', $url), get wrong image

I use PHP. I am trying to download an image from a URL; my code works for some URLs but not for others. I want it to work for all URLs, or at least to understand what causes the failure.
Here is my script:
$imgUrl = 'http://www.inc.com/uploaded_files/image/i-love-me_49961.jpg';
$imageData = file_put_contents('test/xxx.jpg', file_get_contents($imgUrl));
Right now I do get an image file (xxx.jpg), but when I open the saved file in ACDSee, nothing is displayed.
However, if I use "http://www.wired.com/wp-content/uploads/2014/11/faa-drones-ft-660x330.jpg", my script works.
Please help me.

Interesting. This is a case of file_get_contents failing to get the correct image, so to speak, but the best matched SO questions I found would not help you, because they are about different things.
I shall answer this by laying out how you should solve this type of problem.
Problem solving is the simple art of breaking down the problem and checking the smaller pieces one by one until the cause is pinpointed.
First, did you get anything saved?
If yes, that means you did get something, and we can exclude all data read write problems including file permissions, network problems, access denials, or lack of curl extension.
If you didn't get the saved file, these issues have to be checked one by one.
In your case, I trust that you did get the file.
So the problem is now with the actual data.
Usually, we first verify that the source is ok.
Open it in browser. Save it. Open saved file in ACDSee.
It works! This is how we confirm the source is working, and ACDSee is working.
(And that the OS/browser/network is working, actually.)
Which leaves us with the saved data.
No programs can open it as jpeg, so we can be pretty certain the saved file is not a jpeg.
What is it, then?
If you use a hex editor (e.g. HxD) to open the PHP saved file (not a jpeg) and the manually saved file (confirmed jpeg), you will see that they are simply totally different.
manually saved image: FF D8 FF E1 ...
PHP saved image: 1F 8B 08 ...
If you look up these first few bytes, called file headers, you will see that the PHP saved file is a gzip file.
To confirm this, you can rename the file's extension to .gz. Unzip it, and voilà, there is the image!
Note: hex comparison is pretty useful in sorting out the occasional weird problems, such as unwanted BOM markers, line break conversion on binary files, or messy server filters.
So hex editors are indispensable for a good programmer, even a web programmer.
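The header lookup can also be scripted. Here is a rough sketch that compares the first bytes against the two signatures seen above (FF D8 FF for JPEG, 1F 8B for gzip); sniffFileType() is a hypothetical name and the table is deliberately limited to those two formats.

```php
<?php
// Hypothetical helper: guess a file type from its leading "magic" bytes.
function sniffFileType(string $bytes): string
{
    // Only the two signatures discussed in this answer.
    $signatures = [
        'jpeg' => "\xFF\xD8\xFF",
        'gzip' => "\x1F\x8B",
    ];
    foreach ($signatures as $type => $magic) {
        // strncmp() is binary-safe and compares just the prefix.
        if (strncmp($bytes, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return 'unknown';
}

echo sniffFileType("\xFF\xD8\xFF\xE1"), PHP_EOL; // jpeg
echo sniffFileType("\x1F\x8B\x08"), PHP_EOL;     // gzip
echo bin2hex("\x1F\x8B\x08"), PHP_EOL;           // 1f8b08
```

bin2hex() on the first few bytes of a file gives you the same view a hex editor would, which is handy on a server where you cannot install one.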
At this stage, the question becomes, why did I get a gzipped file?
A web programmer should know what is wrong by now, but let's pretend we do not.
There is not much problem space left.
It is either file_put_contents or file_get_contents.
If you do a little PHP coding to get in between them, you will see that file_get_contents is returning the gzipped data.
But where did file_get_contents get its data?
From the network, of course!
Now, let me introduce you to a piece of software called Wireshark.
Programs like this are called packet sniffers, and they can show you the raw data going through your network cable or wifi.
(Note: Packet sniffers are not easy. You need to know network protocols really well to make sense of anything.
They belong to a class of low level debuggers called system monitors, and are often the last resort.
But this final hand is one of the distinctions between an average programmer and an expert.)
Indeed, with a packet sniffer, we can confirm that the server is responding with gzip encoded content, using Content-Encoding: gzip.
And so, we now know that the real cause is that file_get_contents does not automatically decompress gzip content.
With this correct question, stackoverflow already has answers.
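For completeness, one possible fix (a sketch, not the only way) is to check for the gzip header and decompress before saving. decodeIfGzipped() is a name I made up; gzdecode() and gzencode() come from PHP's zlib extension, which I am assuming is enabled.

```php
<?php
// Hypothetical helper: return the decompressed body if the data carries a
// gzip header, otherwise return the data unchanged.
function decodeIfGzipped(string $data): string
{
    // 1F 8B is the gzip magic number identified earlier in this answer.
    if (strncmp($data, "\x1F\x8B", 2) !== 0) {
        return $data;
    }
    $decoded = gzdecode($data); // needs the zlib extension
    // Fall back to the raw bytes if decompression fails.
    return $decoded === false ? $data : $decoded;
}

// Simulate a server that gzips a (pretend) JPEG body.
$original = "\xFF\xD8\xFF\xE1 pretend image bytes";
$gzipped  = gzencode($original);

var_dump(decodeIfGzipped($gzipped) === $original);  // bool(true)
var_dump(decodeIfGzipped($original) === $original); // bool(true)
```

In the asker's script this would become file_put_contents('test/xxx.jpg', decodeIfGzipped(file_get_contents($imgUrl))). If you use cURL instead, setting CURLOPT_ENCODING to an empty string asks cURL to handle the supported encodings for you.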
This is how we approach pretty much every programming problem, and why we answer more than we ask.
Hope you enjoyed the journey, and hope you will become the tour guide one day.

Provide uploading files, write the file on hard disk, get full path (string) to add in database CakePHP

Since I am just experimenting (localhost only), I would like some ideas, since nothing is coming to mind, about how to let a user who registers on a mini social-networking site (with a username/password, personal details, etc.) upload an image. I would save the uploaded image to a folder on my hard drive (drive C:\ or D:), for example '/images/username', and insert the full path of the folder into a column named img_dir (as a string rather than a BLOB, so that later I can just use img src="path"). I don't mind where it gets saved. But since I am new to CakePHP, I haven't really grasped how to go about it. I have no problems with registration or login sessions. This was easy in C#, but maybe I am too stupid for PHP? :P

While this may not give you a direct solution in CakePHP, you had asked for some ideas.
I've outlined some pros and cons of storing a file on the filesystem (along with some other approaches) in this post.
Hope that helps...

I've written a complete plugin for that kind of task, and it's more thought through than just the idea of saving the file path.
A file has some more metadata, like its size and MIME type, which is useful when the file is served. So an uploaded file should be handled as an entity of its own. I personally think it is a bad idea to directly save the path to a file within the record it belongs to. What happens if you need two images later? Adding incrementing fields like path1, path2?
It is IMO better to have a separate table for files and associate records with these file records. Expressed in CakePHP associations: User hasOne Avatar or Gallery hasMany Image, for example.
Also, saving files in a path like uploads\username1\pic.jpg can slow down the app because of file system performance issues if you get a lot of directories and files within the same level of the file system.
However, check my plugin's readme.md out; there is more about why it does things the way it does to solve different kinds of issues you can run into.
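To illustrate the file-system concern only (this is not taken from the plugin, and I am not claiming it works this way): one common way to avoid piling everything into a single directory level is to derive nested sub-directories from a hash of the file contents. shardedPath() and the two-level layout are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: build a storage path whose first two directory levels
// come from the SHA-1 of the contents, so files spread over many directories.
function shardedPath(string $baseDir, string $contents, string $extension): string
{
    $hash = sha1($contents);
    return sprintf(
        '%s/%s/%s/%s.%s',
        rtrim($baseDir, '/'),
        substr($hash, 0, 2), // first level: 256 possible directories
        substr($hash, 2, 2), // second level: 256 more under each
        $hash,
        $extension
    );
}

echo shardedPath('uploads/', 'pretend image bytes', 'jpg'), PHP_EOL;
```

The resulting path (together with size and MIME type) would then go into the separate files table described above rather than into the user record itself.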

CMS upload picture files security issue

I have a quick question if anyone could help. I am building a CMS for a client where they can log in, and change content (including pictures via upload file form) that are all stored in a database.
My question.. I have been researching, and everywhere says I need to store the image files outside the root folder. Is this necessary in my case if only a few people will be uploading files, inside an admin panel, where they must first log in to the site? I will have already taken steps client side by making sure of file type, size, extension etc... then changing the name of the file before adding it to my DB... Is this secure enough, or am I asking for trouble down the road?
Thanks

It's generally a good idea to store uploaded content someplace where it can't be addressed directly by a browser. You don't want someone uploading a .php file (or some other format you forgot to check for) and then being able to execute it by pulling up the direct URL. Rather, you'd have a wrapper script that delivers the file.
So yes, it's a good idea, but not 'necessary' (by the dictionary definition of the word). You can certainly choose not to do so if, in your judgement, the admin area is otherwise secure.
That said, in the scenario you describe, as long as it's only admin users who can upload images, I don't think it's a huge deal either way.
By the way, if you are not already doing so, verify the images by their file headers or content, not by file extension.
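A wrapper script mostly comes down to mapping a requested name onto a file inside the storage directory without letting the request escape it. Here is a minimal sketch of that lookup; resolveStoredFile() and the temp-directory demo are made up for illustration.

```php
<?php
// Hypothetical helper: map a requested file name to a real file inside
// $storageDir, or return null if it does not exist or lies outside it.
function resolveStoredFile(string $storageDir, string $requestedName): ?string
{
    $root = realpath($storageDir);
    if ($root === false) {
        return null;
    }
    // basename() strips directory components such as "../".
    $candidate = realpath($root . DIRECTORY_SEPARATOR . basename($requestedName));
    if ($candidate === false || !is_file($candidate)) {
        return null;
    }
    // Belt and braces: the resolved path must still sit under the root.
    $prefix = $root . DIRECTORY_SEPARATOR;
    if (strncmp($candidate, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $candidate;
}

// Demo against a throwaway directory.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('upload_demo_');
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'logo.png', 'x');

var_dump(resolveStoredFile($dir, 'logo.png') !== null); // bool(true)
var_dump(resolveStoredFile($dir, '../../etc/passwd'));  // NULL

unlink($dir . DIRECTORY_SEPARATOR . 'logo.png');
rmdir($dir);
```

The actual script would then send a Content-Type header and call readfile() on the returned path, after checking that the visitor is allowed to see it.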

Keeping track of links or references to image files and deleting unused ones (PHP/Database)

I need a way to remove "unused" images from my filesystem, i.e. images that are never accessed from any point in my website (it doesn't matter if I break external links; I might disable external hotlinking altogether). What's the best way of going about this? Regular users can add multiple attachments to topics/posts, and content contributors can bulk upload large numbers of images which can be used in articles or image galleries.
The problem is that the images could be referenced in any of the following ways:
From user content (text/html, possibly Markdown or BBCode) stored in the database
Hardcoded into an HTML page
Hardcoded into a PHP file
Hardcoded into a CSS file
As an "attachment" field in a database table, usually containing only the filename itself with no path, because the application assumes that it would be in a certain folder.
And to top it off, the path of the image could be an absolute or relative HTTP or PHP path and may or may not be built with string concatenation in PHP.
So obviously find/replace or regexing the database or filesystem is out of the question. But luckily for you and me, this system isn't fully implemented yet and I don't need anything that deals with an existing hoard of images. I just need to set up some efficient structure that will allow this in the future.
Some ideas I've thought of:
Intercepting the HTTP request for the image with PHP, and keeping track of the HTTP_REFERER. The problem with this is that just because no one has clicked on a link at the time of checking this doesn't mean the link doesn't exist.
Use extreme database normalization - i.e. make a table for images and use foreign keys for anything that references it. However this would result in making a metric craptonne of many-to-many relationships (and the crosstables) in addition to being impractical for any regular user to use.
Backup all the images and delete them, and check every single 404 request and run a script each time that attempts to find the image from the backup folder and puts it in the "real" folder. The problem is that this cache would have to be purged every so often and the server might be strained when rebuilding the cache.
Ideas/suggestions? Is this just something you have to ignore and live with even if you're making a site with a ridiculous amount of images? Even if it's not worth it, how would something work just for proof-of-concept (I added the garbage-collection tag just because this might be going into that area conceptually).

I will admit that my experience with this was simpler than yours. I had no 'user generated content' so to speak, and my images were all only in templates or in the database with a full path. But what I did was create a Perl script that:
Analyzed my HTML templates, database table, and CSS, and generated a list of files:
In the HTML it looked for <img> tags
In the CSS it looked for any .png, .jp*g, or .gif regex strings
The tables were easy because I had an Image table for the image data
Ordered the files list to remove duplicates
Iterated through the list and wrote a CSV like filename,(CSS filename|HTML filename|DBTABLE),(exists|notexists) for auditing
In another iteration, renamed all files not in the list by appending .del to the filename
After regression testing, I called the script with a -docleanup tag, which told it to go through and delete all the .del-appended files.
If for whatever reason an image was tagged as .del and shouldn't have been, I just manually renamed it back to its original form.
A couple of notes: I realize that I could have made this script 'smoother' and done multiple things in multiple steps, but its use grew over time and I wanted clearly delineated processing steps so it couldn't ever run amok. I used the CSV to go back and clean up the information where the image didn't exist.
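Since the question is about PHP, here is a rough PHP sketch of the core of that audit (the original was a Perl script that I am not reproducing). Both function names are made up, and only the HTML part is shown; references found in CSS or in the database would be merged into the same list.

```php
<?php
// Hypothetical helper: collect the file names referenced by <img> tags.
function extractImageReferences(string $html): array
{
    preg_match_all('/<img\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
    // Keep only the file name and drop duplicates.
    return array_values(array_unique(array_map('basename', $matches[1])));
}

// Hypothetical helper: files on disk that nothing references.
function findUnusedImages(array $filesOnDisk, array $referenced): array
{
    return array_values(array_diff($filesOnDisk, $referenced));
}

$html = '<p><img src="/img/a.png"> <img class="x" src=\'b.jpg\'> <img src="/img/a.png"></p>';
$referenced = extractImageReferences($html);
$unused = findUnusedImages(['a.png', 'b.jpg', 'old.gif'], $referenced);

print_r($unused); // only old.gif is left over
```

Each unused file would then be renamed with rename($path, $path . '.del') rather than deleted outright, so that a wrongly flagged image can still be put back after testing.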
