How would I display BLOB data with PHP? I've entered the BLOB into the DB, but how would I retrieve it? Any examples would be great.
I considered voting to close this as a duplicate, but the title is pretty good, and looking through other questions, I don't find a complete answer to this general question. These sorts of questions betray an absence of understanding of the basics of HTTP, so I wrote this long answer instead. I've glossed over a bit, but anyone who understands the following probably wouldn't need to ask a question like this one. Or if they did, they'd be able to ask a more specific question.
First - If you're storing images or other files in the database, stop and reconsider your architecture. RDBMSes aren't really optimized to handle BLOBs. There are a number of (non-relational) databases that are specifically tuned to handle files. They are called filesystems, and they're really good at this. At least 95% of the time that I've found regular files stuck in an RDBMS, it's been pointless. So first off, consider not storing the file data in the database: use the filesystem, and store some small reference in the database (a path if you must; often you can organize your filesystem so all you need is a unique id).
So, you're sure you want to store your blob in the database?
In that case, you need to understand how HTTP works. Without getting into too much detail, whenever some client requests a URL (makes an HTTP Request), the server responds with an HTTP Response. An HTTP response has two major parts: the headers and the data. The two parts are separated by two consecutive newlines.
Headers, on the wire, are simple plain-text key/value pairs that look like:
Name: value
and are separated by a newline.
The data is basically a BLOB. It's just data. The way that data is interpreted is decided (by the client) based on the value of the Content-Type header that accompanies it. The Content-Type header specifies the Internet Media Type of the data contained in the data section.
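For example, a minimal response carrying an HTML page looks something like this on the wire (headers, a blank line, then the data; the body is truncated here):

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 1234

<!DOCTYPE html>
<html> ...the rest of the page's HTML... </html>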
See it work
There's nothing magic about this. For a regular HTML page, the whole response is human readable. Try the following:
$ telnet google.com 80 # connect to google.com on port 80
You'll see something like:
Trying 74.125.113.104...
Connected to google.com.
Escape character is '^]'.
Now type:
GET /
(followed by return).
You've just made a very simple HTTP request! And you've probably received a response. Look at the response. You'll see all the headers, followed by a blank line, followed by the HTML code of the google home page.
So what?
So now you know what web servers do. They take requests (like GET /) and return responses (composed of headers, followed by a blank line (two consecutive newlines), followed by data).
Now, it's time to realize that:
Your web application is really just a customized web server
All that code you write takes whatever the request is, and translates it into an HTTP response. So you're basically just making a specialized version of apache, or IIS, or nginx, or lighty, or whatever.
Now, the default way that a web server usually handles requests is to look for a file in a directory (the document root), look at it to figure out which headers to send, and then send those headers, followed by the file contents.
But, while your webserver does all that magically for files in the filesystem, it is completely ignorant of some BLOB in an RDBMS. So you have to do it yourself.
If you know the content of your BLOB is, say, a JPEG image that should be named based on a "name" column in the same table, you might do something like this (a sketch using mysqli; adjust the table name and connection to your own setup):
<?php
// Assumes $db is an open mysqli connection and "images" is your table.
$result = $db->query('SELECT name, blobdata FROM images WHERE id = 5');
$row = $result->fetch_assoc();

// Headers go out first, then the raw bytes of the BLOB.
header('Content-Type: image/jpeg');
echo $row['blobdata'];
?>
(If you wanted to hint that the browser should download the file instead of displaying it, you might add a header like: header('Content-Disposition: attachment; filename="' . $row['name'] . '"');)
PHP is smart enough to provide the header() function, which sets headers and makes sure they're sent first (and separated from the data). Once you're done setting headers, you just send your data.
As long as your headers give the client enough information about how to handle the data payload, everything is hunky-dory.
Hooray.
Simple example:
$blob_data = "something you've got from a BLOB field";
header('Content-Type: image/jpeg'); // e.g. if it's a JPEG image
echo $blob_data;
For academic reasons I need to scrape a North Korean dictionary (having already informed myself about the copyright-related issues), which actually should be quite simple: the website is generated by a PHP script that just uses ascending numbers in the URL for each dictionary entry. The first entry is:
uriminzokkiri.com/uri_foreign/dic/index.php?page=1
and the last entry is located at:
uriminzokkiri.com/uri_foreign/dic/index.php?page=313372
So basically I'd assume the easiest way to do this is to write a simple shell script in which the entry number is incremented in a loop, plus a check for whether a page was downloaded successfully; since the connection is not good, it should keep retrying until the download succeeds (also trivial).
But then I tried to download a page containing an entry to test this, and it failed. The site makes use of session cookies, so I first saved the corresponding cookie in a file using the "-c" parameter and then invoked curl with the "-v" (verbose) and "-b" (read cookie(s) from file) parameters, which produced the following output:
[screenshot: curl output]
These are the request and response headers as being shown by Firebug:
[screenshot: request/response headers]
I also tried to pass all of these request headers using the "-H" parameter, but this didn't work either.
Someone started coding a Python-based scraper for this dictionary, but if this could be done with a simple bash script, that looks a bit like overkill to me.
Does anyone know why the approach I tried so far doesn't work and how this could be achieved?
Many thanks in advance and kind regards
You could add some more HTTP headers, such as:
Origin: the domain of the original site you are scraping.
User-Agent: your client identification, which you can copy from your own browser or look up on the internet.
Otherwise, you can copy the request as a bash curl command from your browser's code inspection tools and then convert it to PHP code; converters that automate this exist online.
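If you'd rather end up in PHP than bash, here's a minimal sketch of the whole loop (it assumes the cookies.txt file you saved earlier with -c; the User-Agent string is a placeholder you'd replace with your browser's real value):

<?php
// Sketch: fetch one dictionary entry, retrying until the download succeeds.
function fetch_page($page) {
    $ch = curl_init("http://uriminzokkiri.com/uri_foreign/dic/index.php?page=$page");
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => 'cookies.txt', // cookie file saved earlier with -c
        CURLOPT_COOKIEJAR      => 'cookies.txt',
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (placeholder)', // copy your browser's value
        CURLOPT_HTTPHEADER     => array('Origin: http://uriminzokkiri.com'),
        CURLOPT_TIMEOUT        => 30,
    ));
    do {
        $html = curl_exec($ch); // retry on a flaky connection
    } while ($html === false);
    curl_close($ch);
    return $html;
}

for ($page = 1; $page <= 313372; $page++) {
    file_put_contents("entry_$page.html", fetch_page($page));
}
?>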
I am concerned about the safety of fetching content from unknown url in PHP.
We will basically use cURL to fetch the HTML content from a user-provided URL and look for Open Graph meta tags, to show the links as content cards.
Because the url is provided by the user, I am worried about the possibility of getting malicious code in the process.
I have another question: does curl_exec actually download the full file to the server? If so, is it possible for viruses or malware to be downloaded when using curl?
Using cURL is similar to using fopen() and fread() to fetch content from a file.
Safe or not, depends on what you're doing with the fetched content.
From your description, your server works as some kind of intermediary that extracts specific subcontent from a fetched HTML content.
Even if the fetched content contains malicious code, your server never executes it, so no harm will come to your server.
Additionally, because your server only extracts specific subcontent (Open Graph meta tags, as you say),
everything else that is not what you're looking for in the fetched content is ignored,
which means your users are automatically protected.
Thus, in my opinion, there is no need to worry.
Of course, this relies on the assumption that the content extraction process is sound.
Someone should take a look at it and confirm it.
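For what it's worth, here is a sketch of what a sound extraction step can look like, using DOMDocument rather than regular expressions so that only the tag values you explicitly ask for are ever used:

<?php
// Sketch: pull only the Open Graph tags out of fetched HTML; everything else is ignored.
function extract_og_tags($html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings caused by real-world, messy HTML
    $tags = array();
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $prop = $meta->getAttribute('property');
        if (strpos($prop, 'og:') === 0) {
            $tags[$prop] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

// When displaying the values, escape them so they can't inject markup:
// echo htmlspecialchars($tags['og:title']);
?>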
does curl_exec actually download the full file to the server?
It depends on what you mean by "full file".
If you mean "the entire HTML content", then yes.
If you mean "including all the CSS and JS files that the fetched HTML content may refer to", then no.
is it possible that viruses or malware be downloaded when using curl?
The answer is yes.
The fetched HTML content may contain malicious code, however, if you don't execute it, no harm will come to you.
Again, I'm assuming that your content extraction process is sound.
Short answer: file_get_contents is safe for retrieving data, and so is curl. It is up to you what you do with that data.
A few guidelines:
1. Never run eval on that data.
2. Don't save it to the database without filtering.
3. For this task, you don't even need file_get_contents or curl.
Use: get_meta_tags
array get_meta_tags ( string $filename [, bool $use_include_path = false ] )
// Example
$tags = get_meta_tags('http://www.example.com/');
You will get all the named meta tags parsed and filtered into an array. (Note that get_meta_tags only reads meta tags with a name attribute, so Open Graph tags, which use the property attribute, may still need separate parsing.)
You can use an HTTP client class instead of file_get_contents or curl, because it connects to the page through a socket. After downloading the data, you can extract the metadata using preg_match.
Expanding on the answer made by Ray Radin.
Tips on precautionary measures
He is correct that if you use a sound process to search the fetched resource, there should be no problem in fetching whatever URL is provided. Some examples here are:
Don't store the file in a public-facing directory on your webserver; otherwise you expose yourself to it being executed.
Don't store it in a database; this might lead to a second-order SQL injection attack.
In general, don't store anything from the resource you are requesting; if you have to, use a specific whitelist of what you are searching for.
Check the header information
Even though there is no foolproof way of validating what a specific URL points to, there are ways you can make your life easier and prevent some potential issues.
For example a url might point to a large binary, large image file or something similar.
Make a HEAD request first to get the header information, then look at the Content-Type and Content-Length headers to see whether the content is a plain HTML page of reasonable size.
You should not fully trust these, however, since they can be spoofed. But checking them will at least make sure that even non-malicious content won't crash your script; requesting image files is presumably something you don't want users to do.
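A sketch of that check with PHP's curl (keeping in mind the caveat above that these headers are hints, not guarantees):

<?php
// Sketch: HEAD request first; refuse anything that doesn't claim to be smallish HTML.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$type   = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
$length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if not reported
curl_close($ch);

if (strpos((string)$type, 'text/html') !== 0 || $length > 1000000) {
    exit('Refusing to fetch: not HTML, or too large');
}
?>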
Guzzle
I recommend using Guzzle to do your requests since, in my opinion, it provides functionality that should make this easier.
It is safe, but you will need to do a proper data check before using it, as you should with any data input anyway.
We've developed an IRC bot in PHP which (among many other functions) will respond with the page title of any URL a user sends to the channel. The problem I'm having is that when someone posts the URL of an image or a file, the bot tries to retrieve that whole file or image.
I'm trying to determine the best way to solve this. Should I filter the URL inputs and regex them for all possible file types? That seems daunting and exhaustive, to say the least. If anyone caught on to it, they could simply put a huge file somewhere with a senseless extension, paste that URL in the channel, and time the bot out.
I feel like I'm missing a curl option which could make it simply ignore retrievals of files which aren't simply ASCII in nature. Any advice or suggestions?
One idea could be to do a HEAD request first and download the body only if the content type is text/html. Or you could just read the first 1000 characters (or something similarly small) and check whether the title is there; if it isn't, assume it is something other than HTML.
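The second idea can be done with a curl write callback that keeps the first chunk of the body and then aborts the transfer; returning fewer bytes than curl passed in makes it stop. A sketch:

<?php
// Sketch: read at most ~1000 bytes of the body, then abort the download.
$buffer = '';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) use (&$buffer) {
    $buffer .= $data;
    return strlen($buffer) < 1000 ? strlen($data) : 0; // returning 0 aborts the transfer
});
curl_exec($ch); // returns false once we abort, but $buffer holds the start of the page
curl_close($ch);

if (preg_match('#<title>(.*?)</title>#is', $buffer, $m)) {
    echo 'Title: ' . trim($m[1]) . "\n";
}
?>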
How can I prevent (unauthorized) people from reading a message on a website (e.g. by looking in the browser cache for the text/images)?
It's a PUBLIC (!) site (meaning: no logins here!)
But:
the (secret) message is only shown for a certain time.
the message might be shown only if a password is given.
Problems:
In Opera, for example, the page (= page contents/text) can be indexed by the browser and searched.
One idea was to create an image with the message... but: even images for which a "no cache" header is sent can be retrieved from Firefox's cache.
Also: recreating the message from single characters as images does not work (at least I think so at the moment). I tried this method, but it makes output quite slow. (Writing this, I notice that I do not need to create the images at runtime; I could create images of single letters in advance and refer to them not by real but by pseudo-random names in the HTML.)
I also had the idea of outputting an encoded message (ROT13) in the HTML and using a JS onload handler to decode the message immediately. Problem: if this code is in the HTML, it could be recovered from the cache later on. At least, someone searching through the (Opera) cache would probably not think of entering search terms in encoded form.
Programming language is PHP.
You can't. What if someone takes a screenshot of this?
You could add the secret code to the page with javascript, after the page is loaded. You'd want to retrieve the secret code via AJAX, then write it to the page - that way, the code isn't cached in the HTML part of the source, and it isn't sitting in the javascript within the page's source code.
Content piped in with AJAX is pretty ephemeral; it won't normally be cached or otherwise recorded (to be safe, send no-cache headers on the AJAX response as well).
Since I don't know anything about your HTML or what (if any) javascript framework you might be using, I can't give you a code sample, but you should be able to work with the concept.
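That said, the server side of such an AJAX call is simple enough to sketch in PHP (get_secret_message() is hypothetical; the point is the no-cache headers that tell the browser not to write the response to disk):

<?php
// Sketch: serve the secret to the AJAX call, marked as uncacheable.
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Pragma: no-cache'); // for old HTTP/1.0 caches and proxies
header('Expires: 0');
header('Content-Type: text/plain; charset=UTF-8');

echo get_secret_message(); // hypothetical: look up the time-limited message
?>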
Realistically, if it is sent to the client and displayed on screen, then you cannot prevent the message from being saved or stored on the client machine. Whatever you do to prevent that save could still be bypassed by a simple screenshot.
If you are not concerned about the person the message is targeted at saving said message, then I think your best course of action would be Flash, with Flash making a call to the server to retrieve the message and display it. Another option may be to use javascript to perform a call (AJAX) to the server, which then sends back the message, and you alter the DOM to display it. I don't think that would be cached, but unless you use SSL it could be stored by intermediate proxies.
I know it's possible, but I can't seem to figure it out.
I have a MySQL query that returns a couple hundred thousand results. I want to be able to send the results, but it seems the response requires a Content-Length header for the download to start.
In phpMyAdmin, if you go to export a database, it starts the download right away; Firefox just says unknown file size, but it works. I looked at that code, but couldn't figure it out.
Help?
Thanks.
RFC 2616, in section 14.13 (Content-Length), states that in HTTP, it SHOULD be sent whenever the message's length can be determined prior to being transferred, unless this is prohibited by the rules in section 4.4. This means it DOESN'T have to be sent if you can't determine the size in advance.
Just don't send it.
If you want to send parts of the data to the browser before all of it is available, do you flush your output buffer? Maybe that's the problem, not a missing header?
The way you use flush is like this:
1. Generate some output, which adds it to the buffer.
2. flush() it, which sends the current buffer to the client.
3. Go to step 1.
So, if your query returns a lot of results, you could generate the output for, let's say, 100 or 1000 of them at a time, then flush, and so on.
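A sketch of such a loop, assuming $db is a mysqli connection and big_table is your table (if PHP-level output buffering is active, you need ob_flush() before flush()):

<?php
// Sketch: stream a large result set to the client in pieces.
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');

$result = $db->query('SELECT * FROM big_table');
$i = 0;
while ($row = $result->fetch_assoc()) {
    echo implode(',', $row), "\n";
    if (++$i % 1000 === 0) {
        if (ob_get_level() > 0) { ob_flush(); } // flush PHP's own buffer first, if any
        flush();                                // then push output to the client
    }
}
flush();
?>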
Also, to tell the client browser to attempt to save a file instead of displaying it in the window, you can try the Content-Disposition: attachment header as well. See section 19.5.1 of the specification.
You can use chunked transfer-encoding. Basically, you send a "Transfer-encoding: chunked" header, and then the data is sent in chunked mode, meaning that you send the length of a chunk followed by the chunk. You keep repeating this until the end of data, at which point you send a zero-length chunk.
Details are in RFC 2616.
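On the wire, a chunked body looks something like this: each chunk is prefixed by its length in hexadecimal, and a zero-length chunk marks the end (here 5 and 7 are the byte counts of "Hello" and ", world"):

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

5
Hello
7
, world
0

In practice, when PHP runs behind a typical HTTP/1.1 web server, the server usually switches to chunked encoding by itself once you start flushing output without having set Content-Length, so you rarely have to produce this framing yourself.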
It's possible that you are using gzip, which waits for all content to be generated. Check your .htaccess for directives regarding this.
You don't need to set a Content-Length header field. This header field just tells the client the amount of data to expect. In fact, a "wrong" value can cause the client to discard some data.
If you use the LiveHTTPHeaders plugin in Firefox, you can see the difference between the headers sent by phpMyAdmin and the headers sent by your application. I don't know specifically what you're missing, but this should give you a hint. One header that I see running a quick test is "Transfer-Encoding: chunked", and I also see that they're not sending Content-Length. Here's a link to the Firefox plugin if you don't already have it:
LiveHTTPHeaders
The HTTP Content-Length header is a SHOULD field, so you can drop it, but you then have to set the transfer encoding.
Have a look at: Why "Content-Length: 0" in POST requests?