How to detect .doc password protection in PHP

The following answer makes it possible to detect password-protected .docx files (after porting it to PHP): https://stackoverflow.com/a/14347730/1794894
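function isPasswordProtected($absolutePath, $extension) {
    $content = file_get_contents($absolutePath);
    // OLE2 compound file signature (bytes D0 CF, shown as "ÐÏ" in Latin-1):
    // used by legacy DOC/XLS and by encrypted 2007+ (OOXML) files
    if (substr($content, 0, 2) === "\xD0\xCF") {
        // Stream names are UTF-16LE, so replace the NUL bytes with spaces
        $start = str_replace("\x00", " ", substr($content, 0, 2000));
        if (strpos($start, 'E n c r y p t e d P a c k a g e') !== false) {
            return true; // encrypted .docx/.xlsx
        }
        if ($extension == 'doc') {
            return true; // placeholder: every OLE2 .doc is currently reported as protected
        }
    }
    return false;
}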
How can I also add a .doc-specific check? Do .doc files also have a specific byte sequence, or is it enough to rely only on the ÐÏ check on the first two characters of the file?
Or is the byte at offset 0x20B always equal to 0x13 in a password-protected .doc?

I solved it based on the C# example from the post linked in the question.
We cannot use a COM object, because most servers do not run MS Word.
See Gist snippet: https://gist.github.com/rvanlaak/06ca1b65658a91240362
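The Gist itself is not reproduced here. As a rough illustration of a COM-free check, here is a minimal sketch (docIsEncrypted() is a hypothetical name) based on the Word binary format: the WordDocument stream of a .doc starts with a FIB whose 16-bit flags word at offset 0x0A has bit 0x0100 (fEncrypted) set for encrypted documents. When that stream happens to start at file offset 0x200, the flag is bit 0 of the byte at 0x20B, which is why 0x13 shows up there in the question; the other bits of that byte hold unrelated flags, so comparing the whole byte with 0x13 is fragile. Instead of a fixed offset, the sketch locates the stream through the compound-file directory. It assumes 64-bit PHP 7.4+, reads only the FAT sectors listed in the file header (enough for files up to roughly 7 MB with 512-byte sectors), and assumes the WordDocument stream is not stored in the mini stream.

```php
<?php
// Sketch only (assumptions): 64-bit PHP 7.4+, only the FAT sectors listed in the
// compound-file header are used, and the WordDocument stream is in the regular
// FAT rather than the mini stream. Not a complete CFB parser.
function docIsEncrypted(string $data): bool
{
    $len = strlen($data);
    // Little-endian readers that return -1 instead of reading past the end
    $u16 = fn(int $o): int => ($o >= 0 && $o + 2 <= $len) ? unpack('v', $data, $o)[1] : -1;
    $u32 = fn(int $o): int => ($o >= 0 && $o + 4 <= $len) ? unpack('V', $data, $o)[1] : -1;

    // Compound File Binary (OLE2) signature
    if ($len < 512 || substr($data, 0, 8) !== "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return false;
    }
    $shift = $u16(0x1E); // sector shift: 9 (512-byte) or 12 (4096-byte) sectors
    if ($shift !== 9 && $shift !== 12) {
        return false;
    }
    $sectorSize = 1 << $shift;
    $perFatSector = intdiv($sectorSize, 4);
    $sectorOffset = fn(int $s): int => ($s + 1) * $sectorSize; // sector 0 follows the header

    // FAT sector numbers from the header's DIFAT array (at most 109 entries)
    $fatSectors = [];
    for ($i = 0; $i < min($u32(0x2C), 109); $i++) {
        $fatSectors[] = $u32(0x4C + 4 * $i);
    }
    // Next sector in a chain, or -1 when the FAT entry is outside what was read
    $next = function (int $s) use ($fatSectors, $perFatSector, $sectorOffset, $u32): int {
        $idx = intdiv($s, $perFatSector);
        return isset($fatSectors[$idx])
            ? $u32($sectorOffset($fatSectors[$idx]) + ($s % $perFatSector) * 4)
            : -1;
    };

    // Directory entry names are UTF-16LE
    $wanted = implode("\0", str_split('WordDocument')) . "\0";

    // Walk the directory chain; each entry is 128 bytes
    $dir = $u32(0x30);
    for ($hops = 0; $dir >= 0 && $dir < 0xFFFFFFFA && $hops < 100000; $hops++) {
        $base = $sectorOffset($dir);
        for ($e = $base; $e < $base + $sectorSize; $e += 128) {
            $nameLen = $u16($e + 0x40); // byte length including the UTF-16 NUL
            if ($nameLen < 2 || substr($data, $e, $nameLen - 2) !== $wanted) {
                continue;
            }
            // The FIB sits at the start of the stream's first sector
            $fib = $sectorOffset($u32($e + 0x74));
            if ($u16($fib) !== 0xA5EC) {
                return false; // wIdent mismatch: not a Word 97+ binary document
            }
            // FibBase flags word at 0x0A: bit 0x0100 is fEncrypted
            return ($u16($fib + 0x0A) & 0x0100) !== 0;
        }
        $dir = $next($dir);
    }
    return false;
}
```

Usage: `var_dump(docIsEncrypted(file_get_contents('/path/to/file.doc')));`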

It should be something like this...
<?php
// Requires Windows with MS Word installed and PHP's COM extension
$word = new COM("word.application"); // throws a com_exception on failure
$word->Visible = false;
$word->WindowState = 2; // wdWindowStateMinimize
$word->DisplayAlerts = false;
$doc = $word->Documents->Open("/yourFile.doc");
$passwordProtect = $doc->HasPassword; // true or false
$doc->Close(false);
$word->Quit();
$word = null; // releases the COM object
?>
I can't test this code, hope it helps...

Related

How to retrieve digital signature information from PDF with PHP?

I have an app that needs to retrieve some data (the signer name) from the digital signature "attached" to PDF files.
I have only found examples in Java and C# that use the GetSignatureNames method of iText's AcroFields class.
Edit: I've tried pdftk with dump_data_fields and generate_fdf, and unfortunately the result was:
/Fields [
<<
/V /dftk.com.lowagie.text.pdf.PdfDictionary#3048918
/T (Signature1)
>>]
and
FieldType: Signature
FieldName: Signature1
FieldFlags: 0
FieldJustification: Left
Thanks in advance!
Well, it's complicated (I would even say impossible, but who knows) to achieve this with PHP alone.
First, please read the article about digital signatures in Adobe PDF.
Second, after reading it you will know that the signature is stored between byte offsets b and c given by the /ByteRange [a b c d] entry (a is normally 0; in general, the signed ranges are a to a+b and c to c+d, and the signature fills the gap between them).
Third, we can extract b and c from the document and then extract the signature itself (the guide says it is a hex-encoded PKCS#7 object).
<?php
$content = file_get_contents('test.pdf');
$regexp = '#ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)#'; // subexpressions are used to extract b and c
$result = [];
preg_match_all($regexp, $content, $result);
// $result[2][0] and $result[3][0] are b and c of the first signature
if (isset($result[2][0]) && isset($result[3][0]))
{
    $start = (int) $result[2][0];
    $end = (int) $result[3][0];
    // The bytes between b and c are "<hex digits>"; exclude the enclosing < and >
    $signature = substr($content, $start + 1, $end - $start - 2);
    file_put_contents('signature.pkcs7', hex2bin($signature));
}
Fourth, after the third step we have a PKCS#7 object in the file signature.pkcs7. Unfortunately, I don't know of a way to extract information from the signature using PHP, so you must be able to run shell commands to use openssl:
openssl pkcs7 -in signature.pkcs7 -inform DER -print_certs > info.txt
After running this command, the file info.txt will contain a chain of certificates. The last one is the one you need. You can look at the structure of the file and parse the data you need.
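If you would rather do that parsing step in PHP, a minimal sketch (assuming the openssl extension; lastCertificateCommonName() is a hypothetical helper) can pull the PEM blocks out of info.txt and read the subject with openssl_x509_parse(). Like the answer, it takes the last certificate as the signer's; PKCS#7 does not guarantee that order, so check the result against your own files.

```php
<?php
// Sketch: read the signer's common name from the text written by
// `openssl pkcs7 -print_certs`, assuming (as above) that the last certificate
// is the signer's. lastCertificateCommonName() is a hypothetical helper.
function lastCertificateCommonName(string $printCertsOutput): ?string
{
    $pattern = '/-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----/s';
    if (!preg_match_all($pattern, $printCertsOutput, $m)) {
        return null; // no PEM certificates found
    }
    $info = openssl_x509_parse(end($m[0])); // parse the last PEM block
    $cn = $info['subject']['CN'] ?? null;
    return is_array($cn) ? end($cn) : $cn; // a subject may carry several CNs
}

// Usage: echo lastCertificateCommonName(file_get_contents('info.txt'));
```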
Please also refer to this question, this question and this topic
EDIT at 2017-10-09
I deliberately pointed you to exactly this question.
It contains code that you can adapt to your needs:
use ASN1\Type\Constructed\Sequence;
use ASN1\Element;
use X509\Certificate\Certificate;
// $binaryData holds the DER bytes of the signature, e.g. hex2bin($signature) from the code above
$seq = Sequence::fromDER($binaryData);
$signed_data = $seq->getTagged(0)->asExplicit()->asSequence();
// ExtendedCertificatesAndCertificates: https://tools.ietf.org/html/rfc2315#section-6.6
$ecac = $signed_data->getTagged(0)->asImplicit(Element::TYPE_SET)->asSet();
// ExtendedCertificateOrCertificate: https://tools.ietf.org/html/rfc2315#section-6.5
$ecoc = $ecac->at($ecac->count() - 1);
$cert = Certificate::fromASN1($ecoc->asSequence());
$commonNameValue = $cert->tbsCertificate()->subject()->toString();
echo $commonNameValue;
I've adapted it for you, but please do the rest yourself.
This is my working code in PHP7:
<?php
require_once('vendor/autoload.php');
use Sop\ASN1\Type\Constructed\Sequence;
use Sop\ASN1\Element;
use Sop\X509\Certificate\Certificate;
$currentFile = "./upload/test2.pdf";
$content = file_get_contents($currentFile);
$regexp = '/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)/'; // subexpressions are used to extract b and c
$result = [];
preg_match_all($regexp, $content, $result);
// $result[2][0] and $result[3][0] are b and c
if (isset($result[2][0]) && isset($result[3][0])) {
    $start = (int) $result[2][0];
    $end = (int) $result[3][0];
    // The signature is the hex string between b and c; exclude the enclosing < and >
    $signature = substr($content, $start + 1, $end - $start - 2);
    $binaryData = hex2bin($signature);
    $seq = Sequence::fromDER($binaryData);
    $signed_data = $seq->getTagged(0)->asExplicit()->asSequence();
    // ExtendedCertificatesAndCertificates: https://tools.ietf.org/html/rfc2315#section-6.6
    $ecac = $signed_data->getTagged(0)->asImplicit(Element::TYPE_SET)->asSet();
    // ExtendedCertificateOrCertificate: https://tools.ietf.org/html/rfc2315#section-6.5
    $ecoc = $ecac->at($ecac->count() - 1);
    $cert = Certificate::fromASN1($ecoc->asSequence());
    $commonNameValue = $cert->tbsCertificate()->subject()->toString();
    echo $commonNameValue;
}
I've used iText and found it to be very reliable; I highly recommend it.
You can always call the Java code as a "microservice" from PHP.

I have a Word document. How can I get the word count per page?

I could only find a solution that works per line, but I can't find the page breaks, and I'm quite confused.
For .docx I also can't get an exact word count.
function read_doc($filename) {
    $fileHandle = fopen($filename, "rb");
    $line = @fread($fileHandle, filesize($filename));
    fclose($fileHandle);
    $lines = explode(chr(0x0D), $line);
    $outtext = "";
    foreach ($lines as $key => $thisline) {
        if ($key > 11) {
            var_dump($thisline);
            $pos = strpos($thisline, chr(0x00));
            if (($pos !== FALSE) || (strlen($thisline) == 0)) {
                continue;
            } else {
                var_dump($thisline);
                $text = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $thisline);
                var_dump($text);
            }
        }
    }
    return $outtext;
}
Implementing your own code for this doesn't sound like a good idea. I would recommend using an external library such as PHPWord. It should allow you to convert the file to plain text. Then, you can extract the word count from it.
Also, an external library such as that adds support for a number of file formats, not restricting you to Word 97-2003.
Here's a basic piece of VB.NET code that counts words per page, but be aware that it depends on what Word considers to be a word, which is not necessarily what a user considers a word. In my experience you need to properly analyse how Word behaves and what it interprets, and then build your logic to ensure that you get the results you need. It's not PHP, but it does the job and can be a starting point for you.
Structure WordsPerPage
Public pagenum As String
Public count As Long
End Structure
Public Sub CountWordsPerPage(doc As Document)
Dim index As Integer
Dim pagenum As Integer
Dim newItem As WordsPerPage
Dim tmpList As New List(Of WordsPerPage)
Try
For Each wrd As Range In doc.Words
pagenum = wrd.Information(WdInformation.wdActiveEndPageNumber)
Debug.Print("Word {0} is on page {1}", wrd.Text, pagenum)
index = tmpList.FindIndex(Function(value As WordsPerPage)
Return value.pagenum = pagenum
End Function)
If index <> -1 Then
tmpList(index) = New WordsPerPage With {.pagenum = pagenum, .count = tmpList(index).count + 1}
Else
' Unique (or first)
newItem.count = 1
newItem.pagenum = pagenum
tmpList.Add(newItem)
End If
Next
Catch ex As Exception
WorkerErrorLog.AddLog(ex, Err.Number & " " & Err.Description)
Finally
Dim totalWordCount As Long = 0
For Each item In tmpList
totalWordCount = totalWordCount + item.count
Debug.Print("Page {0} has {1} words", item.pagenum, item.count)
Next
Debug.Print("Total word count is {0}", totalWordCount)
End Try
End Sub
When you unzip a .docx file, you get a folder (a .doc file is a binary format, not a ZIP archive). Look for the document.xml file in the word subfolder: it contains the whole document as XML. Split the string at the page-break markup, strip the XML tags, and use str_word_count.
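A minimal sketch of that approach, assuming the ZipArchive extension; wordsPerPage() and wordsPerPageFromXml() are hypothetical names. Word usually writes `<w:lastRenderedPageBreak/>` where its own layout last broke a page, and `<w:br w:type="page"/>` for manual page breaks; the sketch prefers the former and falls back to the latter. Files produced by other tools may contain neither, and str_word_count() has its own idea of a word (digits, for example, are not counted), so treat the result as an approximation.

```php
<?php
// Sketch (approximate): words per page of a .docx, using page-break markup in
// word/document.xml. Function names are hypothetical.
function wordsPerPageFromXml(string $documentXml): array
{
    // Prefer the breaks Word recorded from its last layout; otherwise use manual page breaks
    $pattern = strpos($documentXml, '<w:lastRenderedPageBreak') !== false
        ? '#<w:lastRenderedPageBreak\s*/>#'
        : '#<w:br\b[^>]*\bw:type="page"[^>]*/>#';
    $counts = [];
    foreach (preg_split($pattern, $documentXml) as $i => $pageXml) {
        // Keep paragraphs and tabs apart so their words are not glued together
        $pageXml = str_replace(['</w:p>', '<w:tab/>'], [' </w:p>', ' '], $pageXml);
        $text = html_entity_decode(strip_tags($pageXml), ENT_QUOTES | ENT_XML1, 'UTF-8');
        $counts[$i + 1] = str_word_count($text);
    }
    return $counts;
}

function wordsPerPage(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }
    return wordsPerPageFromXml($xml);
}
```

Usage: `print_r(wordsPerPage('report.docx'));` returns an array keyed by page number.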
What I figured out is that I would need a Windows server to use a COM object.
Please check this link
https://github.com/lettertoamit/MS-Word-PER-PAGE-WORDCOUNT/blob/master/index.php

Shortest possible encoded string with a decode possibility (shorten URL) using only PHP

I'm looking for a method that encodes a string to the shortest possible length and lets it be decoded (pure PHP, no SQL). I have a working script, but I'm not satisfied with the length of the encoded string.
Scenario
Link to an image (it depends on the file resolution I want to show to the user):
www.mysite.com/share/index.php?img=/dir/dir/hi-res-img.jpg&w=700&h=500
Encoded link (so the user can't guess how to get the larger image):
www.mysite.com/share/encodedQUERYstring
So, basically I'd like to encode only the search query part of the URL:
img=/dir/dir/hi-res-img.jpg&w=700&h=500
The method I use right now will encode the above query string to:
y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA
The method I use is:
$raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$encoded_query_string = base64_encode(gzdeflate($raw_query_string));
$decoded_query_string = gzinflate(base64_decode($encoded_query_string));
How do I shorten the encoded result and still have the possibility to decode it using only PHP?
I suspect that you will need to think more about your encoding method if you don't want it to be decodable by the user. The issue with Base64 is that a Base64 string looks like a Base64 string. There's a good chance that anyone savvy enough to look at your page source will recognise it too.
Part one:
a method that encodes a string to the shortest possible length
If you're flexible on your URL vocabulary/characters, this will be a good starting place. Since gzip gets a lot of its gains from back-references, there is little point in using it on such a short string.
Consider your example - you've only saved 2 bytes in the compression, which are lost again in Base64 padding:
Non-gzipped: string(52) "aW1nPS9kaXIvZGlyL2hpLXJlcy1pbWcuanBnJnc9NzAwJmg9NTAw"
Gzipped: string(52) "y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA=="
If you reduce your vocabulary size, this will naturally allow you better compression. Let's say we remove some redundant information.
Take a look at the functions:
function compress($input, $ascii_offset = 38){
$input = strtoupper($input);
$output = '';
//We can try for a 4:3 (8:6) compression (roughly), 24 bits for 4 characters
foreach(str_split($input, 4) as $chunk) {
$chunk = str_pad($chunk, 4, '=');
$int_24 = 0;
for($i=0; $i<4; $i++){
//Shift the output to the left 6 bits
$int_24 <<= 6;
//Add the next 6 bits
//Discard the leading ASCII chars, i.e. map the character into the 6-bit window that starts at $ascii_offset
$int_24 |= (ord($chunk[$i]) - $ascii_offset) & 0b111111;
}
//Here we take the 4 sets of 6 apart in 3 sets of 8
for($i=0; $i<3; $i++) {
$output = pack('C', $int_24) . $output;
$int_24 >>= 8;
}
}
return $output;
}
And
function decompress($input, $ascii_offset = 38) {
$output = '';
foreach(str_split($input, 3) as $chunk) {
//Reassemble the 24 bit ints from 3 bytes
$int_24 = 0;
foreach(unpack('C*', $chunk) as $char) {
$int_24 <<= 8;
$int_24 |= $char & 0b11111111;
}
//Expand the 24 bits to 4 sets of 6, and take their character values
for($i = 0; $i < 4; $i++) {
$output = chr($ascii_offset + ($int_24 & 0b111111)) . $output;
$int_24 >>= 6;
}
}
//Make lowercase again and trim off the padding.
return strtolower(rtrim($output, '='));
}
It is basically a removal of redundant information, followed by the compression of 4 bytes into 3. This is achieved by effectively having a 6-bit subset of the ASCII table. This window is moved so that the offset starts at useful characters and includes all the characters you're currently using.
With the offset I've used, you can use anything from ASCII 38 to 101 (a window of 64 characters). This gives you a resulting string of 30 bytes, a 9-byte (about 23%) saving! Unfortunately, you'll need to make it URL-safe (probably with Base64), which brings it back up to 40 bytes.
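To make the 4-into-3 packing concrete, here is a worked example of a single compress() step, written as a separate hypothetical helper (packChunk() is not part of the answer's code). After strtoupper(), the first chunk of the example is "IMG="; with the offset 38 its characters become the 6-bit values 35, 39, 33 and 23, which together form the 24-bit number 0x8E7857. Because compress() prepends each packed chunk, these three bytes end up at the end of its output: 0x78 0x57 are the "xW" visible at the end of the compressed string shown below.

```php
<?php
// Worked example of a single compress() step (packChunk() is a hypothetical helper):
// four characters from the 64-character window starting at ASCII 38 become
// 6-bit values, and the resulting 24 bits are written as 3 big-endian bytes.
function packChunk(string $chunk, int $asciiOffset = 38): string
{
    if (strlen($chunk) !== 4) {
        throw new InvalidArgumentException('A chunk is exactly 4 characters');
    }
    $int24 = 0;
    foreach (str_split($chunk) as $char) {
        $value = ord($char) - $asciiOffset; // position inside the 6-bit window
        if ($value < 0 || $value > 63) {
            throw new InvalidArgumentException("'$char' is outside the 6-bit window");
        }
        $int24 = ($int24 << 6) | $value;
    }
    return substr(pack('N', $int24), 1); // drop the always-zero high byte
}

// "IMG=" -> 35, 39, 33, 23 -> 0x8E7857
echo bin2hex(packChunk('IMG=')), PHP_EOL; // prints 8e7857
```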
I think at this point you're pretty safe to assume that you've reached the "security through obscurity" level required to stop 99.9% of people. Let's continue, though, with the second part of your question:
so the user can't guess how to get the larger image
It's arguable that this is already solved with the above, but you need to pass this through a secret on the server, preferably with PHP's OpenSSL interface. The following code shows the complete usage flow of functions above and the encryption:
$method = 'AES-256-CBC';
$secret = base64_decode('tvFD4Vl6Pu2CmqdKYOhIkEQ8ZO4XA4D8CLowBpLSCvA=');
$iv = base64_decode('AVoIW0Zs2YY2zFm5fazLfg==');
$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
var_dump($input);
$compressed = compress($input);
var_dump($compressed);
$encrypted = openssl_encrypt($compressed, $method, $secret, false, $iv);
var_dump($encrypted);
$decrypted = openssl_decrypt($encrypted, $method, $secret, false, $iv);
var_dump($decrypted);
$decompressed = decompress($decrypted);
var_dump($decompressed);
The output of this script is the following:
string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
string(30) "<��(��tJ��#�xH��G&(�%��%��xW"
string(44) "xozYGselci9i70cTdmpvWkrYvGN9AmA7djc5eOcFoAM="
string(30) "<��(��tJ��#�xH��G&(�%��%��xW"
string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
You'll see the whole cycle: compression → encryption → Base64 encode/decode → decryption → decompression. This is about as close to the shortest possible length as you could realistically get.
Everything aside, I feel obliged to conclude this with the fact that it is theoretical only, and this was a nice challenge to think about. There are definitely better ways to achieve your desired result - I'll be the first to admit that my solution is a little bit absurd!
Instead of encoding the URL, output a thumbnail copy of the original image. Here's what I'm thinking:
1. Create a "map" for PHP by naming your pictures (the actual file names) using random characters. random_bytes() is a great place to start.
2. Embed the desired resolution within the randomized URL string from #1.
3. Use the imagecopyresampled() function to copy the original image at the resolution you want before sending it to the client's device.
So for example:
Filename example (from bin2hex(random_bytes(6))): a1492fdbdcf2.jpg
Resolution desired: 800x600. My new link could look like:
http://myserver.com/?800a1492fdbdcf2600 or maybe http://myserver.com/?a1492800fdbdc600f2 or maybe even http://myserver.com/?800a1492fdbdcf2=600, depending on where I choose to embed the resolution within the link
PHP would know that the file name is a1492fdbdcf2.jpg, grab it, use the imagecopyresampled to copy to the resolution you want, and output it.
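A minimal sketch of this idea, assuming the fixed layout of the first example link (3 width digits, the 12-character hex name from bin2hex(random_bytes(6)), then 3 height digits) and a JPEG original. parseImageToken() and outputResized() are hypothetical names. Because the name part must be lowercase hex, a token cannot smuggle in a path such as "../"; the 3-digit fields limit sizes to 999 px.

```php
<?php
// Hypothetical token layout: 3 width digits + 12 hex chars + 3 height digits,
// e.g. "800a1492fdbdcf2600" -> a1492fdbdcf2.jpg at 800x600.
function parseImageToken(string $token): ?array
{
    if (!preg_match('/^(\d{3})([0-9a-f]{12})(\d{3})$/', $token, $m)) {
        return null;
    }
    return ['width' => (int) $m[1], 'name' => $m[2] . '.jpg', 'height' => (int) $m[3]];
}

// Resize the stored original with GD's imagecopyresampled() and send it as JPEG
function outputResized(string $path, int $width, int $height): void
{
    $src = imagecreatefromjpeg($path);
    if ($src === false) {
        http_response_code(404);
        return;
    }
    $dst = imagecreatetruecolor($width, $height);
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $width, $height, imagesx($src), imagesy($src));
    header('Content-Type: image/jpeg');
    imagejpeg($dst);
}
```

Usage, with a hypothetical images/ folder: `$info = parseImageToken($_SERVER['QUERY_STRING'] ?? '');` and, when `$info` is not null, `outputResized(__DIR__ . '/images/' . $info['name'], $info['width'], $info['height']);`.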
Theory
In theory, we need a small input character set and a large output character set.
I will demonstrate it with the following example. Take the number 2468, written with a set of 10 characters (0-9). We can convert it to the same number in base 2 (binary). Then we have a smaller character set (0 and 1), and the result is longer:
100110100100
But if we convert it to hexadecimal (base 16), with a set of 16 characters (0-9 and A-F), we get a shorter result:
9A4
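The same conversion can be checked with PHP's built-in base_convert(). This is only an illustration: base_convert() works with native numbers and loses precision on large values, which is why an arbitrary-precision converter is needed for whole strings.

```php
<?php
// The same number written with three different alphabets
$decimal = '2468';
$binary = base_convert($decimal, 10, 2); // 12 characters
$hex = base_convert($decimal, 10, 16);   // 3 characters (lowercase)
echo $binary, PHP_EOL; // 100110100100
echo $hex, PHP_EOL;    // 9a4
```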
Practice
So in your case we have the following character set for the input:
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
In total 41 characters: Numbers, lower cases and the special chars = / - . &
The character set for the output is a bit trickier. We want to use URL-safe characters only. I've taken them from here: Characters allowed in GET parameter
So our output character set is (73 characters):
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
Numbers, lowercase and uppercase letters, and some special characters.
We have more characters in our output set than in our input set, so theory says we can shorten our input string. Check!
Coding
Now we need a function that converts from base 41 to base 73. I don't know of a built-in PHP function for that. Luckily we can take the function 'convBase' from here: Convert an arbitrarily large number from any base to any base. It relies on the bcmath extension.
<?php
function convBase($numberInput, $fromBaseInput, $toBaseInput)
{
if ($fromBaseInput == $toBaseInput) return $numberInput;
$fromBase = str_split($fromBaseInput, 1);
$toBase = str_split($toBaseInput, 1);
$number = str_split($numberInput, 1);
$fromLen = strlen($fromBaseInput);
$toLen = strlen($toBaseInput);
$numberLen = strlen($numberInput);
$retval = '';
if ($toBaseInput == '0123456789')
{
$retval = 0;
for ($i = 1;$i <= $numberLen; $i++)
$retval = bcadd($retval, bcmul(array_search($number[$i-1], $fromBase), bcpow($fromLen, $numberLen-$i)));
return $retval;
}
if ($fromBaseInput != '0123456789')
$base10 = convBase($numberInput, $fromBaseInput, '0123456789');
else
$base10 = $numberInput;
if ($base10<strlen($toBaseInput))
return $toBase[$base10];
while($base10 != '0')
{
$retval = $toBase[bcmod($base10,$toLen)] . $retval;
$base10 = bcdiv($base10, $toLen, 0);
}
return $retval;
}
Now we can shorten the URL. The final code is:
$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
$encoded = convBase($input, $inputCharacterSet, $outputCharacterSet);
var_dump($encoded); // string(34) "BhnuhSTc7LGZv.h((Y.tG_IXIh8AR.$!t*"
$decoded = convBase($encoded, $outputCharacterSet, $inputCharacterSet);
var_dump($decoded); // string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
The encoded string has only 34 characters.
Optimizations
You can reduce the number of characters by:
reducing the length of the input string. Do you really need the overhead of URL parameter syntax? Maybe you can format your string as follows:
$input = '/dir/dir/hi-res-img.jpg,700,500';
This reduces the input itself and the input character set. Your reduced input character set is then:
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz/-.,";
Final output:
string(27) "E$AO.Y_JVIWMQ9BB_Xb3!Th*-Ut"
string(31) "/dir/dir/hi-res-img.jpg,700,500"
reducing the input character set ;-). Maybe you can exclude some more characters?
You can encode the numbers to characters first. Then your input character set can be reduced by 10!
increasing your output character set. I found the set I gave by googling for two minutes; maybe you can use more URL-safe characters.
Security
Heads up: there is no cryptographic logic in the code. So if somebody guesses the character sets, they can decode the string easily. You can shuffle the character sets (once); that makes it a bit harder for an attacker, but it is still not really safe. Maybe it's enough for your use case anyway.
From the previous answers and the comments below, you need a solution that hides the real path of your image parser and gives it a fixed image width.
Step 1: http://www.example.com/tn/full/animals/images/lion.jpg
You can build a basic "thumbnailer" by taking advantage of .htaccess:
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule tn/(full|small)/(.*) index.php?size=$1&img=$2 [QSA,L]
Your PHP file:
$basedir = "/public/content/";
$filename = realpath($basedir . $_GET["img"]);
## Check that the file exists (realpath() returns false otherwise) and is inside $basedir
if ($filename === false || strncmp($filename, $basedir, strlen($basedir)) !== 0) {
    die("Bad file path");
}
switch ($_GET["size"]) {
    case "full":
        $width = 700;
        $height = 500;
        ## You can also use getimagesize() to test if the image is landscape or portrait
        break;
    default:
        $width = 350;
        $height = 250;
        break;
}
## Here is your old code for resizing images.
## Note that the "tn" directory can exist and store the actual reduced images
This lets you use the URL www.example.com/tn/full/animals/images/lion.jpg to view the resized image.
It also has an SEO advantage: the original file name is preserved.
Step 2: http://www.example.com/tn/full/lion.jpg
If you want a shorter URL and you don't have too many images, you can use the basename of the file (e.g., "lion.jpg") and search for it recursively. When there is a collision, use an index to identify which one you want (e.g., "1--lion.jpg").
function matching_files($filename, $base) {
    $directory_iterator = new RecursiveDirectoryIterator($base);
    $iterator = new RecursiveIteratorIterator($directory_iterator);
    // Match the exact file name at the end of the path (quoted, so "." is literal)
    $regex_iterator = new RegexIterator($iterator, '#[\\\\/]' . preg_quote($filename, '#') . '$#');
    $regex_iterator->setFlags(RegexIterator::USE_KEY);
    // create_function() was removed in PHP 8; use a closure instead
    return array_map(function ($file) {
        return $file->getPathname();
    }, iterator_to_array($regex_iterator, false));
}
function encode_name($filename) {
    $files = matching_files(basename($filename), realpath('public/content'));
    $tot = count($files);
    if (!$tot)
        return NULL;
    if ($tot == 1)
        return basename($filename);
    return array_search(realpath($filename), $files) . "--" . basename($filename);
}
function decode_name($filename) {
$i = 0;
if (preg_match("#^([0-9]+)--(.*)#", $filename, $out)) {
$i = $out[1];
$filename = $out[2];
}
$files = matching_files($filename, realpath('public/content'));
return $files ? $files[$i] : NULL;
}
$name = encode_name("gallery/animals/images/lion.jpg");
echo $name . PHP_EOL;
## --> returns lion.jpg
## With the above solution you can use the URL http://www.example.com/tn/full/lion.jpg
echo decode_name(basename($name)).PHP_EOL;
## -> returns the full path on disk to the image "lion.jpg"
Original post:
Basically, if you line up the two strings from your example, you can see that your "shortened" string is in fact longer:
img=/dir/dir/hi-res-img.jpg&w=700&h=500 // 39 characters
y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA // 50 characters
Using base64_encode will always result in longer strings. And gzcompress needs to store at least one occurrence of each distinct character, so it is not a good solution for short strings.
So doing nothing (or a simple str_rot13) is clearly the first option to consider if you want something shorter than what you had previously.
You can also use a simple character replacement method of your choice:
$raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$from = "0123456789abcdefghijklmnopqrstuvwxyz&=/ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// The following line is the result of str_shuffle($from)
$to = "0IQFwAKU1JT8BM5npNEdi/DvZmXuflPVYChyrL4R7xc&SoG3Hq6ks=e9jW2abtOzg";
echo strtr($raw_query_string, $from, $to) . "\n";
// Result: EDpL4MEu4MEu4NE-u5f-EDp.dmprYLU00rNLA00 // 39 characters
Reading from your comment, you really want "to prevent anyone to gets a high-resolution image".
The best way to achieve that is to generate a checksum keyed with a secret key.
Encode:
$secret = "ujoo4Dae";
$raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$encoded_query_string = $raw_query_string . "&k=" . hash("crc32", $raw_query_string . $secret);
Result: img=/dir/dir/hi-res-img.jpg&w=700&h=500&k=2ae31804
Decode:
if (preg_match("#(.*)&k=([^=]*)$#", $encoded_query_string, $out)
&& (hash("crc32", $out[1].$secret) == $out[2])) {
$decoded_query_string = $out[1];
}
This does not hide the original path, but this path has no reason to be public. Your "index.php" can output your image from the local directory once the key has been checked.
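A sketch of the same signed-query idea with a swapped-in technique: an HMAC (hash_hmac() with SHA-256) instead of crc32 over the string plus a secret, and hash_equals() for a constant-time comparison. crc32 is a 32-bit checksum rather than a keyed hash, so it is much weaker in this role. The secret value and the truncation to 16 hex characters are assumptions chosen to keep the URL short.

```php
<?php
// Sketch: sign and verify a query string with an HMAC. URL_SECRET is a placeholder;
// the MAC is truncated to 16 hex characters (64 bits) to keep URLs short.
const URL_SECRET = 'replace-with-a-long-random-secret';

function signQuery(string $query): string
{
    $mac = substr(hash_hmac('sha256', $query, URL_SECRET), 0, 16);
    return $query . '&k=' . $mac;
}

// Returns the original query string, or null if the signature is missing or wrong
function verifyQuery(string $signed): ?string
{
    if (!preg_match('#^(.*)&k=([0-9a-f]{16})$#', $signed, $m)) {
        return null;
    }
    $expected = substr(hash_hmac('sha256', $m[1], URL_SECRET), 0, 16);
    return hash_equals($expected, $m[2]) ? $m[1] : null;
}
```

Usage: `echo signQuery('img=/dir/dir/hi-res-img.jpg&w=700&h=500');`, then pass the received string to verifyQuery() before serving the image.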
If you really want to shorten your original URL, you have to restrict the set of characters allowed in it. Many compression methods rely on being able to store more than one character in a byte.
There are many ways to shorten URLs. You can look up how other services, like TinyURL, shorten their URLs. Here is a good article on hashes and shortening URLs: URL Shortening: Hashes In Practice
You can use the PHP function mhash() to apply hashes to strings (it is deprecated as of PHP 8.1; the hash() function is its modern replacement).
And if you scroll down to "Available Hashes" on the mhash website, you can see what hashes you can use in the function (although I would check what PHP versions have which functions): mhash - Hash Library
I think this would be better done by not obscuring at all. You could quite simply cache returned images and use a handler to provide them. This requires the image sizes to be hard coded into the PHP script. When you get new sizes, you can just delete everything in the cache as it is 'lazy loaded'.
1. Get the image from the request
This could be /thumbnail.php?image=img.jpg&album=myalbum. With URL rewriting, it could even look like /gallery/images/myalbum/img.jpg.
2. Check to see if a temporary version does not exist
You can do this using is_file().
3. Create it if it does not exist
Use your current resizing logic to do it, but don't output the image. Save it to the temporary location.
4. Read the temporary file contents to the stream
It pretty much just outputs it.
Here is an untested code example...
<?php
// Assuming we have a request /thumbnail.php?image=img.jpg&album=myalbum
// The paths below are placeholders; adjust them for your system.
// basename() strips any directory part, so "../" cannot escape the gallery
$image = basename($_GET['image']); // The file name
$album = basename($_GET['album']); // The album
$temp_folder = sys_get_temp_dir(); // Temporary directory to store images
                                   // (this should really be a specific cache path)
$image_gallery = "images"; // Root path to the image gallery
$width = 700;
$height = 500;
$real_path = "$image_gallery/$album/$image";
$temp_path = "$temp_folder/$album/$image";
if (!is_file($temp_path))
{
    // Read in the image
    $contents = file_get_contents($real_path);
    // Resize however you are doing it now (resizeImage() stands for your existing code)
    $thumb_contents = resizeImage($contents, $width, $height);
    // Make sure the album's cache folder exists, then write the temporary file
    if (!is_dir(dirname($temp_path))) {
        mkdir(dirname($temp_path), 0775, true);
    }
    file_put_contents($temp_path, $thumb_contents);
}
$type = 'image/jpeg';
header('Content-Type: ' . $type);
header('Content-Length: ' . filesize($temp_path));
readfile($temp_path);
?>
Short words about "security"
You simply won't be able to secure your link if there is no "secret password" stored somewhere: as long as the URI carries all the information needed to access your resource, it will be decodable, and your "custom security" (those two words are opposites, by the way) will be broken easily.
You can still put a salt in your PHP code (like $mysalt="....long random string...") since I doubt you need eternal security. This approach is weak because you cannot renew the $mysalt value, but in your case a few years of security sounds sufficient, since a user can buy one picture and share it elsewhere anyway, defeating any of your security mechanisms.
If you want to have a safe mechanism, use a well-known one (as a framework would carry), along with authentication and user rights management mechanism (so you can know who's looking for your image, and whether they are allowed to).
Security has a cost. If you don't want to afford its computing and storing requirements, then forget about it.
Secure by signing the URL
If you want to stop users from easily bypassing this and getting the full-resolution picture, you can simply sign the URI (but really, for safety, use something that already exists instead of the quick draft example below):
$secret = '....long random string...';
$params = array('img' => '...', 'h' => '...', 'w' => '...');
$p = http_build_query($params);
// An HMAC is deterministic, so the server can recompute and compare it later
// (password_hash() generates a new random salt on every call, so it cannot be compared this way)
$check = hash_hmac('sha256', $p, $secret);
$uri = http_build_query(array_merge($params, array('sig' => $check)));
Decoding:
$sig = $_GET['sig'] ?? '';
$params = $_GET;
unset($params['sig']);
// Same as previous
$secret = '....long random string...';
$p = http_build_query($params);
$check = hash_hmac('sha256', $p, $secret);
if (!hash_equals($check, $sig)) throw new DomainException('Invalid signature');
See hash_hmac and hash_equals
Shorten smartly
"Shortening" with a generic compression algorithm is useless here because the algorithm's headers and overhead are larger than anything it can save on such a short URI, so it will almost never shorten it.
If you want to shorten it, be smart: don't give the relative path (/dir/dir) if it's always the same (or give it only if it's not the main one). Don't give the extension if it's always the same (or give it when it's not png if almost everything is in png). Don't give the height, because the image carries the aspect ratio: you only need the width. Give it in units of 100 px if you do not need a pixel-accurate width.
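A minimal sketch of these ideas under hypothetical assumptions: every image lives in /dir/dir/, every file is a .jpg, the height comes from the image's own aspect ratio, and widths are multiples of 100 px from 100 to 3500, so a single base-36 digit is enough. The constants and function names are illustrative only.

```php
<?php
// Hypothetical conventions: fixed directory and extension, width in units of 100 px
const IMAGE_DIR = '/dir/dir/';
const IMAGE_EXT = '.jpg';

function shortToken(string $path, int $width): string
{
    if ($width < 100 || $width > 3500) {
        throw new InvalidArgumentException('Width must be between 100 and 3500 px');
    }
    // One base-36 digit for the width (rounded down to 100 px), then the bare file name
    return base_convert((string) intdiv($width, 100), 10, 36) . basename($path, IMAGE_EXT);
}

function expandToken(string $token): array
{
    return [
        'path' => IMAGE_DIR . substr($token, 1) . IMAGE_EXT,
        'width' => (int) base_convert($token[0], 36, 10) * 100,
    ];
}
```

With these conventions, shortToken('/dir/dir/hi-res-img.jpg', 700) gives "7hi-res-img", 11 characters instead of the 39-character query string.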
A lot has been said about how encoding doesn't help security, so I am just concentrating on the shortening and aesthetics.
Rather than thinking of it as a string, you could consider it as three individual components. Then if you limit your code space for each component, you can pack things together a lot smaller.
E.g.,
path - Only consisting of the 26 characters (a-z) and / - . (Variable length)
width - Integer (0 - 65k) (Fixed length, 16 bits)
height - Integer (0 - 65k) (Fixed length, 16 bits)
I'm limiting the path to a vocabulary of at most 31 distinct characters, so that each character fits in five bits (code 0 is reserved for a null terminator).
Pack your fixed length dimensions first, and append each path character as five bits. It might also be necessary to add a special null character to fill up the end byte. Obviously you need to use the same dictionary string for encoding and decoding.
See the code below.
This shows that by limiting what you encode and how much you can encode, you can get a shorter string. You could make it even shorter by using only 12-bit dimension integers (max 4095), or even by removing parts of the path if they are known, such as the base path or file extension (see the last example).
<?php
function encodeImageAndDimensions($path, $width, $height) {
$dictionary = str_split("abcdefghijklmnopqrstuvwxyz/-."); // Maximum 31 characters, please
if ($width >= pow(2, 16)) {
throw new Exception("Width value is too high to encode with 16 bits");
}
if ($height >= pow(2, 16)) {
throw new Exception("Height value is too high to encode with 16 bits");
}
// Pack width, then height first
$packed = pack("nn", $width, $height);
$path_bits = "";
foreach (str_split($path) as $ch) {
$index = array_search($ch, $dictionary, true);
if ($index === false) {
throw new Exception("Cannot encode character outside of the allowed dictionary");
}
$index++; // Add 1 due to index 0 meaning NULL rather than a.
// Work with a bit string here rather than using complicated binary bit shift operators.
$path_bits .= str_pad(base_convert($index, 10, 2), 5, "0", STR_PAD_LEFT);
}
// Remaining space left?
$modulo = (8 - (strlen($path_bits) % 8)) %8;
if ($modulo >= 5) {
// There is space for a null character to fill up to the next byte
$path_bits .= "00000";
$modulo -= 5;
}
// Pad with zeros
$path_bits .= str_repeat("0", $modulo);
// Split in to nibbles and pack as a hex string
$path_bits = str_split($path_bits, 4);
$hex_string = implode("", array_map(function($bit_string) {
return base_convert($bit_string, 2, 16);
}, $path_bits));
$packed .= pack('H*', $hex_string);
return base64_url_encode($packed);
}
function decodeImageAndDimensions($str) {
$dictionary = str_split("abcdefghijklmnopqrstuvwxyz/-.");
$data = base64_url_decode($str);
$decoded = unpack("nwidth/nheight/H*path", $data);
$path_bit_stream = implode("", array_map(function($nibble) {
return str_pad(base_convert($nibble, 16, 2), 4, "0", STR_PAD_LEFT);
}, str_split($decoded['path'])));
$five_pieces = str_split($path_bit_stream, 5);
$real_path_indexes = array_map(function($code) {
return base_convert($code, 2, 10) - 1;
}, $five_pieces);
$real_path = "";
foreach ($real_path_indexes as $index) {
if ($index == -1) {
break;
}
$real_path .= $dictionary[$index];
}
$decoded['path'] = $real_path;
return $decoded;
}
// These do a bit of magic to get rid of the double equals sign and obfuscate a bit. It could save an extra byte.
function base64_url_encode($input) {
$trans = array('+' => '-', '/' => ':', '*' => '$', '=' => 'B', 'B' => '!');
return strtr(str_replace('==', '*', base64_encode($input)), $trans);
}
function base64_url_decode($input) {
$trans = array('-' => '+', ':' => '/', '$' => '*', 'B' => '=', '!' => 'B');
return base64_decode(str_replace('*', '==', strtr($input, $trans)));
}
// Example usage
$encoded = encodeImageAndDimensions("/dir/dir/hi-res-img.jpg", 700, 500);
var_dump($encoded); // string(27) "Arw!9NkTLZEy2hPJFnxLT9VA4A$"
$decoded = decodeImageAndDimensions($encoded);
var_dump($decoded); // array(3) { ["width"] => int(700) ["height"] => int(500) ["path"] => string(23) "/dir/dir/hi-res-img.jpg" }
$encoded = encodeImageAndDimensions("/another/example/image.png", 4500, 2500);
var_dump($encoded); // string(28) "EZQJxNhc-iCy2XAWwYXaWhOXsHHA"
$decoded = decodeImageAndDimensions($encoded);
var_dump($decoded); // array(3) { ["width"] => int(4500) ["height"] => int(2500) ["path"] => string(26) "/another/example/image.png" }
$encoded = encodeImageAndDimensions("/short/eg.png", 300, 200);
var_dump($encoded); // string(19) "ASwAyNzQ-VNlP2DjgA$"
$decoded = decodeImageAndDimensions($encoded);
var_dump($decoded); // array(3) { ["width"] => int(300) ["height"] => int(200) ["path"] => string(13) "/short/eg.png" }
$encoded = encodeImageAndDimensions("/very/very/very/very/very-hyper/long/example.png", 300, 200);
var_dump($encoded); // string(47) "ASwAyN2LLO7FlndiyzuxZZ3Yss8Rm!ZbY9x9lwFsGF7!xw$"
$decoded = decodeImageAndDimensions($encoded);
var_dump($decoded); // array(3) { ["width"] => int(300) ["height"] => int(200) ["path"] => string(48) "/very/very/very/very/very-hyper/long/example.png" }
$encoded = encodeImageAndDimensions("only-file-name", 300, 200);
var_dump($encoded); //string(19) "ASwAyHuZnhksLxwWlA$"
$decoded = decodeImageAndDimensions($encoded);
var_dump($decoded); // array(3) { ["width"] => int(300) ["height"] => int(200) ["path"] => string(14) "only-file-name" }
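Each path character takes 5 bits, because the 29-character dictionary plus the null code (index 0) fits in 2^5 = 32 values. That means the output length can be predicted from the path length alone. The sketch below is not part of the answer, and `predicted_encoded_length()` is a hypothetical helper. It reproduces the string lengths shown in the `var_dump` comments above: 27, 28, 19, 47 and 19 characters.

```php
<?php
// Sketch: predict the length of the string produced by encodeImageAndDimensions()
// from the number of characters in the path.
function predicted_encoded_length(int $path_chars): int {
    // 4 bytes for the two 16-bit dimensions ("nn"), then 5 bits per path character,
    // rounded up to whole bytes (spare bits become a null code or zero padding).
    $bytes = 4 + intdiv(5 * $path_chars + 7, 8);
    // Standard Base64: 4 output characters per started group of 3 input bytes.
    $base64 = 4 * intdiv($bytes + 2, 3);
    // base64_url_encode() collapses a trailing "==" (1 leftover byte) into one character.
    return ($bytes % 3 === 1) ? $base64 - 1 : $base64;
}
```

So shortening the path saves about 5/8 of a byte per character, while the two dimensions always cost a fixed 4 bytes.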
In your question you state that it should be pure PHP and not use a database, and there should be a possibility to decode the strings. So bending the rules a bit:
The way I am interpreting this question is that we don't care about security that much, but we do want the shortest hashes that lead back to images.
We can also take "decode possibility" with a pinch of salt by using a one-way hashing algorithm.
We can store the hashes inside a JSON object, then store the data in a file, so all we have to do at the end of the day is string matching:
```
<?php
class FooBarHashing {
    private $hashes = [];
    /**
     * In production this should be outside the web root
     * to stop pesky users downloading it and getting hold of all the keys.
     */
    private $file_name = './my-image-hashes.json';

    public function __construct() {
        $this->hashes = $this->get_hashes();
    }

    public function get_hashes() {
        // A missing or empty file simply means no hashes have been stored yet.
        if (!file_exists($this->file_name) || filesize($this->file_name) === 0) {
            return [];
        }
        // Decode straight into an associative array: hash => query string.
        $hashes = json_decode(file_get_contents($this->file_name), true);
        return is_array($hashes) ? $hashes : [];
    }

    private function update() {
        // LOCK_EX stops two simultaneous writes from interleaving
        // (a read-modify-write race between requests is still possible).
        if (false === file_put_contents($this->file_name, json_encode($this->hashes), LOCK_EX)) {
            throw new Exception('Could not write to file');
        }
        return true;
    }

    public function add_hash($image_file_name) {
        $new_hash = md5($image_file_name);
        if (!isset($this->hashes[$new_hash])) {
            $this->hashes[$new_hash] = $image_file_name;
            return $this->update();
        }
        return false; // already stored
    }

    public function resolve_hash($hash_string = '') {
        if (isset($this->hashes[$hash_string])) {
            return $this->hashes[$hash_string];
        }
        return null; // hash not found
    }
}
```
Usage example:
<?php
// Include our class
require_once('FooBarHashing.php');
$hashing = new FooBarHashing;
// You will need to add the query string you want to resolve first.
$hashing->add_hash('img=/dir/dir/hi-res-img.jpg&w=700&h=500');
// Then when the user requests the hash the query string is returned.
echo $hashing->resolve_hash('65992be720ea3b4d93cf998460737ac6');
So the end result is a string that is only 32 characters long, which is much shorter than the 52 we had before.
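If 32 characters is still too long, one option (my assumption, not part of the answer above) is to keep `md5()` but take the raw 16-byte digest and Base64url-encode it, which gives 22 characters. The helper name `short_md5_key()` is hypothetical, and `add_hash()` and `resolve_hash()` would both have to use it so the keys match.

```php
<?php
// Hypothetical helper: 16 raw MD5 bytes -> 22 URL-safe Base64 characters.
function short_md5_key(string $value): string {
    $raw = md5($value, true);                     // binary digest, 16 bytes
    $b64 = base64_encode($raw);                   // 24 characters, ending in "=="
    return rtrim(strtr($b64, '+/', '-_'), '=');   // URL-safe alphabet, padding removed
}

$key = short_md5_key('img=/dir/dir/hi-res-img.jpg&w=700&h=500');
```

It is still a one-way hash, so the JSON lookup file is needed exactly as before.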
From the discussion in the comments section it looks like what you really want is to protect your original high-resolution images.
Having that in mind, I'd suggest actually doing that first, using your web server configuration (e.g., Apache mod_authz_core or Nginx ngx_http_access_module) to deny web access to the directory where your original images are stored.
Note that the server will only deny access to your images from the web; you will still be able to access them directly from your PHP scripts. Since you are already displaying images using some "resizer" script, I'd suggest putting a hard limit there and refusing to resize images to anything bigger than that (e.g., something like $width = min(1000, $_GET['w'])).
I know this does not answer your original question, but I think this would be the right solution to protect your images. And if you still want to obfuscate the original name and resizing parameters, you can do that however you see fit without worrying that someone might figure out what's behind it.
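The hard limit in the resizer script could look roughly like this. `clamp_dimension()` and the 1000 px default are illustrative assumptions; the point is that the requested size is cast to an integer and never allowed past the maximum.

```php
<?php
// Hypothetical helper for the resizer script: never produce images larger than $max.
function clamp_dimension($requested, int $max = 1000): int {
    $value = (int) $requested;          // non-numeric input such as 'abc' becomes 0
    return max(1, min($max, $value));   // keep the result between 1 and $max
}

$width  = clamp_dimension($_GET['w'] ?? 0);
$height = clamp_dimension($_GET['h'] ?? 0);
```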
I'm afraid you won't be able to shorten the query string better than any known
compression algorithm. As mentioned in other answers, a compressed
version will be shorter than the original by only a few (around 4-6) characters.
Moreover, the original string can be decoded relatively easily (as opposed to reversing SHA-1 or MD5, for instance).
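To illustrate how easily the "zip + Base64" form is reversed, here is a minimal round-trip sketch. It assumes the zlib extension is available; `pack_query()` and `unpack_query()` are hypothetical names.

```php
<?php
// Sketch: compress a query string and make it URL-safe, then reverse it.
function pack_query(string $query): string {
    $compressed = gzdeflate($query, 9);                        // raw DEFLATE, max level
    return rtrim(strtr(base64_encode($compressed), '+/', '-_'), '=');
}

function unpack_query(string $token): string {
    $b64 = strtr($token, '-_', '+/');                          // standard Base64 alphabet
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);       // restore padding
    $query = gzinflate(base64_decode($b64));
    if ($query === false) {
        throw new InvalidArgumentException('Invalid token');
    }
    return $query;
}

$token = pack_query('img=/dir/dir/hi-res-img.jpg&w=700&h=500');
```

Anyone who recognises the format can run the same two calls, so this is obfuscation rather than protection.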
I suggest shortening URLs by means of Web server configuration. You might
shorten it further by replacing image path with an ID (store ID-filename
pairs in a database).
For example, the following Nginx configuration accepts
URLs like /t/123456/700/500/4fc286f1a6a9ac4862bdd39a94a80858, where
the first number (123456) is supposed to be an image ID from database;
700 and 500 are image dimensions;
the last part is an MD5 hash protecting from requests with different dimensions.
# Adjust maximum image size
# image_filter_buffer 5M;
server {
listen 127.0.0.13:80;
server_name img-thumb.local;
access_log /var/www/img-thumb/logs/access.log;
error_log /var/www/img-thumb/logs/error.log info;
set $root "/var/www/img-thumb/public";
# /t/image_id/width/height/md5
location ~* "(*UTF8)^/t/(\d+)/(\d+)/(\d+)/([a-zA-Z0-9]{32})$" {
include fastcgi_params;
fastcgi_pass unix:/tmp/php-fpm-img-thumb.sock;
fastcgi_param QUERY_STRING image_id=$1&w=$2&h=$3&hash=$4;
fastcgi_param SCRIPT_FILENAME /var/www/img-thumb/public/t/resize.php;
image_filter resize $2 $3;
error_page 415 = /empty;
break;
}
location = /empty {
empty_gif;
}
location / { return 404; }
}
The server accepts only URLs of the specified pattern, forwards the request to the /public/t/resize.php script with a modified query string, then resizes the image generated by PHP with the image_filter module. In case of an error, it returns an empty GIF image.
The image_filter is optional, and it is included only as an example. Resizing can be performed fully on the PHP side. With Nginx, it is also possible to get rid of the PHP part entirely, by the way.
The PHP script is supposed to validate the hash as follows:
// Store this in some configuration file.
$salt = '^sYsdfc_sd&9wa.';
$image_id = (int) $_GET['image_id'];
$w = (int) $_GET['w'];
$h = (int) $_GET['h'];
$true_hash = md5($w . $h . $salt . $image_id);
// hash_equals() compares in constant time
if (!hash_equals($true_hash, (string) $_GET['hash'])) {
die('invalid hash');
}
$filename = fetch_image_from_database($image_id);
$img = imagecreatefrompng($filename);
header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);
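The page that outputs the thumbnail links has to produce the same hash. Here is a sketch of that side, assuming the same formula `md5($w . $h . $salt . $image_id)` and the `/t/image_id/width/height/md5` layout from the Nginx location; `build_thumb_url()` is a hypothetical name.

```php
<?php
// Hypothetical counterpart to the validator above: builds /t/<id>/<w>/<h>/<md5> URLs.
function build_thumb_url(int $image_id, int $w, int $h, string $salt): string {
    $hash = md5($w . $h . $salt . $image_id);   // same concatenation order as the validator
    return "/t/{$image_id}/{$w}/{$h}/{$hash}";
}

$salt = '^sYsdfc_sd&9wa.'; // must be the same value the validator reads from config
$url = build_thumb_url(123456, 700, 500, $salt);
```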
I don't think the resulting URL can be shortened much more than in your own example. But I suggest a few steps to obfuscate your images better.
First I would remove everything you can from the base URL you are zipping and Base64 encoding, so instead of
img=/dir/dir/hi-res-img.jpg&w=700&h=500
I would use
s=hi-res-img.jpg,700,500,062c02153d653119
Where those last 16 characters are a hash to validate that the URL being opened is the same one you offered in your code, and that the user is not trying to trick the high-resolution image out of the system.
Your index.php that serves the images would start like this:
session_start(); // needed for session_id() below; drop it if you remove session_id()
function myHash($sRaw) { // returns a 16-character dual hash
return hash('adler32', $sRaw) . strrev(hash('crc32', $sRaw));
} // These two hash algorithms are suggestions; there are more for you to choose from.
// s=hi-res-img.jpg,700,500,062c02153d653119
$aParams = explode(',', $_GET['s']);
if (count($aParams) != 4) {
die('Invalid call.');
}
list($sFileName, $iWidth, $iHeight, $sHash) = $aParams;
$sRaw = session_id() . $sFileName . $iWidth . $iHeight;
if (!hash_equals(myHash($sRaw), $sHash)) {
die('Invalid hash.');
}
After this point you can send the image, since the user opening it has been given a valid link.
Note that using session_id() as part of the raw string that makes the hash is optional, but it would make it impossible for users to share a valid URL, as the URL would be bound to the session. If you want the URLs to be shareable, then just remove session_id() from that call.
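On the page that renders the thumbnail, the `s` parameter has to be built with the same `myHash()` over the same raw string. A sketch under that assumption; `buildImageParam()` and its `$sSessionId` parameter are hypothetical, and an empty session id gives shareable links.

```php
<?php
function myHash($sRaw) { // 16-character dual hash, as in the answer
    return hash('adler32', $sRaw) . strrev(hash('crc32', $sRaw));
}

// Builds "file,width,height,hash" so that index.php's check above succeeds.
function buildImageParam(string $sFileName, int $iWidth, int $iHeight, string $sSessionId = ''): string {
    $sRaw = $sSessionId . $sFileName . $iWidth . $iHeight;
    return implode(',', [$sFileName, $iWidth, $iHeight, myHash($sRaw)]);
}

$param = buildImageParam('hi-res-img.jpg', 700, 500); // pass session_id() for session-bound links
```

File names containing a comma would break the `explode(',')` on the receiving side, so they would need to be avoided or escaped.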
I would wrap the resulting URL the same way you already do, zip + Base64. The result would be even bigger than your version, but more difficult to see through the obfuscation, and therefore protecting your images from unauthorised downloads.
If you want only to make it shorter, I do not see a way of doing it without renaming the files (or their folders), or without the use of a database.
The proposed file database solution is likely to run into concurrency problems, unless only a few people at most use the system simultaneously.
You say that you want the size in the URL so that, if you decide some day that the preview images are too small, you can increase it. The solution here is to hard-code the image size into the PHP script and eliminate it from the URL.
If you want to change the size in the future, change the hardcoded values in the PHP script (or in a config.php file that you include into the script).
You've also said that you are already using files to store image data as a JSON object, like: name, title, description. Exploiting this, you don't need a database and can use the JSON file name as the key for looking up the image data.
When the user visits a URL like this:
www.mysite.com/share/index.php?ax9v
You load ax9v.json from the location you are already storing the JSON files, and within that JSON file the image's real path is stored. Then load the image, resize it according to the hardcoded size in your script and send it to the user.
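A minimal sketch of that lookup, assuming each JSON file stores the real image path under a `path` key. The key name, the directory argument and `image_path_from_key()` are all hypothetical. The whitelist check keeps a request like `?../x` from reading files outside the JSON directory.

```php
<?php
// Hypothetical: JSON files live in $json_dir and each has a "path" key.
function image_path_from_key(string $key, string $json_dir): ?string {
    // Accept only short alphanumeric keys so the key cannot escape $json_dir
    if (!preg_match('/^[A-Za-z0-9]{1,16}$/', $key)) {
        return null;
    }
    $file = $json_dir . '/' . $key . '.json';
    if (!is_file($file)) {
        return null;
    }
    $data = json_decode(file_get_contents($file), true);
    return isset($data['path']) ? (string) $data['path'] : null;
}

// For www.mysite.com/share/index.php?ax9v the query string is "ax9v"
$path = image_path_from_key($_SERVER['QUERY_STRING'] ?? '', '/var/www/data/json');
```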
Drawing from the conclusions in URL Shortening: Hashes In Practice, to get the smallest search string part of the URL you would need to iterate valid character combinations as new files are uploaded (e.g., the first one is "AAA", then "AAB", "AAC", etc.) instead of using a hashing algorithm.
Your solution would then have only three characters in the string for the first 238,328 photos you upload.
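With the 62 characters A-Z, a-z and 0-9 there are 62^3 = 238,328 three-character keys. One way to generate them in the "AAA", "AAB", "AAC" order is to treat an upload counter as a base-62 number with "A" as the zero digit. This is a sketch; `key_for_index()` is hypothetical, and the counter would have to be persisted somewhere (for example the number of JSON files stored so far).

```php
<?php
// Sketch: turn a sequential upload counter into a short key ("AAA", "AAB", ...).
function key_for_index(int $n, int $min_len = 3): string {
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $key = '';
    do {
        $key = $alphabet[$n % 62] . $key;   // least significant base-62 digit first
        $n = intdiv($n, 62);
    } while ($n > 0);
    // 'A' is the zero digit, so left-padding with it keeps the value unchanged
    return str_pad($key, $min_len, 'A', STR_PAD_LEFT);
}
```

After the 238,328th upload the keys simply grow to four characters ("BAAA").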
I had started to prototype a PHP solution on PhpFiddle, but the code disappeared (don't use PhpFiddle).

PHP to delete lines within text file beginning with 0 or a negative number

Thank you for taking the time to read this and I will appreciate every single response no matter the quality of content. :)
Using php, I'm trying to create a script which will delete several lines within a text file (.txt) if required, based upon whether the line starts with a 0 or a negative number. Each line within the file will always start with a number, and I need to erase all the neutral and/or negative numbers.
The main part I'm struggling with is that the content within the text file isn't static (e.g. it doesn't contain a fixed number of lines/words). In fact, it is automatically updated every 5 minutes with several lines. Therefore, I'd like all the lines containing a neutral or negative number to be removed.
The text file follows the structure:
-29 aullah1
0 name
4 username
4 user
6 player
If possible, I'd like lines 1 and 2 removed, since they begin with a neutral/negative number. At times there may be more than two lines beginning with neutral/negative numbers.
All assistance is appreciated and I look forward to your replies; thank you. :) If I didn't explain anything clearly and/or you'd like me to explain in more detail, please reply. :)
Thank you.
Example:
$file = file("mytextfile.txt");
$newLines = array();
foreach ($file as $line)
if (preg_match("/^(-\d+|0)/", $line) === 0)
$newLines[] = chop($line);
$newFile = implode("\n", $newLines);
file_put_contents("mytextfile.txt", $newFile);
It is important that you chop() the newline character off of the end of the line so you don't end up with empty space. Tested successfully.
Something along these lines, I guess; it is untested.
$content = file_get_contents("mytextfile.txt");
$newContent = "";
$lines = explode("\n" , $content);
foreach($lines as $line){
$fChar = substr($line , 0 , 1);
if($fChar == "0" || $fChar == "-") continue;
else $newContent .= $line."\n";
}
file_put_contents("mytextfile.txt", $newContent);
If the file is big, it's better to read it line by line:
$fh_r = fopen("input.txt", "r"); // open file to read.
$fh_w = fopen("output.txt", "w"); // open file to write.
while (($buffer = fgets($fh_r)) !== false) { // read the input file line by line until EOF.
// if the line begins with a number other than 0 or a negative number, write it.
if(!preg_match('/^(0|-\d+)\b/',$buffer)) {
fwrite($fh_w,$buffer);
}
}
fclose($fh_r);
fclose($fh_w);
Note: error checking is not included.
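Because the file is rewritten every 5 minutes by another process, a version with error checking should probably also lock the file while filtering it in place. Below is a sketch under the assumption that the writer also uses `flock()` (otherwise the lock does not stop it). `remove_non_positive_lines()` is a hypothetical name, and it uses an `(int)` cast rather than a regex to read the leading number.

```php
<?php
// Removes lines whose leading number is 0 or negative; returns how many were removed.
function remove_non_positive_lines(string $path): int {
    $fh = fopen($path, 'c+');   // read/write, create if missing, do not truncate yet
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (!flock($fh, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $path");
        }
        $kept = [];
        $removed = 0;
        while (($line = fgets($fh)) !== false) {
            // (int) reads the leading number: "-29 aullah1" -> -29, "0 name" -> 0
            if ((int) $line > 0) {
                $kept[] = $line;
            } else {
                $removed++;
            }
        }
        // Rewrite the file in place with only the kept lines
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, implode('', $kept));
        fflush($fh);
        flock($fh, LOCK_UN);
        return $removed;
    } finally {
        fclose($fh);
    }
}
```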
file_put_contents($newfile,
implode(
preg_grep('~^[1-9]~',
file($oldfile))));
PHP is not particularly elegant, but still...
Load each line into a variable, trim it, and then check whether its first character is - or 0.
$newContent = "";
$lines = explode("\n" , $content);
foreach($lines as $line){
$line = trim($line);
if ($line === '') continue; // skip blank lines, $line[0] would not exist
$fChar = $line[0];
if(!($fChar == '0' || $fChar == '-'))
$newContent .= $line."\n";
}
I changed malik's code for better performance and quality.
Here's another way:
class FileCleaner extends FilterIterator
{
public function __construct($srcFile)
{
parent::__construct(new ArrayIterator(file($srcFile)));
}
public function accept(): bool
{
list($num) = explode(' ', parent::current(), 2);
return ($num > 0);
}
public function write($file)
{
file_put_contents($file, implode('', iterator_to_array($this)));
}
}
Usage:
$filtered = new FileCleaner($src_file);
$filtered->write($new_file);
Logic and methods can be added to the class for other stuff, such as sorting, finding the highest number, converting to a sane storage method such as csv, etc. And, of course, error checking.

How to validate domain name in PHP?

Is it possible without using regular expression?
For example, I want to check that a string is a valid domain:
domain-name
abcd
example
Are valid domains. These are invalid of course:
domaia#name
ab$%cd
And so on. So basically it should start with an alphanumeric character, then there may be more alnum characters plus also a hyphen. And it must end with an alnum character, too.
If it's not possible, could you suggest me a regexp pattern to do this?
EDIT:
Why doesn't this work? Am I using preg_match incorrectly?
$domain = '#djkal';
$regexp = '/^[a-zA-Z0-9][a-zA-Z0-9\-\_]+[a-zA-Z0-9]$/';
if (false === preg_match($regexp, $domain)) {
throw new Exception('Domain invalid');
}
<?php
function is_valid_domain_name($domain_name)
{
return (preg_match("/^([a-z\d](-*[a-z\d])*)(\.([a-z\d](-*[a-z\d])*))*$/i", $domain_name) //valid chars check
&& preg_match("/^.{1,253}$/", $domain_name) //overall length check
&& preg_match("/^[^\.]{1,63}(\.[^\.]{1,63})*$/", $domain_name) ); //length of each label
}
?>
Test cases:
is_valid_domain_name? [a] Y
is_valid_domain_name? [0] Y
is_valid_domain_name? [a.b] Y
is_valid_domain_name? [localhost] Y
is_valid_domain_name? [google.com] Y
is_valid_domain_name? [news.google.co.uk] Y
is_valid_domain_name? [xn--fsqu00a.xn--0zwm56d] Y
is_valid_domain_name? [goo gle.com] N
is_valid_domain_name? [google..com] N
is_valid_domain_name? [google.com ] N
is_valid_domain_name? [google-.com] N
is_valid_domain_name? [.google.com] N
is_valid_domain_name? [<script] N
is_valid_domain_name? [alert(] N
is_valid_domain_name? [.] N
is_valid_domain_name? [..] N
is_valid_domain_name? [ ] N
is_valid_domain_name? [-] N
is_valid_domain_name? [] N
With this you will not only be checking if the domain has a valid format, but also if it is active / has an IP address assigned to it.
$domain = "stackoverflow.com";
if(filter_var(gethostbyname($domain), FILTER_VALIDATE_IP))
{
return TRUE;
}
Note that this method requires the DNS entries to be active, so if you need to validate a domain string without it being in DNS, use the regular expression method given by velcrow above.
Also, this function is not intended to validate a URL string; use FILTER_VALIDATE_URL for that. We do not use FILTER_VALIDATE_URL for a domain because a domain string is not a valid URL.
PHP 7
// Validate a domain name
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN));
# string(33) "mandrill._domainkey.mailchimp.com"
// Validate a hostname (here, the underscore is invalid)
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME));
# bool(false)
It is not documented here: http://www.php.net/filter.filters.validate, and a bug report for this is located here: https://bugs.php.net/bug.php?id=72013
use checkdnsrr http://php.net/manual/en/function.checkdnsrr.php
$domain = "stackoverflow.com";
checkdnsrr($domain , "A");
//returns true if has a dns A record, false otherwise
Firstly, you should clarify whether you mean:
individual domain name labels
entire domain names (i.e. multiple dot-separated labels)
host names
The reason the distinction is necessary is that a label can technically include any characters, including the NUL, # and '.' characters. DNS is 8-bit capable and it's perfectly possible to have a zone file containing an entry reading "an\0odd\.l#bel". It's not recommended of course, not least because people would have difficulty telling a dot inside a label from those separating labels, but it is legal.
However, URLs require a host name in them, and those are governed by RFCs 952 and 1123. Valid host names are a subset of domain names. Specifically only letters, digits and hyphen are allowed. Furthermore the first and last characters cannot be a hyphen. RFC 952 didn't permit a number for the first character, but RFC 1123 subsequently relaxed that.
Hence:
a - valid
0 - valid
a- - invalid
a-b - valid
xn--dasdkhfsd - valid (punycode encoding of an IDN)
Off the top of my head I don't think it's possible to invalidate the a- example with a single simple regexp. The best I can come up with to check a single host label is:
if (preg_match('/^[a-z\d][a-z\d-]{0,62}$/i', $label) &&
!preg_match('/-$/', $label))
{
# label is legal within a hostname
}
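For comparison, a commonly used single-pattern form (not from this answer) avoids the second check by requiring the last character to be alphanumeric explicitly. It also enforces the 63-character label limit: 1 + 61 + 1. `is_valid_host_label()` is a hypothetical name.

```php
<?php
// First and last characters alphanumeric, hyphens only in between, 1-63 characters.
function is_valid_host_label(string $label): bool {
    return preg_match('/^[a-z\d](?:[a-z\d-]{0,61}[a-z\d])?$/i', $label) === 1;
}
```

Like the two-step check, it rejects underscore-prefixed labels such as `_sip`, which is correct for host names but not for every domain name.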
To further complicate matters, some domain name entries (typically SRV records) use labels prefixed with an underscore, e.g. _sip._udp.example.com. These are not host names, but are legal domain names.
Here is another way without regex.
$myUrl = "http://www.domain.com/link.php";
$myParsedURL = parse_url($myUrl);
$myDomainName= $myParsedURL['host'];
$ipAddress = gethostbyname($myDomainName);
if($ipAddress == $myDomainName)
{
echo "There is no url";
}
else
{
echo "url found";
}
I think once you have isolated the domain name, say, using Erklan's idea:
$myUrl = "http://www.domain.com/link.php";
$myParsedURL = parse_url($myUrl);
$myDomainName= $myParsedURL['host'];
you could use :
// FILTER_VALIDATE_URL requires a scheme, so prepend one to the bare host
if( false === filter_var( 'http://' . $myDomainName, FILTER_VALIDATE_URL ) ) {
// failed test
}
PHP 5's filter functions are meant for just such a purpose, I would have thought.
It does not strictly answer your question as it does not use Regex, I realise.
Regular expression is the most effective way of checking for a domain validation. If you're dead set on not using a Regular Expression (which IMO is stupid), then you could split each part of a domain:
www. / sub-domain
domain name
.extension
You would then have to check each character in some sort of a loop to see that it matches a valid domain.
Like I said, it's much more effective to use a regular expression.
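Your regular expression is fine, but you're not using preg_match right. It returns an int (1 if the pattern matches, 0 if it doesn't) and only returns false when an error occurs, so false === preg_match(...) is never true for a string that merely fails to match. Just write if(!preg_match($regex, $string)) { ... }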
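A quick way to see this with the regex from the question. `is_valid_name()` is just an illustrative wrapper that treats anything other than 1 as invalid.

```php
<?php
// The regex from the question
$regexp = '/^[a-zA-Z0-9][a-zA-Z0-9\-\_]+[a-zA-Z0-9]$/';

var_dump(preg_match($regexp, '#djkal'));      // int(0): no match, but not false
var_dump(preg_match($regexp, 'domain-name')); // int(1): match

// Treat 0 (no match) and false (error) alike as invalid
function is_valid_name(string $domain, string $regexp): bool {
    return preg_match($regexp, $domain) === 1;
}
```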
If you don't want to use regular expressions, you can try this:
$str = 'domain-name';
if (ctype_alnum(str_replace('-', '', $str)) && $str[0] != '-' && $str[strlen($str) - 1] != '-') {
echo "Valid domain\n";
} else {
echo "Invalid domain\n";
}
but as said, regular expressions are the best tool for this.
If you want to check whether a particular domain name or ip address exists or not, you can also use checkdnsrr
Here is the doc http://php.net/manual/en/function.checkdnsrr.php
A valid domain is for me something I'm able to register or at least something that looks like I could register it. This is the reason why I like to separate this from "localhost"-names.
And finally, I was interested in the main question of whether avoiding regex would be faster, and this is my result:
<?php
function filter_hostname($name, $domain_only=false) {
// entire hostname has a maximum of 253 ASCII characters
if (!($len = strlen($name)) || $len > 253
// .example.org and localhost- are not allowed
|| $name[0] == '.' || $name[0] == '-' || $name[ $len - 1 ] == '.' || $name[ $len - 1 ] == '-'
// a.de is the shortest possible domain name and needs one dot
|| ($domain_only && ($len < 4 || strpos($name, '.') === false))
// several combinations are not allowed
|| strpos($name, '..') !== false
|| strpos($name, '.-') !== false
|| strpos($name, '-.') !== false
// only letters, numbers, dot and hyphen are allowed
/*
// a little bit slower
|| !ctype_alnum(str_replace(array('-', '.'), '', $name))
*/
|| preg_match('/[^a-z\d.-]/i', $name)
) {
return false;
}
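// each label may contain up to 63 characters
$offset = 0;
while (($pos = strpos($name, '.', $offset)) !== false) {
if ($pos - $offset > 63) {
return false;
}
$offset = $pos + 1;
}
// the last label (after the final dot) has to be checked as well
if ($len - $offset > 63) {
return false;
}
return $name;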
}
?>
Benchmark results compared with velcrow's function over 10000 iterations (the complete results contain many code variants; it was interesting to find the fastest):
filter_hostname($domain);// $domains: 0.43556308746338 $real_world: 0.33749794960022
is_valid_domain_name($domain);// $domains: 0.81832790374756 $real_world: 0.32248711585999
$real_world did not contain extremely long domain names, to produce better results. And now I can answer your question: with the usage of ctype_alnum() it would be possible to realize it without regex, but as preg_match() was faster I would prefer that.
If you don't like the fact that "local.host" is a valid domain name, use this function instead, which validates against a public TLD list. Maybe someone finds the time to combine both.
The correct answer is that you don't ... you let a unit tested tool do the work for you:
// return '' if host invalid --
private function setHostname($host = '')
{
$ret = (!empty($host)) ? $host : '';
if(filter_var('http://'.$ret.'/', FILTER_VALIDATE_URL) === false) {
$ret = '';
}
return $ret;
}
Further reading: https://www.w3schools.com/php/filter_validate_url.asp
If you can run shell commands, the following is the best way to determine whether a domain is registered.
This function returns false if the domain name isn't registered; otherwise it returns the domain name.
function get_domain_name($domain) {
//Step 1 - Return false if any shell sensitive chars or space/tab were found
if(escapeshellcmd($domain)!=$domain || count(explode(".", $domain))<2 || preg_match("/[\s\t]/", $domain)) {
return false;
}
//Step 2 - Get the root domain in-case of subdomain
$domain = (count(explode(".", $domain))>2 ? strtolower(explode(".", $domain)[count(explode(".", $domain))-2].".".explode(".", $domain)[count(explode(".", $domain))-1]) : strtolower($domain));
//Step 3 - Run shell command 'dig' to get SOA servers for the domain extension
$ns = shell_exec(escapeshellcmd("dig +short SOA ".escapeshellarg(explode(".", $domain)[count(explode(".", $domain))-1])));
//Step 4 - Return false if invalid extension (returns NULL), or take the first server address out of output
if($ns===NULL) {
return false;
}
$ns = (((preg_split('/\s+/', $ns)[0])[strlen(preg_split('/\s+/', $ns)[0])-1]==".") ? substr(preg_split('/\s+/', $ns)[0], 0, strlen(preg_split('/\s+/', $ns)[0])-1) : preg_split('/\s+/', $ns)[0]);
//Step 5 - Run another dig using the obtained address for our domain, and return false if returned NULL else return the domain name. This assumes an authoritative NS is assigned when a domain is registered, can be improved to filter more accurately.
$ans = shell_exec(escapeshellcmd("dig +noall +authority ".escapeshellarg("@".$ns)." ".escapeshellarg($domain)));
return (($ans===NULL) ? false : ((strpos($ans, $ns)>-1) ? false : $domain));
}
Pros
Works on any domain, while php dns functions may fail on some domains. (my .pro domain failed on php dns)
Works on fresh domains without any dns (like A) records
Unicode friendly
Cons
Usage of shell execution, probably
<?php
if(is_valid_domain('https://www.google.com')==1){
echo 'Valid';
}else{
echo 'Invalid';
}
function is_valid_domain($url){
$validation = FALSE;
/*Parse URL*/
$urlparts = parse_url(filter_var($url, FILTER_SANITIZE_URL));
/*Check host exist else path assign to host*/
if(!isset($urlparts['host'])){
$urlparts['host'] = $urlparts['path'];
}
if($urlparts['host']!=''){
/*Add scheme if not found*/
if (!isset($urlparts['scheme'])){
$urlparts['scheme'] = 'http';
}
/*Validation*/
if(checkdnsrr($urlparts['host'], 'A') && in_array($urlparts['scheme'],array('http','https')) && ip2long($urlparts['host']) === FALSE){
$urlparts['host'] = preg_replace('/^www\./', '', $urlparts['host']);
$url = $urlparts['scheme'].'://'.$urlparts['host']. "/";
if (filter_var($url, FILTER_VALIDATE_URL) !== false && @get_headers($url)) {
$validation = TRUE;
}
}
}
return $validation;
}
?>
After reading all the issues with the added functions I decided I need something more accurate.
Here's what I came up with that works for me.
If you need to specifically validate hostnames (they must start and end with an alphanumeric character and contain only alphanumerics and hyphens) this function should be enough.
function is_valid_domain($domain) {
// Check for a starting or ending hyphen
if($domain === '' || $domain[0] == '-' || substr($domain, -1) == '-') {
return false;
}
// Detect and convert international UTF-8 domain names to IDNA ASCII form
if(mb_detect_encoding($domain) != "ASCII") {
$idn_dom = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
} else {
$idn_dom = $domain;
}
// Validate (idn_to_ascii() returns false if the conversion fails)
if($idn_dom !== false && filter_var($idn_dom, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false) {
return true;
}
return false;
}
Note that this function will work on most (haven't tested all languages) LTR languages. It will not work on RTL languages.
is_valid_domain('a'); Y
is_valid_domain('a.b'); Y
is_valid_domain('localhost'); Y
is_valid_domain('google.com'); Y
is_valid_domain('news.google.co.uk'); Y
is_valid_domain('xn--fsqu00a.xn--0zwm56d'); Y
is_valid_domain('area51.com'); Y
is_valid_domain('japanese.コム'); Y
is_valid_domain('домейн.бг'); Y
is_valid_domain('goo gle.com'); N
is_valid_domain('google..com'); N
is_valid_domain('google-.com'); N
is_valid_domain('.google.com'); N
is_valid_domain('<script'); N
is_valid_domain('alert('); N
is_valid_domain('.'); N
is_valid_domain('..'); N
is_valid_domain(' '); N
is_valid_domain('-'); N
is_valid_domain(''); N
is_valid_domain('-günter-.de'); N
is_valid_domain('-günter.de'); N
is_valid_domain('günter-.de'); N
is_valid_domain('sadyasgduysgduysdgyuasdgusydgsyudgsuydgusydgsyudgsuydusdsdsdsaad.com'); N
is_valid_domain('2001:db8::7'); N
is_valid_domain('876-555-4321'); N
is_valid_domain('1-876-555-4321'); N
I know that this is an old question, but it was the first result on a Google search, so it seems relevant. I recently had this same problem. The solution in my case was to just use the Public Suffix List:
https://publicsuffix.org/learn/
The suggested language specific libraries listed should all allow for easy validation of not just domain format, but also top level domain validity.
Check the php function checkdnsrr
function validate_email($email){
$exp = "/^[a-z'0-9]+([._-][a-z'0-9]+)*@([a-z0-9]+([._-][a-z0-9]+))+$/i"; // eregi() was removed in PHP 7
if(preg_match($exp,$email)){
$parts = explode("@",$email);
if(checkdnsrr(array_pop($parts),"MX")){
return true;
}else{
return false;
}
}else{
return false;
}
}
This is validation of a domain name in JavaScript:
<script>
function frmValidate() {
var val=document.frmDomin.name.value;
if (/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$/.test(val)){
alert("Valid Domain Name");
return true;
} else {
alert("Enter Valid Domain Name");
document.frmDomin.name.focus();
return false;
}
}
</script>
This is simple. Some PHP engines have a problem with split() (it was removed in PHP 7).
The code below will work.
<?php
$email = "vladimiroliva@ymail.com";
$domain = strtok($email, "@");
$domain = strtok("@");
if (@getmxrr($domain,$mxrecords))
echo "This ". $domain." EXIST!";
else
echo "This ". $domain." does not exist!";
?>
