Creating your own audio captcha in PHP

I wanted to make a small audio captcha in PHP, so I need to convert text to speech, but I have two restrictions:
First, it should be a PHP solution. Creating an MP3/OGG file would be fine, since it could be inserted and played with audio tags etc.
Second, I need to install it on a server using only FTP access, so I can't rely on standard applications that PHP would call.
So, I already investigated some solutions:
jQuery's Jtalk can read text aloud, but it is impractical here: JavaScript is always delivered to the client, so the captcha would be in plain sight in the source code.
Google has an API to speak text aloud, too. However, you need to call an external URL with the text as part of it, so listening to the outgoing requests will reveal the captcha, too.
I tried to combine my own audio files using PHP. I have read in some posts here that many players support a simple echo file_get_contents('audio1.ogg').file_get_contents('audio2.ogg'); solution. However, using the plugin in Firefox, only the first file is played. Downloading the result and playing it in VLC reveals both audio files. I'm also not really happy with this one even if it worked, as one could just associate each OGG source with its letter and recognise the captcha by slicing the audio data...
I also thought of loading all letters in audio tags and playing them as needed, but that would again reveal the captcha in the page's source code.
Lastly, I heard of "flite", which promised to do all of this, but I think I was mistaken: it needs to be installed directly on the server rather than just putting a few files there via FTP.
So, does anybody know how to build a text-to-speech solution with only FTP access and without contacting other websites with the text as part of the URL?
Regards,
Julian

So, I have come up with a solution combining JavaScript and PHP which is good enough for my taste and could be modified for additional security (like adding noise or having something other than one letter per sound file).
It works like this: you set up a sounds folder, protected via .htaccess, so that only a captcha.php script can read the files. There is one file per letter you want to play.
The script can also access the captcha via the session, a database or a protected file, and keeps a pointer to the position that is currently being read. Every time it is requested, it returns the audio of the next letter. This could be done by e.g.
echo file_get_contents('sounds/'.$_SESSION["curaudio"].'.ogg');
Then you only need to insert the audio element into your HTML:
<audio hidden id="Sound_captcha">
Your browser does not support the audio element.
</audio>
Then use JavaScript to switch to the next letter. To do that, set the src attribute of the audio element to the address of your captcha.php file. Remember to append a changing value to prevent caching:
"captcha.php?"+(new Date()).getTime()
You can call the play() function of the audio element to play the file.
Switching to the next letter requires either waiting a fixed amount of time per file (very insecure) or using the ended event of the audio element.
Of course, your PHP script should also signal when the captcha has been read completely. For example, a second script could report the status via an Ajax request, the sound script could return audio only on every odd access and the status otherwise, or the script could tell you at the beginning how many reloads you need.
That is all for a basic player, which would still need to be modified to prevent easy bot access. However, in my opinion, this is at least as secure as a standard text captcha and removes a great barrier for people with impaired vision.
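The server side of this could be sketched roughly as follows. This is only a minimal sketch under assumptions: the session keys "captcha" and "pos", the helper name nextCaptchaSound() and the 204 response used to signal the end are all made up for illustration, and the sounds folder is expected to contain files such as a.ogg, b.ogg and so on.

```php
<?php
// Hypothetical helper: returns the path of the next letter's sound file and
// advances the read pointer, or null once the whole captcha has been played.
function nextCaptchaSound(array &$state, string $soundDir = 'sounds'): ?string
{
    $text = $state['captcha'] ?? '';
    $pos  = $state['pos'] ?? 0;
    if ($pos >= strlen($text)) {
        return null; // finished: nothing left to read
    }
    $state['pos'] = $pos + 1;
    $char = strtolower($text[$pos]);
    // Accept only a single letter or digit so the value can never form a path.
    if (!preg_match('/^[a-z0-9]$/', $char)) {
        return null;
    }
    return $soundDir . '/' . $char . '.ogg';
}

// The HTTP part only runs when served by a web server, not from the CLI.
if (PHP_SAPI !== 'cli') {
    session_start();
    $file = nextCaptchaSound($_SESSION);
    if ($file === null || !is_file($file)) {
        http_response_code(204); // assumption: "no content" means "captcha done"
        exit;
    }
    header('Content-Type: audio/ogg');
    header('Cache-Control: no-store');
    readfile($file);
}
```

Keeping the pointer logic in a plain function makes it easy to check without a browser; how the client learns that the captcha is finished is a design choice, and the 204 status is just one option.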

Related

page has two php files with identical names how to wget the correct php file? adding identifier?

I have this problem when I'm trying to use wget to retrieve the OUTPUT of a specific php script, but it looks like this site generates 2 identical PHP files.
The first one is smaller and the second one, in sequence, is the correct one. The problem is that every time I try the wget command, I end up with the smaller output file, which does not contain the desired info :(
Is there a way to download the correct file, using wget, by adding some sort of identifier to the link, to make sure I'm downloading the correct file.
Here is the command I've been trying:
$ wget http://www.fernsehen.to/index.php
If you run this and capture the traffic with Fiddler or Wireshark, you'll end up with two (2) requests for "http://www.fernsehen.to/index.php", and I need the bigger file of the two.
P.S. To manually get the desired output file, you can open http://www.fernsehen.to/index.php in Firefox or chrome and view source.
Thank you in advance!
What you want is not really practical. When you visit that page, the site first generates a small file with a load of JavaScript that detects browser features and sends them back to the server in a stateful manner, so that the server can produce the exact code required for your browser, probably including things like supported video codecs. They probably also do some session fingerprinting for DRM purposes, to stop people from doing exactly what you're trying to do.
wget cannot emulate this behaviour because it is not a full browser: it cannot execute all that JavaScript, and even if it could, it would not supply proper browser-like data. You'd have to write an extensive piece of custom code that exactly mimics everything the in-between page is doing to achieve the intended effect. Possible, but not easy, and most certainly not with a basic general-purpose tool like wget.

Identify a file that contains a particular string in PHP/SQL site

By using the inspect element feature of Chrome I have identified a string of text that needs to be altered to lower case.
Though the string appears on all the pages in the site, I am not sure which file to edit.
The website is a CMS based on PHP and SQL - I am not so familiar with these programs.
I have searched through the files manually and cannot find the string.
Is there a way to search and identify the file I need using, for example, the inspect element feature on browsers or in FTP tool such as Filezilla?
Check if you have a layout page of any kind in your CMS. If you do, then most probably either in that file or in the footer include file you will find either the JavaScript for Google Analytics or a JS include file for the same.
Try doing a site search for 'UA-34035531-1' (which is your Google Analytics user key) and see if it returns anything. If you find it, what you need would be two lines under it.
Usually people do not put analytics code in the DB, so there is a bigger chance you will find it in one of the files, which most probably is included/embedded in a layout file of some sort, as you need it across all pages of the site.
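If the FTP client has no content search, a small PHP script uploaded next to the site can do the "site search" on the server itself. This is only a sketch: the function name findFilesContaining() is made up here, and it simply reads every readable file under a directory, which may be slow on large sites.

```php
<?php
// Returns the paths of all files under $root whose contents contain $needle.
function findFilesContaining(string $root, string $needle): array
{
    $matches = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Skip directories and anything the web server user cannot read.
        if (!$file->isFile() || !$file->isReadable()) {
            continue;
        }
        $contents = file_get_contents($file->getPathname());
        if ($contents !== false && strpos($contents, $needle) !== false) {
            $matches[] = $file->getPathname();
        }
    }
    sort($matches);
    return $matches;
}
```

Calling print_r(findFilesContaining(__DIR__, 'UA-34035531-1')); from a temporary script in the site root would list candidate files; the script should be deleted again afterwards. Strings stored in the database will of course not be found this way.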

Extract information from javascript counter via PHP

I'm looking for a way to extract some information from this site via PHP:
http://www.mycitydeal.co.uk/deals/london
There is a counter where the time left is displayed, but the information is within the JavaScript. Since I'm really a JavaScript rookie, I don't really know how to get the information.
Normally I would extract the information with "preg_match" and some regular expressions. Can someone help me to extract the information (Hrs., Min., Sec.) ?
Jennifer
Extracting the countdown time is not going to be easy, because it is fetched and set purely using JavaScript, which cannot be executed by plain PHP. You would have to decode the JavaScript code and see what calls it makes to fetch the initial times.
That is not an easy process, and could be changed by the site owners in no time.
Also, doing that, you would be in clear breach of their T&C:
For the avoidance of doubt, scraping of the Website (and hacking of the Website) is not allowed.
I hate to say "no", but in this situation PHP is not the right tool for the job. JavaScript requires a browser to run (in this case) and on top of that you probably have a jQuery lib.
The only thing PHP could do is invoke a browser that would contain some JavaScript (i.e., GreaseMonkey) that could try and scrape the page for the info. But this is really a job for embedded JavaScript.
As the others have said you can usually not access JavaScript stuff from PHP. However JavaScript has to get its data from somewhere, and this is where to start.
I found this in the source code:
<input type="hidden" id="currentTimeLeft" value="3749960"/>
That's the time left until whatever it is, apparently in microseconds, though milliseconds (which would make it about 62 minutes) seems more plausible for a countdown.
However, this was only present in Firefox, not when fetching the page with wget. I found out it's the cookie that matters, so you'd have to request the page once, store the cookies and then access it a second time.
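The two-request idea and the extraction could be sketched like this in PHP. It is a sketch under assumptions: the function names are made up, the curl extension must be available, the hidden field is assumed to keep the exact id/value layout quoted above, and the value is treated as milliseconds, which is a guess. Keep the site's terms about scraping in mind before actually running it against them.

```php
<?php
// Requests the URL twice on one curl handle so that cookies set by the first
// response are sent along with the second request.
function fetchWithCookies(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // empty string enables in-memory cookies
    ]);
    curl_exec($ch);         // first request: only there to receive the cookie
    $html = curl_exec($ch); // second request: carries the cookie
    return $html === false ? '' : $html;
}

// Pulls the hidden field's value out of the HTML and splits it into
// hours, minutes and seconds. Returns null if the field is not found.
function extractTimeLeft(string $html): ?array
{
    if (!preg_match('/id="currentTimeLeft"\s+value="(\d+)"/', $html, $m)) {
        return null;
    }
    $seconds = intdiv((int) $m[1], 1000); // assumption: value is milliseconds
    return [
        'hours'   => intdiv($seconds, 3600),
        'minutes' => intdiv($seconds % 3600, 60),
        'seconds' => $seconds % 60,
    ];
}
```

With the snippet quoted above, extractTimeLeft() would give 1 hour, 2 minutes and 29 seconds. If the site changes its markup, the regular expression has to be adjusted.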

How to disable or encrypt "View Source" for my site

Is there any way to disable or encrypt "View Source" for my site so that I can secure my code?
Fero,
Your question doesn't make much sense. The "View Source" is showing the HTML source—if you encrypt that, the user (and the browser) won't be able to read your content anymore.
If you want to protect your PHP source, then there are tools like Zend Guard. It would encrypt your source code and make it hard to reverse engineer.
If you want to protect your JavaScript, you can minify it with, for example, YUI Compressor. It won't prevent the user from using your code since, like the user, the browser needs to be able to read the code somehow, but at least it would make the task more difficult.
If you are more worried about user privacy, you should use SSL to make sure the sensitive information is encrypted when on the wire.
Finally, it is technically possible to encrypt the content of a page and use JavaScript to decrypt it, but since this relies on JavaScript, an experienced user could defeat this in a couple of minutes. Plus all these problems would appear:
Search engines won't be able to index your pages...
Users with JavaScript disabled would see the encrypted page
It could perform really poorly depending on the amount of content you have
So I don't advise you to use this solution.
You can't really disable that because eventually the browser will still need to read and parse the source in order to output.
If there is something SO important in your source code, I recommend you hide it on server side.
Even if you encrypt or obfuscate your HTML source, eventually we still can eval and view it. Using Firebug for instance, we can see source code no matter what.
If you are selling PHP software, you can consider Software as a Service (SaaS).
So you want to encrypt your HTML source. You can encrypt it using some javascript tool, but beware that if the user is smart enough, he will always be able to decrypt it doing the same thing that the browser should do: run the javascript and see the generated HTML.
EDIT: See this HTML scrambler as an example on how to encrypt it:
http://www.voormedia.com/en/tools/html-obfuscate-scrambler.php
EDIT2: And .. see this one for how to decrypt it :)
http://www.gooby.ca/decrypt/
The short answer is no. HTML is an open text format; whatever you do, if the page renders, people will be able to see your source code. You can use JavaScript to disable the right click, which will work in some browsers, but anyone wanting to use your code will know how to get around this. You can also store the HTML encoded and have JavaScript emit it, but this will have a bad impact on development, accessibility, and load speed. After all that, anyone with Firebug installed will still be able to see your HTML code.
There is also very rarely a lot of value in your HTML; your real IP is in your server code, which stays safe and sound on your server.
This is fundamentally impossible. As (almost) everybody has said, the web browser of your user needs to be able to read your html and Javascript, and browsers exist to serve their users -- not you.
What this means is that no matter what you do there is eventually going to be something on a user's machine that looks like:
<html>
<body>
<div id="my secret page layout trick"> ...
</div>
</body>
</html>
because otherwise there is nothing to show the user. If that exists on the client-side, then you have lost control of it. Even if you managed to convince every browser-maker on the planet to not make that available through a "view source" option -- which is, you know, unlikely -- the text will still exist on that user's machine, and somebody will figure out how to get to it. And that will never happen, browsers will always exist to serve their users before all others. (Hopefully)
The same thing is true for all of your Javascript. Let me say it again: nothing that you send to a user is secure or secret from that user. The encryption via Javascript hack is stupid and cannot work in any meaningful sense.
(Well, actually, Flash and Silverlight ship binaries, but I don't think that they're encrypted. So they are at the least irritating to get data out of.)
As others have said, the only way to keep something secret from your users is to not give it to them: put the logic in your server and make sure that it is never sent. For example, all of the code that you write in PHP (or Python/Ruby/Perl/Java/C...) should never be seen by your users. This is e.g. why Google still has a business. What they give you is fundamentally uninteresting compared to what they never send to you. And, because they realize this, they try to make most things that they send you as open and useful as possible. Because it's the infrastructure -- the terabyte-huge maps database and pathfinding software, as opposed to the snazzy map that you can click and drag -- that you are trading your privacy for.
Another example: I'm not sure if you remember how many tricks people employed in the early days of the web to try and keep people from saving images to disk. When was the last time you ran across one of those? Know why? Because once data is on your user's machine, she controls it. Not you.
So, in short: if you want to keep something secret from your user, don't give it to her.
You can't. The browser needs the source to render the page. If the user wishes, the browser will show the source. Firefox can also show you the DOM of the page. You can obfuscate the source, but you cannot encrypt it or lock the user out.
Also, why would you want this? It seems like a lame thing to do :P
I don't think there is a way to do this. If you encrypt it, how will the browser understand the HTML?
No. Browsers offer no way for the HTML/JavaScript to disable that feature (thankfully). Plus, even if you could, the HTML is still transmitted in plain text, ready for an HTTP sniffer to read.
The best you could do would be to somehow obscure the HTML/JavaScript to make it hard to read. But then debuggers like Firebug and IE 8's debugger will reconstruct it from the DOM, making it easy to read.
You can, in fact, disable the right click function. It is useless to do so, however, as most browsers now have built-in inspector tools which show the source anyway. Not to mention that other workarounds (such as saving the page and then opening the source, or simply using hotkeys) exist for viewing the HTML source. Tutorials for disabling the right click function abound across the web, so a quick Google search will point you in the right direction if you feel an overwhelming urge to waste your time.
There is no foolproof way.
But you can fool many people with a simple hack using the methods below:
"window.history.pushState()" and
adding oncontextmenu="return false" in body tag as attribute
Detail here - http://freelancer.usercv.com/blog/28/hide-website-source-code-in-view-source-using-stupid-one-line-chinese-hack-code
You can also use “javascript obfuscation” to further complicate things, but it won’t hide it completely.
“Inspect Element” can reveal everything beyond view-source.
Yes, you can have your whole website being rendered dynamically via javascript which would be encrypted/packed/obfuscated like there is no tomorrow.

Checking for document duplicates and similar documents in a document management application

Update: I have now written a PHP extension called php_ssdeep for the ssdeep C API to facilitate fuzzy hashing and hash comparisons in PHP natively. More information can be found over at my blog. I hope this is helpful to people.
I am involved in writing a custom document management application in PHP on a Linux box that will store various file formats (potentially 1000's of files) and we need to be able to check whether a text document has been uploaded before to prevent duplication in the database.
Essentially when a user uploads a new file we would like to be able to present them with a list of files that are either duplicates or contain similar content. This would then allow them to choose one of the pre-existing documents or continue uploading their own.
Similar documents would be determined by looking through their content for similar sentences and perhaps a dynamically generated list of keywords. We can then display a percentage match to the user to help them find the duplicates.
Can you recommend any packages for this process and any ideas of how you might have done this in the past?
The direct duplicate I think can be done by getting all the text content and
Stripping whitespace
Removing punctuation
Convert to lower or upper case
then forming an MD5 hash to compare with any new documents. Stripping those items out should help prevent duplicates from being missed if the user edits a document to add extra paragraph breaks, for example. Any thoughts?
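The normalise-then-hash idea for exact duplicates could look roughly like this. It is a minimal sketch: the function name contentFingerprint() is made up, the text is assumed to be already extracted from the file as UTF-8, and "punctuation" is taken to mean the Unicode punctuation class.

```php
<?php
// Builds a fingerprint that ignores case, whitespace and punctuation, so that
// cosmetic edits such as extra paragraph breaks do not change the hash.
function contentFingerprint(string $text): string
{
    // Lower-case first; fall back to strtolower() if mbstring is missing.
    $text = function_exists('mb_strtolower')
        ? mb_strtolower($text, 'UTF-8')
        : strtolower($text);
    // Remove all punctuation and whitespace in one pass.
    $stripped = preg_replace('/[\p{P}\s]+/u', '', $text);
    if ($stripped === null) {
        $stripped = $text; // invalid UTF-8: hash the lower-cased text as-is
    }
    return md5($stripped);
}
```

Storing this fingerprint in an indexed column next to each document turns the duplicate check on upload into a single lookup. It only catches documents that are identical after normalisation; similar but not identical documents need a different technique.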
This process could also potentially run as a nightly job and we could notify the user of any duplicates when they next login if the computational requirement is too great to run in realtime. Realtime would be preferred however.
I have found a program that does what its creator, Jesse Kornblum, calls "Fuzzy Hashing". Very basically it makes hashes of a file that can be used to detect similar files or identical matches.
The theory behind it is documented here: Identifying almost identical files using context triggered piecewise hashing
ssdeep is the name of the program and it can be run on Windows or Linux. It was intended for use in forensic computing, but it seems suited enough to our purposes. I have done a short test on an old Pentium 4 machine and it takes about 3 secs to go through a hash file of 23MB (hashes for just under 135,000 files) looking for matches against two files. That time includes creating hashes for the two files I was searching against as well.
I'm working on a similar problem in web2project and after asking around and digging, I came to the conclusion of "the user doesn't care". Having duplicate documents doesn't matter to the user as long as they can find their own document by its own name.
That being said, here's the approach I'm taking:
Allow a user to upload a document associating it with whichever Projects/Tasks they want;
The file should be renamed to prevent someone getting at it via HTTP, or better, stored outside the web root. The user will still see their filename in the system, and if they download it, you can set the headers with the "proper" filename;
At some point in the future, process the document to see if there are duplicates. At this point, though, we are not modifying the document. After all, there could be important reasons the whitespace or capitalization is changed;
If there are dupes, delete the new file and then link to the old one;
If there aren't dupes, do nothing;
Index the file for search terms - depending on the file format, there are lots of options, even for Word docs;
Throughout all of this, we don't tell the user it was a duplicate... they don't care. It's us (developers, db admins, etc) that care.
And yes, this works even if they upload a new version of the file later. First, you delete the reference to the file, then - just like in garbage collection - you only delete the old file if there are zero references to it.
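The reference-counting part of this approach could be sketched as below. This is an in-memory illustration only: the class name DedupStore and its methods are made up, a real system would keep both maps in database tables, and SHA-256 over the raw bytes is just one possible way to detect identical files.

```php
<?php
// In-memory sketch of a deduplicating file store with reference counting.
final class DedupStore
{
    /** @var array<string, int> content hash => number of documents using it */
    private array $refCounts = [];
    /** @var array<string, string> document id => content hash */
    private array $documents = [];

    // Registers an upload; returns true if identical content was already stored.
    public function add(string $docId, string $content): bool
    {
        if (isset($this->documents[$docId])) {
            throw new InvalidArgumentException('remove() the old version first');
        }
        $hash = hash('sha256', $content);
        $isDuplicate = isset($this->refCounts[$hash]);
        $this->refCounts[$hash] = ($this->refCounts[$hash] ?? 0) + 1;
        $this->documents[$docId] = $hash;
        return $isDuplicate;
    }

    // Drops a document's reference; returns true if the underlying file is
    // no longer referenced by anyone and can be deleted from disk.
    public function remove(string $docId): bool
    {
        if (!isset($this->documents[$docId])) {
            return false;
        }
        $hash = $this->documents[$docId];
        unset($this->documents[$docId]);
        if (--$this->refCounts[$hash] === 0) {
            unset($this->refCounts[$hash]);
            return true;
        }
        return false;
    }
}
```

Uploading a new version of a document then becomes remove() followed by add(), and the old file is only deleted when remove() reports that nothing refers to it any more.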
