Basically I've got an old static HTML site ( http://www.brownwatson.co.uk/brochure/page1.html ) and I need to add a search box that searches a folder called /brochure. Inside that folder are HTML documents, images, etc. The search needs to find ISBN numbers, book reference numbers, titles and so on. There's no database, but the hosting provider does have PHP. I was trying to create something like this:
<div id="contentsearch">
  <form id="searchForm" name="searchForm" method="post" action="search.php">
    <input name="search" type="text" value="search" maxlength="200" />
    <input name="submit" type="submit" value="Search" />
  </form>
  <?php
  $dir = "/brochure/";
  // Open a known directory, and proceed to read its contents
  if (is_dir($dir)) {
      if ($dh = opendir($dir)) {
          while (($file = readdir($dh)) !== false) {
              if ($file == $_POST['search']) {
                  echo $file . "\n";
              }
          }
          closedir($dh);
      }
  }
  ?>
</div>
I know, I know, this is pretty bad and doesn't work. Any ideas? I haven't created anything like this in years, and have pretty much just taken bits of code and stuck them together!
There are quite a few solutions available for this. In no particular order:
Free or Open Source
Google Custom Search Engine
Tapir - hosted service that indexes pages on your RSS feed.
Tipue - self-hosted JavaScript plugin, well documented, includes options for pinned search results.
lunr.js - JavaScript library.
phinde - self-hosted search engine based on PHP and Elasticsearch
See also http://indieweb.org/search#Software
Subscription (aka paid) Services:
Google Site Search
Swiftype - offers a free plan for personal sites/blogs.
Algolia
Amazon Cloud Search
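If you would rather stay self-hosted with nothing but the PHP your host already provides, a small grep-style script can cover a folder of static pages. This is only a minimal sketch under assumptions (the folder location, a form field named "search", matching on the stripped text of each page), not a hardened implementation:

<?php
// search.php - naive full-text search over the static pages in /brochure.
// No ranking, no caching; fine for a small brochure site.
$query = isset($_POST['search']) ? trim($_POST['search']) : '';

if ($query !== '') {
    $dir = __DIR__ . '/brochure';                 // assumed location of the pages
    foreach (glob($dir . '/*.html') as $path) {
        // Compare against the visible text only, case-insensitively, so that
        // ISBNs, book reference numbers and titles in the markup all match.
        $text = strip_tags(file_get_contents($path));
        if (stripos($text, $query) !== false) {
            $name = htmlspecialchars(basename($path));
            echo '<a href="/brochure/' . $name . '">' . $name . '</a><br />' . "\n";
        }
    }
}
?>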
A very, very lazy option (to avoid setting up a Google Custom Search Engine) is to make a form that points at Google with a hidden query element that limits the search to your own site:
<div id="contentsearch">
  <form id="searchForm" name="searchForm" action="http://google.com/search">
    <input name="q" type="text" value="search" maxlength="200" />
    <input name="q" type="hidden" value="site:mysite.com" />
    <input name="submit" type="submit" value="Search" />
  </form>
</div>
Aside from the laziness, this method gives you a bit more control over the appearance of the search form compared to a CSE. (It works because Google joins the visible and hidden q values into a single query.)
I was searching for a solution for searching my blog, which is created using Jekyll, but didn't find a good one. Google Custom Search was also giving me ads and results from subdomains, so it was no good either. So I've created my own solution. I've written an article about how to create search for a static site like Jekyll; it's in Polish and translated using Google Translate.
I will probably create a better manual translation, or a rewrite on my English blog, soon.
The solution is a Python script that creates an SQLite database from the HTML files, plus a small PHP script that shows the search results. It does require that your static site hosting also supports PHP.
Just in case the article goes down, here is the code. It was created just for my blog (my HTML and file structure), so it needs to be tweaked to work with yours.
Python script:
import os, sys, re, sqlite3
from bs4 import BeautifulSoup


def get_data(html):
    """return dictionary with title, url and content of the blog post"""
    tree = BeautifulSoup(html, 'html5lib')
    body = tree.body
    if body is None:
        return None
    for tag in body.select('script'):
        tag.decompose()
    for tag in body.select('style'):
        tag.decompose()
    for tag in body.select('figure'):  # ignore code snippets
        tag.decompose()
    text = tree.findAll("div", {"class": "body"})
    if len(text) > 0:
        text = text[0].get_text(separator='\n')
    else:
        text = None
    title = tree.findAll("h2", {"itemprop": "title"})  # my h2 tags have this attr
    url = tree.findAll("link", {"rel": "canonical"})   # get url
    if len(title) > 0:
        title = title[0].get_text()
    else:
        title = None
    if len(url) > 0:
        url = url[0]['href']
    else:
        url = None
    result = {
        "title": title,
        "url": url,
        "text": text
    }
    return result


if __name__ == '__main__':
    if len(sys.argv) == 2:
        db_file = 'index.db'
        # remove the old index file
        if os.path.exists(db_file):
            os.remove(db_file)
        conn = sqlite3.connect(db_file)
        c = conn.cursor()
        c.execute('CREATE TABLE page(title text, url text, content text)')
        for root, dirs, files in os.walk(sys.argv[1]):
            for name in files:
                # my files are in 20.* directories (eg. 2018); [/\\] is for windows and unix
                if name.endswith(".html") and re.search(r"[/\\]20[0-9]{2}", root):
                    fname = os.path.join(root, name)
                    f = open(fname, "r")
                    data = get_data(f.read())
                    f.close()
                    if data is not None:
                        row = (data['title'], data['url'], data['text'])
                        c.execute('INSERT INTO page VALUES(?, ?, ?)', row)
                        print("indexed %s" % data['url'])
                        sys.stdout.flush()
        conn.commit()
        conn.close()
and PHP search script:
function mark($query, $str) {
    return preg_replace("%(" . $query . ")%i", '<mark>$1</mark>', $str);
}

if (isset($_GET['q'])) {
    $db = new PDO('sqlite:index.db');
    // use two distinct placeholders: reusing a single named placeholder
    // only works when PDO emulates prepared statements
    $stmt = $db->prepare('SELECT * FROM page WHERE content LIKE :content OR title LIKE :title');
    $wildcarded = '%' . $_GET['q'] . '%';
    $stmt->bindParam(':content', $wildcarded);
    $stmt->bindParam(':title', $wildcarded);
    $stmt->execute();
    $data = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $query = str_replace("%", "\\%", preg_quote($_GET['q']));
    // grab up to 10 words of context on each side of the match
    $re = "%(?>\S+\s*){0,10}(" . $query . ")\s*(?>\S+\s*){0,10}%i";
    if (count($data) == 0) {
        echo "<p>No results</p>";
    } else {
        foreach ($data as $row) {
            if (preg_match($re, $row['content'], $match)) {
                echo '<h3>' . mark($query, $row['title']) . '</h3>';
                $text = trim($match[0], " \t\n\r\0\x0B,.{}()-");
                echo '<p>' . mark($query, $text) . '</p>';
            }
        }
    }
}
In my code, and in the article, I've wrapped this PHP script in the same layout as the other pages by adding front matter to the PHP file.
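For anyone unfamiliar with that trick: Jekyll passes any file that begins with a front matter block through its layout system, regardless of extension, so the search page can share the site's look. A rough sketch (the layout name is an assumption):

---
layout: default
title: Search
---
<?php
// the search script from above goes here; Jekyll applies the layout at
// build time, and the PHP then runs on the server when the page is requested
?>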
If you can't use PHP on your hosting, you can try sql.js, which is SQLite compiled to JS with Emscripten.
If your site is well indexed by Google, a quick and ready solution is to use Google CSE.
Other than that: for a static website with hard-coded HTML pages and a directory containing images, yes, it is possible to create a search mechanism. But trust me, it is more hectic and resource-consuming than creating a dynamic website.
Using PHP to search through directories and within files will be very inefficient. Instead of providing complicated PHP workarounds, I would suggest going for a dynamic, CMS-driven website.
Related
I am working on a project involving tens of thousands of files that I downloaded from the internet. The source of the pages (MO government) didn't program them too well.
I am pulling certain elements from the pages to put into another page, so they can be referenced better on my website. Here is a working example:
<div id="intsect">
  <strong>Common law in force--effect on statutes.</strong>
</div>
// PHP CODE
// Load Document
$doc = new DOMDocument();
// Load File Values (@ suppresses warnings from malformed HTML)
@$doc->loadHTMLFile("stathtml/" . $file);
// Load the <div id="intsect"></div> value
$genAssem = $doc->getElementById("intsect");
// Appropriate value
$genAssem = " <b>Statute Name: </b>" . $genAssem->textContent . "<br><br>";
# VALUE (example)
Statute Name: Common law in force--effect on statutes.
Here is the part that is killing me:
<div id="intsect">
  <strong>Common law in force--effect on statutes.</strong>
</div>
<!-- THIS PART -->
<p> 1.035. Whenever the word "voter" is used in the laws of this state it shall mean registered voter, or legal voter.
The programmers didn't give it an ID or a class. I need to extract the paragraph tag that follows #intsect. Is there a PHP selector that can select the <p></p> tag that comes after the #intsect one?
You can use XPath to target that <p> tag which has a preceding sibling div with the ID intsect:
$doc = new DOMDocument;
@$doc->loadHTMLFile("stathtml/" . $file);
$xpath = new DOMXpath($doc);
$p = $xpath->query('//p[preceding-sibling::div[@id="intsect"]]');
if ($p->length > 0) {
    echo $p->item(0)->nodeValue;
}
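Note that the preceding-sibling query matches every later <p> at that level, and item(0) just happens to pick the first. If you only ever want the single paragraph immediately after the div, a sketch of the equivalent query anchored on the div itself:

// first <p> sibling that follows the #intsect div
$p = $xpath->query('//div[@id="intsect"]/following-sibling::p[1]');
if ($p->length > 0) {
    echo $p->item(0)->nodeValue;
}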
I'd like to be able to grab data, such as a list of articles, from Yahoo Finance. At the moment I have a locally hosted webpage that searches Yahoo Finance for stock symbols (e.g. NOK). It then returns the opening price, the current price, and how far up or down the price has gone.
What I'd like to do is grab the related links that Yahoo has on the page - these links lead to articles related to the share price. E.g. on https://au.finance.yahoo.com/q?s=nok&ql=1 scroll down to the headlines; I'd like to grab those links.
At the moment I'm working from a book (PHP Advanced for the World Wide Web - I know it's old, but I found it lying around yesterday and it's quite interesting :) ). The book says 'It's important when accessing web pages to know exactly where the data is' - I would think by now there would be a way around this... maybe the ability to search for links that have a particular keyword in them, or something like that!
I'm wondering if there's a special trick I can use to grab particular bits of data on a webpage, like crawlers do when they grab links related to something.
It would be great to know how to do this; then I'd be able to apply it to other subjects in the future.
I'll add the code that I have at the moment. This is purely for practice, as I'm learning PHP in my course :)
##getquote.php
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Get Stock Quotes</title>
<link href='css/style.css' type="text/css" rel="stylesheet">
</head>
<body>
<h1>Stock Reader</h1>
<?php
//Read[1] = current price
//read[5] = opening price
//read[4] = down or up whatever percent from opening according to current price

//Step one
//Begin the PHP section by checking if the form has been submitted
if (isset($_POST['submit'])) {
    //Step two
    //Check if a stock symbol was entered (isset alone is always true for a submitted form field)
    if (!empty($_POST['symbol'])) {
        //Define the url to be opened
        $url = 'http://quote.yahoo.com/d/quotes.csv?s=' . $_POST['symbol'] . '&f=sl1d1t1c1ohgv&e=.csv';
        //Open the url; if we can't, SHUT DOWN the script and write a msg
        $fp = fopen($url, 'r') or die('Cannot Access YAHOO!.');
        //Read one CSV line (at most 30 characters) from $fp
        $read = fgetcsv($fp, 30);
        //Close the file processing.
        fclose($fp);
        include("php/displayDetails.php");
    } else {
        echo "<div style='color:red'>Please enter a SYMBOL before submitting the form</div>";
    }
}
?>
<form action='getquote.php' method='post'>
<p>Symbol: </p><input type='text' name='symbol'>
<br />
<input type="submit" value='Fetch Quote' name="submit">
</form>
<br />
<br />
##displayDetails.php
<div class='display-contents'>
<?php
echo "<div>Todays date: " . $read[2] . "</div>";
//Current price
echo "<div>The current value for " . $_POST["symbol"] . " is <strong>$ " . $read[1] . "</strong></div>";
//Opening Price
echo "<div>The opening value for " . $_POST["symbol"] . " is <strong>$ " . $read[5] . "</strong></div>";
if($read[1] < $read[5])
{
//Down or Up depending on opening.
echo "<div>" .strtoupper($_POST['symbol']) ."<span style='color:red'> <em>IS DOWN</em> </span><strong>$" . $read[4] . "</strong></div>";
}
else{
echo "<div>" . strtoupper($_POST['symbol']) ."<span style='color:green'> <em>IS UP</em> </span><strong>$" . $read[4] . "</strong></div>";
}
Added code to displayDetails.php:
function getLinks($url) {
    $siteContent = file_get_contents($url);
    // everything after this marker is the content you want
    $div = explode('class="yfi_headlines">', $siteContent)[1];
    // now you have the inner content of the div
    $innerContent = explode('<div class="ft">', $div)[0];
    $list = explode("<ul>", $innerContent)[1];
    $list = explode("</ul>", $list)[0];
    echo $list;
}
?>
</div>
I just put the same code in - I didn't really know what I should do with it?!
I don't know about fgetcsv, but with file_get_contents you can grab the whole content of a page into a string variable.
Then you can search the string for links (do not use regex to search HTML content: Link regex).
I briefly looked at Yahoo's source code, so you can do the following:
- yfi_headlines is the div class which wraps the desired links:
$siteContent = file_get_contents($url);
$div = explode('class="yfi_headlines">', $siteContent)[1]; // everything inside is the content you want
- the last class inside the searched div is ft:
$innerContent = explode('<div class="ft">', $div)[0]; // now you have the inner content of your div
Repeat to get the <ul> inner content:
$list = explode("<ul>", $innerContent)[1];
$list = explode("</ul>", $list)[0];
Now you have a list of links in the format <li>text</li>.
There are more efficient ways to parse a web page, such as using DOMDocument (see the sketch below).
For getting the content of a page you can also look at this answer:
https://stackoverflow.com/a/15706743/2656311
[ADDITIONALLY] If it is a large website, you can do ini_set("memory_limit","1024M"); at the beginning of the function so you can hold more data in memory!
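For reference, here is a rough DOMDocument-based sketch of the same idea - pulling the links out of the headlines section with a real HTML parser instead of explode(). The class name yfi_headlines comes from the answer above; the rest of the markup assumptions are mine:

<?php
$siteContent = file_get_contents($url);

$doc = new DOMDocument();
@$doc->loadHTML($siteContent);          // @ silences warnings from messy real-world markup
$xpath = new DOMXPath($doc);

// anchors inside any element whose class attribute contains "yfi_headlines"
$links = $xpath->query('//*[contains(@class, "yfi_headlines")]//a');
foreach ($links as $a) {
    echo $a->getAttribute('href') . ' - ' . $a->textContent . "\n";
}
?>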
So I have three pages: the index page; one that writes the data from a form inside the index page to the database; and one that gets data from the database and echoes out an HTML table with the data inside.
Currently, if you write a link in the form, it will just come out as text. I would like the whole link to come out as a clickable link.
So say if I wrote this into the form:
Look at this: www.google.com or Look at this: https://www.google.com
it would come out like this in HTML:
Look at this: www.google.com
How could I go about doing this?
Okay so the html is:
<form class="wide" action="Write-to.php" method="post">
<input class="wide" autocomplete="off" name="txt" type="text" id="usermsg" style="font-size:2.4vw;" value="" />
</form>
in which the user would write:
"Look at this: www.google.com or Look at this: https://www.google.com"
This would then get sent to the database through Write-to.php.
$sql="INSERT INTO social (comunicate)
VALUES
('$_POST[txt]')";
}
This then gets read back out of the database:
$result = mysqli_query($con, "(select * from social order by id desc limit {$limit_amt}) order by id asc");
while ($row = mysqli_fetch_array($result)) {
    echo "<tr id='" . $i . "' class='border_bottom'>";
    echo "<th scope='col'>";
    echo "<span class='text'>" . htmlspecialchars($row['comunicate']) . "</span><br />";
    echo "</th>";
    echo "</tr>";
}
Just try:
echo '<a href="' . $your_url_variable . '">' . $your_url_variable . '</a>';
Update:
The OP really wanted to detect URLs in a string. One possible solution is to filter it using a regular expression. This code could help:
<?php
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// The Text you want to filter for urls
$text = "The text you want to filter goes here. http://google.com";
// Check if there is a url in the text
if (preg_match($reg_exUrl, $text, $url)) {
    // make the urls hyper links
    echo preg_replace($reg_exUrl, '<a href="' . $url[0] . '">' . $url[0] . '</a> ', $text);
} else {
    // if no urls in the text just return the text
    echo $text;
}
?>
Source: http://css-tricks.com/snippets/php/find-urls-in-text-make-links/
There are quite a few things you need to worry about when displaying user-supplied (tainted) data.
You must ensure that all the data is sanitised -- never, ever just echo the content. Look into htmlspecialchars and FILTER_VALIDATE_URL, for example:
function validateUrl($url) {
    return filter_var($url, FILTER_VALIDATE_URL);
}
What you are attempting to do is convert a string into a link, for example you can write a function like this:
function makeClickable($link) {
    $link = htmlspecialchars($link);
    return sprintf('<a href="%s">%s</a>', $link, $link);
}
You can use string concatenation as well, but I wouldn't do that in my view code. Just personal preference.
Take a look at the urlencode function; it will certainly come in handy.
I would also recommend you read about cross-site scripting.
Please note that I am not making any implementation recommendations; my aim is just to show you some contrived code samples that demonstrate making a string "clickable".
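Putting the two contrived helpers together, usage might look roughly like this (the fallback branch is my own assumption about what you'd want for non-URL input):

$input = isset($_POST['txt']) ? $_POST['txt'] : '';

// only render a link when the input really is a URL
if (validateUrl($input)) {
    echo makeClickable($input);
} else {
    echo htmlspecialchars($input);   // otherwise print it as plain, escaped text
}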
Update:
If you would like to make links clickable within text, refer to the following questions:
Best way to make links clickable in block of text
Replace URLs in text with HTML links
Save the hyperlink in the db and retrieve it as a string with an SQL query,
like:
select link from table_name where index = i
and save the link as an anchor, e.g. <a href="whatever">here</a>,
and print it.
Use this:
echo '<a href="' . $res['url'] . '">' . $res['url'] . '</a>';
Hello, I'm using cURL to get information from Wikipedia, and I want to receive only information about the principal image; I don't want to receive all the images of an article.
For example..
If I want to get info about all the images in the English language article (http://en.wikipedia.org/wiki/English_language) I should go to this URL:
http://en.wikipedia.org/w/api.php?action=query&titles=English_Language&prop=images
but I receive the flags of the countries where people speak English, in XML:
<?xml version="1.0"?>
<api>
  <query>
    <normalized>
      <n from="English_language" to="English language" />
    </normalized>
    <pages>
      <page pageid="8569916" ns="0" title="English language">
        <images>
          <im ns="6" title="File:Anglospeak(800px)Countries.png" />
          <im ns="6" title="File:Anglospeak.svg" />
          <im ns="6" title="File:Circle frame.svg" />
          <im ns="6" title="File:Commons-logo.svg" />
          <im ns="6" title="File:Flag of Argentina.svg" />
          <im ns="6" title="File:Flag of Aruba.svg" />
          <im ns="6" title="File:Flag of Australia.svg" />
          <im ns="6" title="File:Flag of Bolivia.svg" />
          <im ns="6" title="File:Flag of Brazil.svg" />
          <im ns="6" title="File:Flag of Canada.svg" />
I only want the information about the principal image.
There's news! (from 2014)
A new extension, PageImages, is available and has already been installed on the Wikimedia wikis.
Instead of prop=images, use prop=pageimages, and you'll get a pageimage attribute and a <thumbnail> child node for each <page> element.
Admittedly, it's not guaranteed to give the best results, but in your example (English Language) it works well and only yields the map of the geographic distribution, not all the flags.
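For example, a query of this shape should work (pithumbsize is optional and just sets the thumbnail width):
http://en.wikipedia.org/w/api.php?action=query&titles=English_language&prop=pageimages&pithumbsize=300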
Also, the OpenSearch API does return an <image> in its XML representation, but this API is not usable with lists and cannot be combined with the Query API.
This is how I got it working...
$.getJSON("http://en.wikipedia.org/w/api.php?action=query&format=json&callback=?", {
    titles: "India",
    prop: "pageimages",
    pithumbsize: 150
}, function (data) {
    var imageUrl = GetAttributeValue(data.query.pages);
    if (imageUrl == "") {
        $("#wiki").append("<div>No image found</div>");
    } else {
        var img = "<img src=\"" + imageUrl + "\">";
        $("#wiki").append(img);
    }
});

function GetAttributeValue(data) {
    var urli = "";
    for (var key in data) {
        if (data[key].thumbnail != undefined) {
            if (data[key].thumbnail.source != undefined) {
                urli = data[key].thumbnail.source;
                break;
            }
        }
    }
    return urli;
}
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
</head>
<body>
<div id="wiki"></div>
</body>
</html>
As others have noted, Wikipedia articles don't really have any such thing as a "principal image", so your first problem will be deciding how to choose between the different images used on a given page. Some possible selection criteria might be:
Biggest image in the article.
First image exceeding some specific minimum dimensions, e.g. 60 × 60 pixels.
First image referenced directly in the article's source text, rather than through a template.
For the first two options, you'll want to fetch the rendered HTML code of the page via action=parse and use an HTML parser to find the img tags in the code, like this:
http://en.wikipedia.org/w/api.php?action=parse&page=English_language&prop=text|images
(The reason you can't just get the sizes of the images, as used on the page, directly from the API is that that information isn't actually stored anywhere in the MediaWiki database.)
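As a rough illustration of that approach, something like the following should work. It's a sketch, not a definitive implementation: it assumes the classic JSON shape of the parse output (parse.text['*']) and that allow_url_fopen is enabled:

<?php
// Fetch the rendered HTML for a page and list the <img> tags it contains.
$api = 'https://en.wikipedia.org/w/api.php?action=parse&page=English_language&prop=text&format=json';
$json = json_decode(file_get_contents($api), true);
$html = $json['parse']['text']['*'];

$doc = new DOMDocument();
@$doc->loadHTML($html);                 // suppress warnings from non-standard markup
foreach ($doc->getElementsByTagName('img') as $img) {
    // the width/height attributes reflect the size as used on the page
    echo $img->getAttribute('src') . ' (' .
         $img->getAttribute('width') . 'x' . $img->getAttribute('height') . ")\n";
}
?>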
For the last option, what you want is the source wikitext of the article, available via prop=revisions with rvprop=content:
http://en.wikipedia.org/w/api.php?action=query&titles=English_language&prop=revisions|images&rvprop=content
Note that many images in infoboxes and such are specified as parameters to a template, so just parsing for [[Image:...]] syntax will miss some of them. A better solution is probably to just get the list of all images used on the page via prop=images (which you can do in the same query, as I showed above) and look for their names (with or without Image: / File: prefix) in the wikitext.
Keep in mind the various ways in which MediaWiki automatically normalizes page (and image) names: most notably, underscores are mapped to spaces, consecutive whitespace is collapsed to a single space and the first letter of the name is capitalized. If you decide to go this way, here's some sample PHP code that will convert a list of file names into a regexp that should match any of them in wikitext:
foreach ($names as &$name) {
    $name = trim( preg_replace( '/[_\s]+/u', ' ', $name ) );
    $name = preg_quote( $name, '/' );
    $name = preg_replace( '/^(\\\\?.)/us', '(?i:$1)', $name );
    $name = preg_replace( '/\\\\? /u', '[_\s]+', $name );
}
$regexp = '/' . implode( '|', $names ) . '/u';
For example, when given the list:
Anglospeak(800px)Countries.png
Anglospeak.svg
Circle frame.svg
Commons-logo.svg
Flag of Argentina.svg
Flag of Aruba.svg
the generated regexp will be:
/(?i:A)nglospeak\(800px\)Countries\.png|(?i:A)nglospeak\.svg|(?i:C)ircle[_\s]+frame\.svg|(?i:C)ommons\-logo\.svg|(?i:F)lag[_\s]+of[_\s]+Argentina\.svg|(?i:F)lag[_\s]+of[_\s]+Aruba\.svg/u
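To act on the "first image referenced in the source text" criterion, you could then fetch the wikitext and take the earliest match. A sketch, where $wikitext stands for the revision content fetched with the prop=revisions query above:

// find the first of the listed image names mentioned in the wikitext
if (preg_match($regexp, $wikitext, $m, PREG_OFFSET_CAPTURE)) {
    // $m[0][0] is the matched file name, $m[0][1] its byte offset
    echo "First image referenced: " . $m[0][0] . "\n";
} else {
    echo "None of the images are referenced directly in the wikitext.\n";
}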
Important addendum
Bergi's answer, above, seemed great, but I was bashing my head against the wall because I couldn't get it to work.
I needed to include pilicense=any in my query, because otherwise any copyrighted imagery was ignored.
Here's the query I ultimately got working:
https://en.wikipedia.org/w/api.php?action=query&pilicense=any&format=jsonfm&prop=pageimages&generator=search&gsrsearch=My+incategory:English-language_films+prefix:My&gsrlimit=3
I know it's been a while, but this is one of the first pages I landed on when I started my days-long search for how to do this, so I wanted to share it here for others like me who might come along.
You can limit your query to a single image with the imlimit parameter:
http://en.wikipedia.org/w/api.php?action=query&titles=English_Language&redirects&prop=images&imlimit=1
Note, though, that prop=images returns images in alphabetical order rather than in the order they appear in the article, so this won't necessarily be the "principal" image.
I have the following PHP code, which outputs a file name that I use as the title of the page for each download I provide to viewers. I want to have SEO-friendly titles and URLs.
<?php echo $file_info[21]; ?>
The URLs of each page look similar to this:
http://site.com/directory/download_interim_finder.php?it=ZWtkaW1rcmInbn1ma3h9aXFLcHN6fWVzeDgnISJoO3tran0reW97cSd1YHR5YHZIbWBmfzxwY3tiOnxycnA8Y25zfWpna3Y+YXtlZnl7dnJLf3J8cWd2fT4kJCIramFbcm1tbWRqdHJmJnR2ZGhKd3JpdShQdGV+bX09fWh0b2JnfXx9anV2J3F7YHh4d3Z2S3Jyf3ZmdX8pNyg1anZ/bnhpdGJwVnx5cXF5Zn5bYW14cHp2bTQiKXFsYw==
I would like to convert this long URL into something a little more SEO-friendly that incorporates the PHP code I specified above - something like "filename.php" instead of the long URL you see here.
I was hoping to use .htaccess, but I've read that you can't do it through there. I've spent hours looking for ways to fix this and can't find a solution. Any help would be appreciated.
Thanks!
Step-by-step instructions for creating SEO-friendly URLs with dynamic content using PHP and .htaccess mod_rewrite redirection. Friendly URLs improve your site's search engine ranking. Before trying this you have to enable the mod_rewrite.so module in httpd.conf. It's simple: just a few lines of PHP code converting the title data to a clean URL format.
Database
Sample database: a blog table with columns id, title, body and url.
CREATE TABLE `blog`
(
  `id` INT PRIMARY KEY AUTO_INCREMENT,
  `title` TEXT UNIQUE,
  `body` TEXT,
  `url` TEXT UNIQUE
);
Publish.php
Contains the PHP code converting the title text to a friendly URL format and storing it in the blog table.
<?php
include('db.php');

function string_limit_words($string, $word_limit)
{
    $words = explode(' ', $string);
    return implode(' ', array_slice($words, 0, $word_limit));
}

if ($_SERVER["REQUEST_METHOD"] == "POST")
{
    $title = mysql_real_escape_string($_POST['title']);
    $body = mysql_real_escape_string($_POST['body']);
    $title = htmlentities($title);
    $body = htmlentities($body);
    $date = date("Y/m/d");
    //Title to friendly URL conversion
    $newtitle = string_limit_words($title, 6);            // First 6 words
    $urltitle = preg_replace('/[^a-z0-9]/i', ' ', $newtitle);
    $newurltitle = str_replace(" ", "-", $urltitle);      // use the cleaned title, not $newtitle
    $url = $date . '/' . $newurltitle . '.html';          // Final URL
    //Inserting values into the blog table
    mysql_query("insert into blog(title,body,url) values('$title','$body','$url')");
}
?>
<!--HTML Part-->
<form method="post" action="">
Title:
<input type="text" name="title"/>
Body:
<textarea name="body"></textarea>
<input type="submit" value=" Publish "/>
</form>
Article.php
Contains HTML and PHP code for displaying content from the blog table.
<?php
include('db.php');
if (isset($_GET['url']))
{
    $url = mysql_real_escape_string($_GET['url']);
    $url = $url . '.html'; //Friendly URL
    $sql = mysql_query("select title,body from blog where url='$url'");
    $count = mysql_num_rows($sql);
    $row = mysql_fetch_array($sql);
    $title = $row['title'];
    $body = $row['body'];
}
else
{
    echo '404 Page.';
}
?>
<!-- HTML Part -->
<body>
<?php
if ($count)
{
    echo "<h1>$title</h1><div class='body'>$body</div>";
}
else
{
    echo "<h1>404 Page.</h1>";
}
?>
</body>
.htaccess
The URL rewriting file. It maps the friendly URL 9lessons.info/test.html back to the original URL 9lessons.info/article.php?url=test.html:
RewriteEngine On
RewriteRule ^([a-zA-Z0-9-/]+)\.html$ article.php?url=$1
RewriteRule ^([a-zA-Z0-9-/]+)\.html/$ article.php?url=$1
When I look at the URL example you posted I see two things:
It points to a PHP script:
/directory/download_interim_finder.php
It has what looks like a Base64-encoded argument, which obviously no human can read.
I'm guessing the input for your download script is encoded in the it argument string; you will need to keep this in the URL, or else you can't identify which file should be downloaded. Since you want your URLs to be semantic, you have to add the semantic information to the URL.
You could for instance make URLs like:
/download/documentation/productX/[[Base64encodedinputstring]]
In your .htaccess you then need the rule:
RewriteEngine On
RewriteRule ^download/.*/([^/]+)$ /directory/download_interim_finder.php?it=$1
If your host does not allow you to use .htaccess, then you cannot easily employ semantic URLs.