I'm trying to rejuvenate code I wrote mostly 14+ years ago, and I've come to see that the lovely little setup I wrote then was... lacking in certain places, namely handling user inputs.
Lesson: Never underestimate users' ability to inject trash, typos, and dupes past your validators.
The old way is reaching critical mass as there are 470 items in a SELECT dropdown now. I want to reinvent this part of the process so I don't have to worry about it hitting a breaking point.
So the idea is to build a fuzzy search method so that after the typist enters the search string, we check against five pieces of data, all of which reside in the same row.
I need to check the name submitted against the Stage Name, two also-known-as names, and their legal name, with a final check against a soundex() index based on their Stage Name (this catches a few spelling errors that would otherwise be missed).
I've tried a complicated block of code to check these things (and it doesn't work, mostly because I think I coded the comparisons too strict) as part of a do/while loop.
In the below, var $Rin would contain the user supplied name.
$setr = mysql_query("SELECT ID,StageName,AKA1,AKA2,LegalName,SoundEx FROM performers");
IF ($R = mysql_fetch_array($setr)) {
do {
$RT = substr(trim($Rin), 5);
$RT1 = substr($R[1], 5);
$RT2 = substr($R[2], 5);
$RT3 = substr($R[3], 5);
$RT4 = substr($R[4], 5);
$RTx = soundex($RT);
IF ($RT == $RT1) {
$RHits[] = $R[0];
}
IF ($RT == $RT2) {
$RHits[] = $R[0];
}
IF ($RT == $RT3) {
$RHits[] = $R[0];
}
IF ($RT == $RT4) {
$RHits[] = $R[0];
}
IF ($RTx == $R[5]) {
$RHits[] = $R[0];
}
} while ($R = mysql_fetch_array($setr));
}
The idea being that I'll build an array of the ID#s of the near hits, which I'll populate into a SELECT dropdown that hopefully has far fewer entries than the whole table. That means querying for a result set from the contents of that array, in order to display the performer's name in the SELECT dropdown and pass the ID# as the value for those choices.
That's when I hit the 'I need to use an array in my WHERE clause' problem, and after finding that answer, I am starting to suspect I'm out of luck due to Stipulation #2 below. So I started looking at alternate search methods, and I'm not sure I've gotten anywhere except more confused.
So, is there a better way to scan a single table for six fields, checking five against user input and noting the sixth for display in a subset of the original table?
Thought process:
Against the whole table, per record, test $Rin against these tests in this order:
$Rin -> StageName
$Rin -> AKA1
$Rin -> AKA2
$Rin -> LegalName
soundex($Rin) -> SoundEx
where a hit on any of the five operations adds the ID# to a result array that is used to narrow the results from 470 performers down to a reasonable list to choose from.
Stipulations:
1) As written, I know this is vulnerable to an SQL injection attack.
2) Server runs PHP 4.4.9 and MySQL 4.0.27-Standard, I can't upgrade it. I've got to prove it works before money will be spent.
3) This is hobby-level stuff, not my day job.
4) Performers often use non-English names or elements in their names, and this has led to typos and duplication by the data entry typists.
I've found a lot of mysqli and PDO answers for this sort of thing, and I'm seeing a lot of things that only half make sense (like link #4 below). I'm working on getting up to speed on these things as I try to fix what's become broken.
Places already looked:
PHP mysql using an array in WHERE clause
PHP/MySQL small-scale fuzzy search
MySQL SubString Fuzzy Search
Sophisticated Name Lookup
I mentioned in the comments that a Javascript typeahead library might be a good choice for you. I've found Twitter's Typeahead library and Bloodhound engine to be pretty robust. Unfortunately, the documentation is a mixed bag: so long as what you need is very similar to their examples, you're golden, but certain details (explanations of the tokenizers, for example) are missing.
In one of the several questions about Typeahead here on Stack Overflow, @JensAKoch says:
To be honest, I think twitter gave up on typeahead.js. We look at 13000 stars, a full bugtracker with no maintainer and a broken software, last release 2015. I think that speaks for itself, or not? ... So, try one of the forks: github.com/corejavascript/typeahead.js
Frankly, in a brief check, the documentation at the fork looks a bit better, if nothing else. You may wish to check it out.
Server-side code:
All of the caveats of using an old version of PHP apply. I highly recommend retooling to use PDO with PHP 5, but this example uses PHP 4 as requested.
Completely untested PHP code. json_encode() would be better, but it doesn't appear until PHP 5. Your endpoint would be something like:
header("Content-Type: application/json");
$results = mysql_query(
"SELECT ID,StageName,AKA1,AKA2,LegalName,SoundEx FROM performers"
);
$fields = array("ID","StageName","AKA1","AKA2","LegalName","SoundEx");
echo "[";
$first = true;
while ($row = mysql_fetch_array($results)) {
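// echo is a language construct and cannot be used inside a ternary.
if ($first) { $first = false; } else { echo ","; }
echo "\n\t{";
$pairs = array();
foreach($fields as $f) {
// Escape backslashes and double quotes so the output stays valid JSON.
$value = str_replace(array('\\', '"'), array('\\\\', '\\"'), $row[$f]);
$pairs[] = "\n\t\t\"{$f}\": \"".$value."\"";
}
echo implode(",", $pairs);
echo "\n\t}";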
}
echo "]";
Client-side code:
This example uses a static JSON file as a stub for all of the results. If you anticipate your result set going over 1,000 entries, you should look into the remote option of Bloodhound. This would require you to write some custom PHP code to handle the query, but it would look largely similar to the end point that dumps all (or at least your most common) data.
var actors = new Bloodhound({
// Each row is an object, not a single string, so we have to modify the
// default datum tokenizer. Pass in the list of object fields to be
// searchable.
datumTokenizer: Bloodhound.tokenizers.obj.nonword(
'StageName','AKA1','AKA2','LegalName','SoundEx'
),
queryTokenizer: Bloodhound.tokenizers.whitespace,
// URL points to a json file that contains an array of actor JSON objects
// Visit the link to see details
prefetch: 'https://gist.githubusercontent.com/tag/81e4450de8eca805f436b72e6d7d1274/raw/792b3376f63f89d86e10e78d387109f0ad7903fd/dummy_actors.json'
});
// The first argument is the `options` hash (pass `null` to use the defaults);
// here we only turn on highlighting of the matched text.
$('#prefetch .typeahead').typeahead(
{
highlight: true
},
{
name: 'actors',
source: actors,
templates: {
empty: "<div class=\"empty-message\">No matches found.</div>",
// This is simply a function that accepts an object.
// You may wish to consider Handlebars instead.
suggestion: function(obj) {
return '<div class="actorItem">'
+ '<span class="itemStageName">'+obj.StageName+"</span>"
+ ', <em>legally</em> <span class="itemLegalName">'+obj.LegalName+"</span>"
+ '</div>';
}
//suggestion: Handlebars.compile('<div><strong>{{value}}</strong> – {{year}}</div>')
},
display: "LegalName" // name of object key to display when selected
// Instead of display, you can use the 'displayKey' option too:
// displayKey: function(actor) {
// return actor.LegalName;
// }
});
/* These class names can be specified in the Typeahead options hash. I use the defaults here. */
.tt-suggestion {
border: 1px dotted gray;
padding: 4px;
min-width: 100px;
}
.tt-cursor {
background-color: rgb(255,253,189);
}
/* These classes are used in the suggestion template */
.itemStageName {
font-size: 110%;
}
.itemLegalName {
font-size: 110%;
color: rgb(51,42,206);
}
<script src="https://code.jquery.com/jquery-3.1.1.min.js"></script>
<script src="https://twitter.github.io/typeahead.js/releases/latest/typeahead.bundle.js"></script>
<p>Type something here. A good search term might be 'C'.</p>
<div id="prefetch">
<input class="typeahead" type="text" placeholder="Name">
</div>
For ease, here is the Gist of the client-side code.
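If a library is more than you need, the matching order from the question (four name fields, then Soundex) can also be sketched in plain JavaScript. This is only a rough sketch: `soundex`, `findMatches`, and the sample `performers` rows are hypothetical, the name comparison is a case-insensitive "starts with" (looser than strict equality), and the Soundex here is a simplified American Soundex that drops non-ASCII letters and may not agree with PHP's soundex() in every edge case.

```javascript
// Simplified American Soundex; edge cases may differ from PHP's soundex().
// Non a-z characters (including accented letters) are dropped.
function soundex(name) {
  var codes = { b: 1, f: 1, p: 1, v: 1, c: 2, g: 2, j: 2, k: 2, q: 2, s: 2,
                x: 2, z: 2, d: 3, t: 3, l: 4, m: 5, n: 5, r: 6 };
  var letters = String(name).toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) { return ''; }
  var result = letters.charAt(0).toUpperCase();
  var prev = codes[letters.charAt(0)] || 0;
  for (var i = 1; i < letters.length && result.length < 4; i++) {
    var ch = letters.charAt(i);
    if (ch === 'h' || ch === 'w') { continue; } // h and w do not separate codes
    var code = codes[ch] || 0;                  // vowels get 0 and reset prev
    if (code !== 0 && code !== prev) { result += code; }
    prev = code;
  }
  return (result + '000').slice(0, 4);
}

// Returns the IDs of rows where the query is a prefix of any of the four
// name fields, or where its Soundex equals the stored SoundEx value.
function findMatches(query, rows) {
  var q = String(query).trim().toLowerCase();
  if (q === '') { return []; }
  var qSound = soundex(q);
  var hits = [];
  rows.forEach(function (row) {
    var names = [row.StageName, row.AKA1, row.AKA2, row.LegalName];
    var nameHit = names.some(function (n) {
      return Boolean(n) && n.toLowerCase().indexOf(q) === 0;
    });
    if (nameHit || qSound === row.SoundEx) { hits.push(row.ID); } // one hit per row
  });
  return hits;
}

// Hypothetical sample rows, shaped like the JSON the endpoint would emit.
var performers = [
  { ID: 1, StageName: 'Robert Blue', AKA1: 'Bobby B', AKA2: '',
    LegalName: 'Robert Smith', SoundEx: soundex('Robert Blue') },
  { ID: 2, StageName: 'Zelda Moon', AKA1: '', AKA2: '',
    LegalName: 'Ann Lee', SoundEx: soundex('Zelda Moon') }
];

console.log(findMatches('Rupert', performers)); // matched through Soundex only
console.log(findMatches('bob', performers));    // matched through AKA1
```

Each row is added at most once, so the resulting ID list can feed the shortened dropdown directly.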
I have an app that communicates with my API that runs php and mysql.
What I wanted to do was record changes that occur to entities in my table for each user. If a user makes a change to their data, I can see the change that occurred. This way if they ever have questions or accidentally delete something, I can go back and tell them what the entities looked like at various stages in the year.
I don't need to be crazy specific about the differences, all I would like to do is record inserts or updates (as it's represented in a JSON body).
Basically what I did for now was any time a POST/PUT occurs to my API for certain routes, I just take the JSON in the request body, and I save it to a record in the database as a transaction that took place for that user.
This was great early on, but after hundreds of thousands of records, the JSON body is large and is taking up a lot of room. My database table is 13GB. Queries take a while to run, too. I truncated it, but within 4 months it grew again to another 10GB. This problem will likely only get larger.
Is there an approach someone can recommend to record this? Can I maybe send the request body over to something on AWS or some other storage offline or another database somewhere else? Flat files perhaps or a non-relational database? It's not like I actually need the data in real time but if I ever wanted to get a history of someone I'd like to know I could.
I do take nightly backups of the DB, so an alternate approach was I was thinking of cutting out the transaction logs entirely, and instead just letting it continue to back up nightly. Sure, I won't be able to show a history of what dates entities were updated/added, but at least I could always reference a few backups to see what records were for a given user on a certain date after I do a restore.
Any ideas or suggestions? Thanks!
Instead of logging the entire JSON, you can log only the values that have changed. You also don't have to log your insert data: the database always holds the current record, so logging the insert as well is redundant.
You can implement a Diff function to compare difference in your existing JSON to the changed JSON.
To illustrate an example see the code below that borrows a JavaScript Diff function from this Answer.
// get the current value from your database
var oldvalues = {
"id": 50,
"name": "Old Name",
"description": "Description",
"tasks": [{
'foo': 'bar'
}]
};
var newvalues = {
"id": 50,
"name": "New name",
"description": "Description",
"tasks": [{
'foo': 'bar'
}]
};
var isEmptyObject = function(obj) {
var name;
for (name in obj) {
return false;
}
return true;
};
var diff = function(obj1, obj2) {
var result = {};
var change;
for (var key in obj1) {
if (typeof obj2[key] == 'object' && typeof obj1[key] == 'object') {
change = diff(obj1[key], obj2[key]);
if (isEmptyObject(change) === false) {
result[key] = change;
}
}
else if (obj2[key] != obj1[key]) {
result[key] = obj2[key];
}
}
return result;
};
var update = diff(oldvalues, newvalues);
//save this to your database
$('#diff').text(JSON.stringify(update));
textarea {
width: 400px;
height: 50px
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id="diff"></textarea>
As you can see, the only change that would be saved is {"name":"New name"}, which will cut down on your data usage.
You would of course need to either port this to PHP or look at an existing package such as node-rus-diff that might serve your needs.
As long as you are keeping a timestamp or a sequence number, you can chain multiple transactions to roll back to any prior state. This is analogous to doing an incremental backup.
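A rough sketch of that chaining, assuming each stored transaction carries a sequence number and a change object in the shape the diff function above produces. The names `applyDiff`, `stateAt`, `seq`, and `change` are made up for illustration, and the sketch assumes some starting snapshot (a checkpoint, for example) is available to replay the changes onto:

```javascript
// Returns a copy of `state` with `change` merged in, recursing into nested
// objects and arrays so unchanged siblings are preserved.
function applyDiff(state, change) {
  var result = Array.isArray(state) ? state.slice() : Object.assign({}, state);
  for (var key in change) {
    var value = change[key];
    var bothObjects = value !== null && typeof value === 'object' &&
                      result[key] !== null && typeof result[key] === 'object';
    result[key] = bothObjects ? applyDiff(result[key], value) : value;
  }
  return result;
}

// Replays every transaction with seq <= `seq`, in order, on top of `base`.
function stateAt(base, transactions, seq) {
  return transactions
    .filter(function (t) { return t.seq <= seq; })
    .sort(function (a, b) { return a.seq - b.seq; })
    .reduce(function (state, t) { return applyDiff(state, t.change); }, base);
}

// Hypothetical data: a starting snapshot plus two logged changes.
var base = { id: 50, name: 'Old Name', tasks: [{ foo: 'bar' }] };
var transactions = [
  { seq: 1, change: { name: 'New name' } },
  { seq: 2, change: { tasks: { 0: { foo: 'baz' } } } }
];

console.log(JSON.stringify(stateAt(base, transactions, 1)));
console.log(JSON.stringify(stateAt(base, transactions, 2)));
```

Because the logged changes hold new values rather than old ones, reconstruction runs forward from the snapshot instead of backward from the current record.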
You could also run a maintenance task at set intervals if you would like to create checkpoints and compare a current state to a previous state. Perhaps once a month take a back up and record the differences between objects that have changed. This would be analogous to a differential backup.
Finally, you can take a full backup and clear out the previous transactions, analogous to a full backup.
It is common practice for administrators to perform a combination of incremental, differential and full backups to balance storage costs and recovery needs. Using the approaches outlined above, you can implement the strategy that is right for you.
My site has a large number of products in the database. I want to add a product sheet for each product, but the database has no set "slot" for it. So I was thinking of writing some PHP code into the template that checks the part number and uses it to load the correct URL for the product sheet as a link. For example:
<?php
if (strpos($product_sku,'KE15000/12') !== false) {
$factsheetimage_urlZ='/images/FactsheetBTN.png';
$factsheetweblink_url="images/factSheetKE15000/12.pdf";
} else if (strpos($product_sku,'KE20000/12') !== false) {
$factsheetimage_urlZ='/images/FactsheetBTN.png';
$factsheetweblink_url="images/factSheetKE20000/12.pdf";
} else {
$factsheetimage_urlZ='/images/blank.png';
}
?>
<div>
<a href="<?php echo $factsheetweblink_url;?>">
<img src="<?php echo $factsheetimage_urlZ;?>"></a>
</div>
At the moment I'm using if/else statements (I'm pretty new to PHP), and I was wondering if there's a way to check $product_sku against an XML document to automatically load the correct link rather than writing around 300 if/else statements. ($product_sku is the unique product code loaded on each page.)
There are a few ways to go about it.
If the $factsheetweblink_url can be generated based on the $product_sku in some programmatic way, that would probably be my first preference, e.g. $factsheetweblink_url = "images/factSheet{$product_sku}.pdf";
Secondly, adding this column to the database would be the best option if possible.
Otherwise, the lookup table you mention is certainly possible, and XML is one option. Whether you're writing it by hand, and whether it needs to be maintained by a non-technical person, are considerations when choosing a format. If it's just you, I'd probably use a simpler format (even a plain PHP array).
A simple example of this type of mapping as a PHP array might look like:
sku_url_map.php:
<?php
return [
'KE15000/12' => 'images/factSheetKE15000/12.pdf',
'KE20000/12' => 'images/factSheetKE20000/12.pdf',
];
product_page.php:
<?php
$sku_url_map = require 'sku_url_map.php';
// ...
if (isset($sku_url_map[$product_sku])) {
$factsheetweblink_url = $sku_url_map[$product_sku];
}
Of course, more complex structures can be used if it's more than a simple 1:1 mapping.
I have implemented a basic auto-complete feature using jQuery autocomplete. I am querying the DB every time, which makes the auto-complete quite slow. I am looking for ways to make it faster, much like Quora's.
Here is the code from front-end:
<script type="text/javascript">
var URL2 = '<?php e(SITE_URL); ?>fronts/searchKeywords';
jQuery(document).ready(function(){
var CityKeyword = jQuery('#CityKeyword');
CityKeyword.autocomplete({
minLength : 1,
source : URL2
});
});
</script>
Here is the code from server side:
function searchKeywords(){
if ($this->RequestHandler->isAjax() ) {
$this->loadModel('Expertise_area');
Configure::write ( 'debug',0);
$this->autoRender=false;
$expertise=$this->Expertise_area->find('all',array(
'conditions'=>array('Expertise_area.autocomplete_text LIKE'=>'%'.$_GET['term'].'%'),
'fields' => array('DISTINCT (Expertise_area.autocomplete_text) AS autocomplete_text'),
'limit'=>5
));
$i=0;
if(!empty($expertise)){
$len = strlen($_GET['term']);
foreach($expertise as $valueproductname){
$pos = stripos($valueproductname['Expertise_area']['autocomplete_text'],$_GET['term']);
$keyvalue = "";
if($pos == 0) {
$keyvalue= "<strong>".substr($valueproductname['Expertise_area']['autocomplete_text'],$pos,$len)."</strong>"
.substr($valueproductname['Expertise_area']['autocomplete_text'],$len);
}else {
$keyvalue= substr($valueproductname['Expertise_area']['autocomplete_text'],0,$pos)."<strong>"
.substr($valueproductname['Expertise_area']['autocomplete_text'],$pos,$len)."</strong>"
.substr($valueproductname['Expertise_area']['autocomplete_text'],$pos+$len);
}
$response[$i]['value']=$valueproductname['Expertise_area']['autocomplete_text'];
$response[$i]['label']="<span class=\"username\">".$keyvalue."</span>";
$i++;
}
echo json_encode($response);
}else{
}
}
}
I have researched a bit and so far following solutions are worth looking at:
Query data on page load and store it in COOKIE to be used in future.
Implement some caching mechanism (memcache??). But my website is on CakePHP, which does its own internal caching if I am right. So will it be worth going in this direction?
Use some third party indexing mechanism like Solr, Lucene etc. Don't know much about this.
Implement a much more complex "Prefix Search" myself.
What is the right way to go about it? Please help me out here.
I've never tried this but will be doing it soon for a project I'm working on.
I always considered the possibility of receiving, during the initial page load, via AJAX (or perhaps just included in the page) the top 10 words for each letter of the alphabet, e.g.
A - apples, anoraks, alaska, angela, aha, air, arrgh, any, alpha, america
B - butter, bob etc.....
This way when user presses A-Z you can instantly provide them with 10 of the most popular keywords without any further requests, as you already have them stored in an array in the JS.
I'm not sure of size/memory usage but this could be extended further to handle the first 2 letters, e.g. AA, AB, AC.....BA, BB, BC.... ZA, ZB, ZZ... of course many combinations such as words starting with ZZ won't have any data unless it's a music site and it's ZZ Top! This means it probably won't take up so much memory or bandwidth to send this data during initial page load. Only when the user types the 3rd letter do you need to do any further data lookups/transfers.
You auto-update this data every day, week or whatever depending on site usage and the most popular searches.
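A minimal sketch of that lookup table, assuming the keyword list arrives already sorted by popularity. The function names (`buildPrefixIndex`, `suggest`) and the sample words are made up for illustration:

```javascript
// Builds a map from each 1..maxPrefix letter prefix to at most `limit` words.
// `keywords` is assumed to be ordered most popular first.
function buildPrefixIndex(keywords, maxPrefix, limit) {
  var index = Object.create(null); // no inherited keys to collide with
  keywords.forEach(function (word) {
    var lower = word.toLowerCase();
    for (var len = 1; len <= maxPrefix && len <= lower.length; len++) {
      var prefix = lower.slice(0, len);
      var bucket = index[prefix] || (index[prefix] = []);
      if (bucket.length < limit) { bucket.push(word); }
    }
  });
  return index;
}

// Returns the stored suggestions, or null when the input is longer than the
// index covers and the caller should fall back to a server request.
function suggest(index, typed, maxPrefix) {
  var key = typed.toLowerCase();
  if (key.length === 0 || key.length > maxPrefix) { return null; }
  return index[key] || [];
}

var keywords = ['apples', 'anoraks', 'alaska', 'butter', 'bob', 'ZZ Top'];
var index = buildPrefixIndex(keywords, 2, 2);

console.log(suggest(index, 'a', 2));   // the two most popular "a" words
console.log(suggest(index, 'zz', 2));
console.log(suggest(index, 'alp', 2)); // null: third letter, ask the server
```

On the server, the same table could be generated once per refresh interval and embedded in the page as JSON.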
I am adding a solution to my question which I figured out after a lot of research.
Problem was:
I was using Ajax to fetch keywords from database every time a user changes text in search box
I was doing a wild card search to match search item within entire strings and not just starting of keywords for ex. "dev" would return "social development", "development" etc
Solution:
I have a fixed array of keywords (200) which is not going to increase exponentially in near future. So, instead of doing complex indexing I am currently sending all keywords in an array.
I am sending this data in an array on page load since it is small. If it becomes large, I will fetch it in background via some ajax in different indexed arrays.
I am using jQuery's Autocomplete widget to do rest of thing for me.
For highlighting the search term, I am using a hack that overrides _renderItem. (Copied from Stack Overflow. Thanks for that!!)
Code:
function monkeyPatchAutocomplete() { //Hack to color search item
jQuery.ui.autocomplete.prototype._renderItem = function( ul, item) {
var re = new RegExp("(?![^&;]+;)(?!<[^<>]*)(" + this.term + ")(?![^<>]*>)(?![^&;]+;)", "gi");
var t = item.label.replace(re,"<span style='font-weight:bold;color:#434343;'>" +
"$&" +
"</span>");
return jQuery( "<li></li>" )
.data( "item.autocomplete", item )
.append( "<a>" + t + "</a>" )
.appendTo( ul );
};
}
function getKeywords(){
//Function that returns list of keywords. I am using an array since my data is small.
//This function can be modified to fetch data in whatever way one want.
//I intend to use indexed arrays in future if my data becomes large.
var allKeywords = <?php echo json_encode($allKeywords); ?>;
return allKeywords;
}
jQuery(document).ready(function(){
monkeyPatchAutocomplete();
var CityKeyword = jQuery('#CityKeyword');
CityKeyword.autocomplete({
minLength : 1,
source : getKeywords()
});
});
I have a simple web-based database using php/mysql that I use to keep track of products leaving my stockroom.
The MySQL database has a bunch of tables but the two I'm concerned with are 'Requests' and 'Salesperson' which you can see below (I've omitted irrelevant information).
Requests
R_ID ... R_Salesperson
1 ... James
2 ... Bob
3 ... Craig
Salesperson
S_ID S_Name
1 ... James
2 ... Bob
3 ... Craig
In my head section I have the following script that dynamically populates a list of our sales staff names as you type them:
// Autocomplete Salesperson Field
$("#form_specialist").autocomplete("../includes/get_salesperson_list.php", {
width: 260,
matchContains: true,
//mustMatch: true,
//minChars: 0,
//multiple: true,
//highlight: false,
//multipleSeparator: ",",
selectFirst: false
});
aaand get_salesperson_list.php:
<?php
require_once "get_config.php";
$q = strtolower($_GET["q"]);
if (!$q) return;
$sql = "select DISTINCT S_Name as S_Name from Salesperson where S_Name LIKE '%$q%'";
$rsd = mysql_query($sql);
while($rs = mysql_fetch_array($rsd)) {
$cname = $rs['S_Name'];
echo "$cname\n";
}
?>
I also have some basic javascript input validation requiring a value be entered in the Salesperson field (script is in the head section):
<!-- Input Validation -->
<script language="JavaScript" type="text/javascript">
<!--
function checkform ( form )
{
// ** Validate Salesperson Entry **
if (form.form_specialist.value == "") {
alert( "Please enter Salesperson Name" );
form.form_specialist.focus();
return false ;
}
// ** END Salesperson Validation **
return true ;
}
//-->
</script>
Aaaaanyway - the problem is I can't figure out how to reject any names not in the 'Salesperson' table. For example - if I were to type 'Jaaames' although it would initially suggest 'James' if I were to ignore it and submit 'Jaaames' this would be entered into the 'Requests' table. This is relatively annoying given my undiagnosed OCD and I'd rather not have to go through hundreds of requests every so often editing them.
I'd say you're taking the wrong approach here.
The Requests table should NOT be storing the salesperson's NAME, it should be saving their ID. The Primary Key of the Sales Person table.
Then, instead of using auto-complete to populate a TEXT input, I'd recommend using the same approach to populate a SELECT menu that uses the Sales Person's ID as a value.
This accomplishes the following:
your database becomes more normalized
it removes redundant information from the Requests table
removes the need to validate the Sales Person's name on the client side
By defining the S_ID as a foreign key to the Requests table, you ensure that ONLY entries in the Sales Person table can exist in the Requests table.
You could try binding an AJAX request to either the submit of the form or on changing your text field or maybe when the field loses focus.
For this example I am using jQuery:
$('input[name=salesperson]').blur(function(){
//when the text field loses focus
var n = $(this).val();
$.post('a_php_file_that_checks_db_for_names.php', {salesperson:n}, function(data){
//post the name to a php file which in turn looks that name up in the database
//and returns 1 or 0
if (data)
{
if (data==='1')
{
alert('name is in database');
}
else
{
alert('name is not in database');
}
}
else
{
alert('no answer from php file');
}
});
});
You would also need a PHP file for this to talk to, an example being:
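if (isset($_POST['salesperson']))
{
    //escape the submitted name, then look it up
    //(table and column names taken from the question, adjust to your schema)
    $name = mysql_real_escape_string($_POST['salesperson']);
    $result = mysql_query("SELECT S_Name FROM Salesperson WHERE S_Name = '$name'");
    if ($result && mysql_num_rows($result) > 0)
    {
        //looks like there were results, so your name is in the db
        echo '1';
    }
    else
    {
        //no rows, so the name is not in the db
        echo '0';
    }
}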
A bit of filling in the blanks required but you get the idea.
Hope this helps you out
EDIT:
A second, more elegant solution just came to mind - if you could get the list of salespersons and make a hidden form field for each, you could read them all into a JS object and test against it whenever the form field is changed. Unfortunately I don't have the time to write you an example but it sounds like a nicer way of doing it to me.
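A rough sketch of that idea, assuming the server has already written the list of names into the page. The names `buildNameSet` and `isKnownSalesperson` and the sample list are hypothetical, and the comparison ignores case and surrounding whitespace:

```javascript
// Turns the list of valid names into a lookup object keyed by the
// lower-cased, trimmed name.
function buildNameSet(names) {
  var set = Object.create(null);
  names.forEach(function (name) {
    set[name.trim().toLowerCase()] = true;
  });
  return set;
}

// True only when the typed value matches one of the known names.
function isKnownSalesperson(set, input) {
  return set[String(input).trim().toLowerCase()] === true;
}

// Hypothetical list; in practice it would be emitted by the server.
var salespeople = buildNameSet(['James', 'Bob', 'Craig']);

console.log(isKnownSalesperson(salespeople, 'James'));   // true
console.log(isKnownSalesperson(salespeople, 'Jaaames')); // false
```

The check could be called from the existing checkform() function before the form is submitted.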
It seems like you're just using Javascript to validate your input - this isn't good as it will never run if your user doesn't support or disables Javascript. As suggested above, a server side validation would be much easier to check against the database. However, client-side validation is also helpful to have as a sort of first line of defense against bad input, since it's generally faster. I can't think of a great way to do this, but one way could be to populate a PHP array of salespersons, convert it to a javascript array, and then check to see if the form value is in the array. It's probably faster (and substantially less code) to just use server-side validation here.
Try adding some sort of validation before you put it on your database? I mean, inside the script that puts the request into the table?
The mustMatch option isn't working for you? I see it commented out.
Also, your script is vulnerable to a SQL injection attack. I realize this is an in-house application, but you never know when crazy is going to show up and ruin your day. At the top of your get_salesperson_list.php, right after you retrieve the query from $_GET, you could add something like this:
if (!preg_match("/^\w+$/", $q)) {
// some kind of error handling here, or at least a refusal to fulfill the request:
exit;
}
UPDATE: Sorry, I meant to say "exit" instead of "return". I do see that your script wasn't in a function. I have edited the above to account for that. Thanks for pointing that out.
I want to have a form on my intranet site... basically we are a home improvement company and have a list of bad area codes that we do not do business in ... IE list of bad zips 19020 19021 etc are bad so if they are I want it to return with a popup which says bad area ... if it is not on the list I want it to say Good Area
You haven't given too much information, so what follows is a very general solution. One way to approach this is to have two maps called badZips and goodZips:
var badZips = {
"19020": true,
"19021": true
...
};
var goodZips = {
"90210": true,
...
};
Then in your form-validation function, you can do:
if(badZips[zip]) {
alert("You entered a bad zip code");
}
else if(goodZips[zip]) {
alert("You entered a good zip code");
}
else {
alert("That zip code is not recognized");
}
Actually creating the maps depends on how your webapp is set up. How do you store the zips - is it in the database? Or have you hardcoded it?
Using apache, install geoIP. Echo their zipcode into a javascript function, which compares to a black-list you created.
http://www.maxmind.com/app/ip-location
Your functional requirements are pretty simple but you didn't really mention what setup you have. Do you want this functionality to happen on a form? What are you going to code with? Do you have a database? Based on the tags you've used I'll just assume that you don't have a database.
Basically you can have a list of area codes and a flag for each to indicate if it's a bad or a good code. You can keep this list in a multi-dimensional array in PHP as static data (http://www.webcheatsheet.com/PHP/multidimensional_arrays.php).
So it might look something like:
<?php
$areaCodes = array( array('aCode'=>'19020','aFlag'=>true),
array('aCode'=>'19021','aFlag'=>true),
array('aCode'=>'19022','aFlag'=>false)
);
?>
When you need an area code to be validated, just do a search in the array and check the flag to see if it's a good code or a bad code.
Store the zip codes in an array, then check if the given zip is in the array.
<?php
$BadZip = array("19020", "19021");
if (in_array($Zip, $BadZip))
{
echo "Bad Zip code!";
}
?>
If in_array returns true, then the zip code is in the list of bad zips.
Alternatively you could use the same method with a list of good zips.