MySQL string matching - PHP

I need to query my database to find records that fall between 2 dates (this is simple enough). However, I need to refine the search so that it only finds records where the email matches a certain pattern. Basically, I need to delete any row that falls between the 2 dates and has an email of the format
x.xxxxxXXXXX#xxxxxxxx.xxx
That is, I need to find email addresses that start with a letter followed by a full stop and have 5 digits before the # sign. Is this possible with MySQL, and if so, how? If not, how could I search for these email addresses with PHP?

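You need to use regular expressions. MySQL 5.1 supports them through the REGEXP operator (see the documentation page). This can also be done in PHP using preg_match.
Your regular expression could look like this (anchored with ^ and $ so that the whole address must match, not just part of it): ^[a-zA-Z]\.[a-zA-Z]+[0-9]{5}#.+$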
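On the PHP side, this is a single preg_match() call per address. The same test is shown below in Python's re module so it can run standalone. The function name and sample addresses are hypothetical. The '#' is kept exactly as the question writes it; swap in '@' if the real data uses that.

```python
import re

# Letter, full stop, one or more letters, exactly 5 digits, '#', then anything.
# re.fullmatch anchors both ends, like ^...$ in the SQL/PHP versions.
EMAIL_PATTERN = re.compile(r"[a-zA-Z]\.[a-zA-Z]+[0-9]{5}#.+")

def matches_format(email: str) -> bool:
    """Return True if the address has the x.xxxxxNNNNN#... shape."""
    return EMAIL_PATTERN.fullmatch(email) is not None

samples = [
    "j.smith12345#example.com",     # matches
    "john.smith12345#example.com",  # more than one letter before the full stop
    "j.smith1234#example.com",      # only 4 digits before '#'
]
flagged = [s for s in samples if matches_format(s)]
print(flagged)
```
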

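You could also use LIKE, if you find it useful. Example:
LIKE '_.__________#______.___'
However, _ matches any single character, so LIKE cannot check that the characters before the # are digits, and this pattern only matches addresses of exactly that length.
In that case you have to use a regex (see the docs).
The regex should look like this (changed from the example above):
^[a-zA-Z]\.[a-zA-Z]+[0-9]{5}#[_a-z0-9-]+\.[a-zA-Z]+$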
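Here is a runnable sketch of the DELETE that combines both conditions. It uses Python's built-in sqlite3 module as a stand-in for MySQL. Plain SQLite has no REGEXP function of its own, so one is registered with create_function; in MySQL, REGEXP is built in. The table and column names (users, email, created) and the sample rows are hypothetical. The pattern is an anchored version that assumes a domain of the form name.tld.

```python
import re
import sqlite3

# Anchored pattern: letter, full stop, letters, 5 digits, '#', name, '.', letters.
EMAIL_PATTERN = r"^[a-zA-Z]\.[a-zA-Z]+[0-9]{5}#[_a-z0-9-]+\.[a-zA-Z]+$"

def regexp(pattern, value):
    # SQLite rewrites "value REGEXP pattern" as regexp(pattern, value).
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table; dates stored as ISO-8601 text so BETWEEN compares correctly.
conn.execute("CREATE TABLE users (email TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO users (email, created) VALUES (?, ?)",
    [
        ("j.smith12345#example.com", "2013-05-10"),  # right format, inside range
        ("j.smith12345#example.com", "2014-01-01"),  # right format, outside range
        ("john.smith#example.com", "2013-05-11"),    # inside range, wrong format
    ],
)

conn.execute(
    "DELETE FROM users WHERE created BETWEEN ? AND ? AND email REGEXP ?",
    ("2013-05-01", "2013-05-31", EMAIL_PATTERN),
)
remaining = conn.execute(
    "SELECT email, created FROM users ORDER BY created"
).fetchall()
print(remaining)
```

Only the first row is deleted, because it is the only one that satisfies both conditions. If you write the pattern directly into a MySQL string literal, the backslash itself has to be doubled (\\.), or you can use [.] instead.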

Related

Solr search with spaces not working properly

I'm using Solr to search my data, and I search with a query like
title:*something here*
It works fine without spaces. But if I search for
seva sa
even though I have
seva samithi
in the data, it matches either "seva" or "sa". Can anyone suggest a proper way to search in Solr?
You're performing a wildcard search. Wildcard searches are not analyzed, so they bypass the regular analysis chain. The only thing that will match "*seva sa*" is a single token that contains the whole string in the exact case. That would probably be either a StrField or a field indexed with a KeywordTokenizer instead of the regular tokenizer.
If you want to match any content within one of the words, a better solution might be an n-gram filter (NGramFilterFactory), so that each token is also indexed in its shorter forms.
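The difference can be made concrete with a toy model in plain Python. This is not Solr code. At index time, tokens are split on whitespace and expanded into n-grams, roughly what NGramFilterFactory does with its minGramSize/maxGramSize settings. The query is only split, and every query word must match. The function names and the gram sizes 2 to 10 are arbitrary choices for the illustration.

```python
def ngrams(token, min_n=2, max_n=10):
    """Every substring of token whose length is between min_n and max_n."""
    grams = set()
    for n in range(min_n, min(max_n, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams

def index_terms(text):
    """Index-time analysis: lowercase, split on whitespace, expand to n-grams."""
    terms = set()
    for token in text.lower().split():
        terms |= ngrams(token)
    return terms

def ngram_query_matches(query, terms):
    """Query-time analysis: lowercase and split only; every word must be a term (AND)."""
    return all(word in terms for word in query.lower().split())

def wildcard_matches(inner, tokens):
    """*inner* wildcard: a single token must contain the whole string."""
    return any(inner in token for token in tokens)

doc = "seva samithi"
terms = index_terms(doc)
print(ngram_query_matches("seva sa", terms))        # both words are indexed grams
print(wildcard_matches("seva sa", doc.split()))     # tokenized field: no single token fits
print(wildcard_matches("seva sa", [doc]))           # keyword-style field: one whole token
```
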
Use (seva sa), assuming your default operator is OR.
Try this tutorial:
http://www.solrtutorial.com/solr-query-syntax.html
From what I can see in your query, you are using a wildcard. There are other ways to search in Solr; by default it matches in a fuzzy way, which is why you end up with the results you posted.

Format multiple SQL columns or do it in PHP

I have a database with many columns, all named after years. Every row holds integer values in these columns. I want them all to have thousands separators (a dollar sign would be nice, but I can easily add that with PHP).
What I have now is the following:
SELECT *, FORMAT(`2015`, 0) AS `15` FROM `FullList`
and that gives me separators like 1,000,000. The problem is that I would have to do this for every column, which seems wrong.
In my PHP I simply use this:
echo '<div class="example">$' . $row['15'] . '</div>';
giving me $1,000,000.
I'm hoping to find a good way of doing this in SQL, or maybe even PHP, so that I don't need to use FORMAT on every column.
Databases should not contain formatting, because you are storing information in a standard form that any application can read regardless of language. I suppose the column is currently a numeric type such as INT or DECIMAL(11,2), which is correct and recommended.
Though it may seem tedious to add a dollar sign in front of every value, displaying a value should be the job of the templating language (in this case PHP) and not the database.
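You might want to use PHP's money_format() instead:
http://php.net/money_format
Note that money_format() was deprecated in PHP 7.4 and removed in PHP 8.0. On current versions, number_format() or the intl extension's NumberFormatter does the same job.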
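Here is a minimal sketch of formatting in the presentation layer instead of calling FORMAT() per column. It is written in Python; in PHP, the equivalent would be a foreach over $row calling number_format() on each value. It assumes each fetched row is an associative mapping of column name to value, and that every integer column should be shown as money. The function name and the sample row are hypothetical.

```python
def format_money_columns(row, prefix="$"):
    """Copy row, rendering every int value with thousands separators and a prefix."""
    formatted = {}
    for column, value in row.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            formatted[column] = f"{prefix}{value:,}"
        else:
            formatted[column] = value
    return formatted

row = {"name": "Widget", "2014": 950000, "2015": 1000000}
print(format_money_columns(row))
```

The query stays a plain SELECT *, and new year columns are picked up automatically because the loop does not name any columns.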

Single regular expression that extracts a number from two different URL formats?

I am trying to create a single regular expression that I can use to extract the number from two different URLs in a PHP function. The formats of these URLs are:
/t/2121/title/
and
/top2121.html
I am bad at regular expressions and have already tried the following and many variants of it:
#^/t/(\d+?)/|/top(\d+?)\.html/#i
This is not doing anything, and I am still at a complete loss after reading many sites and tutorials on regular expressions. Is there a regular expression I could create that would allow me to extract the number regardless of which URL format is entered?
Regex to extract only the digits while also checking that the URL matches one of the accepted formats (captures in 2 groups):
#^\/t(?:\/(\d+)\/[a-z_-]+\/?|op(\d+)\.html)$#i
Explained demo here: http://regex101.com/r/dO5dI4
Variant #2: captures in the same group, using PCRE's branch reset group (?|...)
#^\/t(?|\/(\d+)\/[a-z_-]+\/?$|op(\d+)\.html$)#i
Explained demo here: http://regex101.com/r/cG9vC3
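A quick way to check variant #1 is to run it against both formats. The sketch below uses Python's re, which has no branch reset (?|...), so the number lands in either group 1 or group 2. The third-party regex module does support branch reset. The PHP delimiters # and the /i flag become a raw string and re.IGNORECASE. The function name is hypothetical.

```python
import re

# Same structure as variant #1: /t/<digits>/<slug>[/] or /top<digits>.html.
URL_RE = re.compile(r"^/t(?:/(\d+)/[a-z_-]+/?|op(\d+)\.html)$", re.IGNORECASE)

def extract_id(path):
    """Return the numeric id as a string, or None if the path has neither format."""
    m = URL_RE.match(path)
    if m is None:
        return None
    # Only one alternative can match, so exactly one of the groups is set.
    return m.group(1) or m.group(2)

for path in ["/t/2121/title/", "/top2121.html", "/top2121.htm"]:
    print(path, extract_id(path))
```
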
If you just want the first digits after the t, with or without a / in between, something like this might work: #t/?(\d+)#i
Example: http://codepad.viper-7.com/0z3ee0
I was able to get this regexp to match both types of url formats:
#^/(?:(?:t/)|(?:top))(\d+)(?:(?:\.html)|(?:/))#i
If anyone has a more efficient way of performing the same regexp, I would love to hear it.
If you get either of these URLs, you could use this expression. The number will be in the second capture group:
#^/t(op|/)(\d+)(\.html|/.*)#i
Are there ever going to be numbers in the URL that you don't care about? If not, you can keep this simple by just capturing the numbers and ignoring the rest:
#(\d+)#
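The trade-off of the digits-only approach is easy to show. The sketch below, in Python with a hypothetical function name, returns the first run of digits. The last example is a made-up path where an unrelated number comes first.

```python
import re

def first_number(path):
    """Return the first run of digits in path, or None if there is none."""
    m = re.search(r"\d+", path)
    return m.group() if m else None

print(first_number("/t/2121/title/"))   # the id
print(first_number("/top2121.html"))    # the id
print(first_number("/2014/t/2121/"))    # picks up 2014, not the id
```
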

search query "alien vs predator"

How do you make it so that when you search for "alien vs predator" you also get results with the string "alienS vs predator", with the "S"?
Example: http://www.torrentz.com/search?q=alien+vs+predator
How have they implemented this?
Is this advanced search engine stuff?
This is known as Word Stemming. When the text is indexed, words are "stemmed" to their "roots". So fighting becomes fight, skiing becomes ski, runs becomes run, etc. The same thing is done to the text that a user enters at search time, so when the search terms are compared to the values in the index, they match.
The Lucene project supports this. I wouldn't consider it an advanced feature, especially given the expectations that Google has set.
Checking for plurals is a form of stemming. Stemming is a common feature of search engines and other text matching. See the Wikipedia page: http://en.wikipedia.org/wiki/Stemming for a host of algorithms to perform stemming.
Typically when one sets up a search engine to search for text, one will construct a query that's something like:
SELECT * FROM TBLMOVIES WHERE NAME LIKE '%ALIEN%'
This means that the substring ALIEN can appear anywhere in the NAME field, so you'll get back strings like ALIENS.
When words are indexed, they are indexed by their root form. For example, "aliens", "alien", "alien's" and "aliens'" are all stored as "alien".
And when words are searched, the search engine also searches only for the root form "alien".
A widely used algorithm for this is the Porter stemming algorithm. You can download an implementation for your favorite language here: http://tartarus.org/~martin/PorterStemmer/
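Here is a toy version of stemming applied at both index time and query time. It uses a deliberately crude rule that only strips plural and possessive endings. This is not the Porter algorithm, which handles far more suffixes, so use a real implementation such as the one linked above in practice. The titles and function names are made up.

```python
import re
from collections import defaultdict

def crude_stem(word):
    """Very rough plural/possessive stripper (NOT the Porter algorithm)."""
    w = word.lower()
    if w.endswith("'s"):
        w = w[:-2]           # alien's -> alien
    w = w.rstrip("'")        # aliens' -> aliens
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]           # aliens -> alien (but keeps "class" and "vs")
    return w

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def build_index(titles):
    """Inverted index: stemmed term -> set of title positions."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for tok in tokens(title):
            index[crude_stem(tok)].add(doc_id)
    return index

def search(index, query):
    """Stem the query the same way and return ids containing every term."""
    stems = [crude_stem(tok) for tok in tokens(query)]
    if not stems:
        return []
    return sorted(set.intersection(*(index.get(s, set()) for s in stems)))

titles = ["Alien vs Predator", "Aliens vs. Predator: Requiem", "Predator"]
index = build_index(titles)
print(search(index, "alien vs predator"))
```

Because the title and the query go through the same stemmer, "aliens" and "alien" meet at the same index term.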
This is a basic feature of a search engine, rather than just a program that matches your query with a set of pre-defined results.
If you have the time, this is a great read, all about different algorithms, and how they are implemented.
You could try using SOUNDEX() as a fuzzy match on your strings. If you store the soundex value alongside the title and then compare it against a substring using LIKE 'XXX%', you should get a decent match. The longer the compared substring, the closer the matches will be.
See the docs: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex
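Here is the classic four-character American Soundex in Python, showing why comparing a prefix of the code helps. It is an illustration only. MySQL's SOUNDEX() follows the same idea but returns an arbitrarily long string rather than a fixed four characters, so its output can differ from this function. The function name is hypothetical.

```python
def soundex(word):
    """Classic four-character American Soundex code for word."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters_only = [ch for ch in word.lower() if ch.isalpha()]
    if not letters_only:
        return ""
    result = letters_only[0].upper()
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        # h and w do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]

print(soundex("alien"), soundex("aliens"))
```

Here "alien" gives A450 and "aliens" gives A452. They agree on the first three characters, which is the kind of prefix comparison LIKE 'XXX%' performs.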

Finding correct php regex for this complex element

I'm trying to get a regex which is able to find the following part in a string.
[TABLE|head,border|{
#TEXT|TEXT|TEXT#
TEXT|TEXT|TEXT
TEXT|TEXT|TEXT
TEXT|TEXT|TEXT
}]
It's from a simple self-made WYSIWYG editor, which makes it possible to add tables. But the "syntax" for a table should be as simple as the one above.
Now, as there can be many of these table definitions, I need to find all of them with PHP's preg_match_all so I can replace them with the well-known <table> tag in HTML.
The regex I am trying to use is the following:
/\[TABLE\|(.*)\|\{(.*)\}\]/si
The \x0A stands for a newline. As my app runs on Linux, this is enough (it works fine with a simpler regex).
I use the online regex tester on functions-online.com.
The matches it gets are not really useful. And if I have more than one TABLE definition like the one above, the matches are completely useless: because of the (.*), the match runs from "head,border" all the way to the very last "|" character in the second TABLE definition.
I would like to get a list of matches giving me the complete table command one by one.
This is because .* is greedy by default. Assuming your code works correctly for input containing a single table, placing a question mark after each of the two .* makes them lazy and should stop greediness from being an issue.
/\[TABLE\|(.*?)\|\{(.*?)\}\]/si
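Here is a sketch that compares the greedy and lazy patterns on input containing two tables, then replaces each block with HTML. It is written with Python's re, where re.DOTALL and re.IGNORECASE play the role of PHP's /s and /i flags, and re.sub stands in for preg_replace_callback. Two points are assumptions, not part of the question: a row wrapped in #...# is treated as a header row, and the option list such as "head,border" is ignored. The function names and sample text are hypothetical.

```python
import re
from html import escape

# Lazy (.*?) groups stop at the first "|{" and the first "}]" of each block.
TABLE_RE = re.compile(r"\[TABLE\|(.*?)\|\{(.*?)\}\]", re.DOTALL | re.IGNORECASE)
GREEDY_RE = re.compile(r"\[TABLE\|(.*)\|\{(.*)\}\]", re.DOTALL | re.IGNORECASE)

def table_to_html(match):
    """Turn one [TABLE|options|{rows}] block into an HTML table (options ignored)."""
    parts = ["<table>"]
    for line in match.group(2).splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: a row wrapped in #...# is a header row.
        is_header = len(line) > 1 and line.startswith("#") and line.endswith("#")
        tag = "th" if is_header else "td"
        cells = line.strip("#").split("|")
        parts.append("<tr>" + "".join(f"<{tag}>{escape(c)}</{tag}>" for c in cells) + "</tr>")
    parts.append("</table>")
    return "".join(parts)

def render_tables(text):
    return TABLE_RE.sub(table_to_html, text)

sample = (
    "[TABLE|head,border|{\n#A|B#\n1|2\n}]\n"
    "some text\n"
    "[TABLE|border|{\nx|y\n}]"
)
print(len(GREEDY_RE.findall(sample)), len(TABLE_RE.findall(sample)))
print(render_tables(sample))
```

The greedy pattern swallows both blocks as a single match, while the lazy one finds them separately.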
