how to parse data already stored in database - PHP


I have a form users fill out that has one textarea. This gets stored in the database.
Now I want to sift through the data collected, get meaningful information out (like title) and store those in new fields in the database.
I know how to get the data out and store it in a variable. I'm not sure what to do next. I need to search the data for specific words like "title" and "instructions" and then focus in on everything after that word until a return was entered. Where should I start with this?
UPDATE
I really like the help so far, the expression match and preg_match_all / preg_match seem to get me started. One last question - I get how to find the start of the search, but how would I find the end? Usually the end will be denoted by a return (enter was pressed)

If there's any consistency to the text you can use a regular expression.
Otherwise you may want to update the form. Having to rely on consistent user input in a text field is iffy at best.

First, read the data into a variable, then search that variable to extract what you need. The best way to do this is with a regular expression and the PHP functions preg_match and preg_match_all.
EDIT:
For your second question, you can have the regular expression look for the \r and \n characters, which mark the end of a line.
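To make the "everything after the word until a return" idea concrete, here is a minimal sketch using preg_match. The helper name extract_field and the "Label: value" layout of the textarea are assumptions for illustration, not something the question specifies.

```php
<?php
// Hypothetical helper: return the text that follows $label on the same line,
// or null when the label does not occur. Assumes a "Label: value" layout.
function extract_field(string $text, string $label): ?string
{
    // \b            word boundary, so "title" does not match inside "subtitle"
    // [ \t]*:?[ \t]* optional colon with optional spaces or tabs around it
    // ([^\r\n]*)    capture everything up to the first \r or \n
    $pattern = '/\b' . preg_quote($label, '/') . '\b[ \t]*:?[ \t]*([^\r\n]*)/i';
    if (preg_match($pattern, $text, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

$textarea = "Title: Pancakes\r\nInstructions: Mix, then fry.\r\nThanks!";
echo extract_field($textarea, 'title'), "\n";
echo extract_field($textarea, 'instructions'), "\n";
```

The character class [^\r\n] is what stops the match at the end of the line, whichever line-ending style the browser submitted.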

You can achieve this by using the explode() function.
Split the fields on the separator. After converting the string to an array, you can parse the array again to get the heading and value.
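A rough sketch of that explode() approach, assuming each field sits on its own line as "Heading: value". The function name parse_fields and the colon separator are assumptions; adjust them to whatever your users actually type.

```php
<?php
// Hypothetical helper: turn "Heading: value" lines into an associative array.
function parse_fields(string $text): array
{
    $fields = [];
    // Normalise Windows (\r\n) and bare \r line endings to \n, then split into lines.
    $lines = explode("\n", str_replace(["\r\n", "\r"], "\n", $text));
    foreach ($lines as $line) {
        // A limit of 2 keeps any later colons inside the value.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $fields[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $fields;
}

print_r(parse_fields("Title: Pancakes\r\nInstructions: Mix at 10:30, then fry."));
```

Lines without a colon are skipped, so free text between the labelled lines does no harm.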

preg_match_all would be a good starting point.

Related

Sorting user input

I'm attempting what I thought would be a simple exercise, but unless I'm missing a trick, it seems anything but simple.
I'm attempting to clean up user input into a form before saving it. The particular problem I have is with hyphenated town names. For example, take Bourton-on-the-Water. Assume the user has Caps Lock on, or puts spaces next to the hyphens, or makes any other mistake that might come to mind. How do I, within reason, turn it into what it's meant to be?

You can use trim() to remove whitespace (or other characters) from the beginning and end of a string. You can also use explode() to break strings into parts by a specified character and then recreate your string as you like.
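As a sketch of how trim() and explode() might be combined for the hyphenated case: the helper name normalise_town and the list of "small words" kept in lower case are assumptions, and this only handles names that are actually typed with hyphens.

```php
<?php
// Hypothetical helper. The $small list is an assumption and would need
// extending for real place names.
function normalise_town(string $raw): string
{
    $small = ['on', 'the', 'upon', 'under', 'in', 'le', 'by', 'of'];
    // Trim the ends and collapse runs of whitespace to a single space.
    $clean = preg_replace('/\s+/', ' ', trim($raw));
    // Remove any spaces typed around hyphens.
    $clean = preg_replace('/\s*-\s*/', '-', $clean);
    // Lower-case everything, split on hyphens, then re-capitalise each part.
    $parts = explode('-', strtolower($clean));
    foreach ($parts as $i => $part) {
        // The first part is always capitalised; later small words stay lower case.
        if ($i === 0 || !in_array($part, $small, true)) {
            $parts[$i] = ucwords($part);
        }
    }
    return implode('-', $parts);
}

echo normalise_town("  BOURTON - ON -THE- WATER "), "\n";
```

It will not restore hyphens the user left out altogether, so it complements rather than replaces better input controls.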

I think the only way you can really accomplish this is by improving the way the user inputs their data.
For example use a postcode lookup system that enters an address based on what they type.
Or have an autocomplete from a predefined list of towns (similar to how Facebook shows towns).
To consider every possible permutation of Bourton On The Water / Bourton-On-The-Water etc... is pretty much impossible.

How to get part of text, string containing a phrase/word?

I have a database in MySQL that I process with PHP. In this database I have a column with long text. People search for phrases with the search tool on the website, and it displays the items matching the search.
Now, my question is how to get the part of the text that contains the phrase they searched for, so that they can see whether it's what they are looking for.
For example:
Text: "this is some long text (...) about problems with jQuery and other JavaScript frameworks (...) so check it out"
And now I would like to get this for the phrase jQuery:
about problems with jQuery and other JavaScript frameworks
?

In MySQL, you can use a combination of LOCATE (or INSTR), and SUBSTRING:
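SELECT SUBSTRING(col, GREATEST(LOCATE('jQuery', col) - 20, 1), 40) AS snippet
FROM yourtable
This selects a 40-character snippet from your text, starting 20 characters before the first occurrence of 'jQuery'. GREATEST clamps the start position to 1, because SUBSTRING returns an empty string for position 0 and counts from the end of the string for a negative position.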
However, it tends to be slow. Alternatives worth looking into:
Using a similar method in PHP
Using the full-text search features of MySQL (not sure if any of them will help), or maybe a whole separate full-text search engine like Solr.

You can use the PHP function strpos().
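A minimal sketch of that idea, using the case-insensitive stripos() instead of strpos() so a search for "jquery" still finds "jQuery". The helper name snippet and the default radius of 20 are assumptions. These are byte-based functions; for multibyte text the mb_stripos() and mb_substr() equivalents would be the safer choice.

```php
<?php
// Hypothetical helper: return up to $radius characters either side of the
// first case-insensitive match of $phrase, or null when it does not occur.
function snippet(string $text, string $phrase, int $radius = 20): ?string
{
    $pos = stripos($text, $phrase);
    if ($pos === false) {
        return null; // compare with ===, because a match at offset 0 is falsy
    }
    // Never start before the beginning of the string.
    $start = max(0, $pos - $radius);
    $length = ($pos - $start) + strlen($phrase) + $radius;
    return substr($text, $start, $length);
}

$text = "this is some long text about problems with jQuery and other JavaScript frameworks so check it out";
echo snippet($text, 'jquery'), "\n";
```

The snippet is cut at a fixed character count, so it may start or end in the middle of a word.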

You can use strpos() to find the first occurrence of the phrase you are looking for. Then subtract from that position to get a starting point a little before the first occurrence, and call mb_strimwidth(). Here is some example code.
We will search for the word 'website'.
// Your string
$string = "Stackoverflow is the best *website* there ever is and ever will be. I find so much information here. And its fun and interactive too";
// Position of the first occurrence of '*website*', or false if it is absent
$found = strpos($string, '*website*');
// Start 50 characters earlier, but never before the start of the string
$position = max(0, (int) $found - 50);
// Keep at most 300 characters for display, ending in '...more' if the text was cut
$display_text = mb_strimwidth($string, $position, 300, '...more');
echo $display_text;
Like a boss.

Is it possible to generate strings that match a regular expression string?

Is it possible to display the strings that match a regular expression?
Example:
Take the expression /^AD\d{3}/
and display AD999
What I'm doing is validating a string that is pretty simple either containing all numbers, a few characters maybe, and maybe a '-'. I am validating a postal code on form submit against a database of all countries that use a postal code.
I could perform it in JavaScript or PHP, if that makes any difference.

No. That sort of feature is not available.
You can try to implement it yourself, but I don't think that's the solution for you. Simply write the messages normally. Not everything must always be dynamic.
I like your way of thinking though.

It is possible. The developers of PEX figured it out.
Don't get your hopes up, though: I don't know of any JavaScript implementation.

There is one for JavaScript now: http://fent.github.io/randexp.js/.

I have understood your problem a little better from your additional comments.
Since your data is only postal codes, I suggest that it would be possible to work in the other direction: store a picture in the database and automatically generate a regex from that.
For instance, UK postcodes look like AA?99? 9AA | AA?9A 9AA which is easily converted to a regex (using a regex!).
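Here is one way that conversion could look. The picture syntax is an assumption read off the example above ("A" is one letter, "9" is one digit, "?" makes the previous character optional, "|" separates alternatives), and the helper name picture_to_regex is hypothetical. It uses strtr() rather than a regex to do the translation.

```php
<?php
// Hypothetical helper: build an anchored regex from a postcode picture.
// Assumes the picture contains no other regex metacharacters.
function picture_to_regex(string $picture): string
{
    $alternatives = array_map('trim', explode('|', $picture));
    foreach ($alternatives as $i => $alt) {
        // strtr() translates in a single pass, so the "A" inside the
        // replacement "[A-Z]" is not translated a second time.
        $alternatives[$i] = strtr($alt, ['A' => '[A-Z]', '9' => '[0-9]']);
    }
    return '/^(?:' . implode('|', $alternatives) . ')$/';
}

$regex = picture_to_regex('AA?99? 9AA | AA?9A 9AA');
var_dump(preg_match($regex, 'SW1A 1AA'));
var_dump(preg_match($regex, 'M1 1AE'));
var_dump(preg_match($regex, '12345'));
```

The same picture can also be shown to the user as the expected format, which is what the question was after.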

remove similar characters that appear in all rows

So I have a table with two columns "title" and "url". The rows go as such:
Title                        url
Galago - Wikipedia           http://en.wikipedia.org/wiki/Galago
Characteristics - Wikipedia  http://en.wikipedia.org/wiki/Galago
Classification - Wikipedia   http://en.wikipedia.org/wiki/Galago
Myst- Gamestop               http://www.gamestop.com/ds/games/myst/69424
Plot- Gamestop               http://www.gamestop.com/ds/games/myst/69424
My question is, how would I remove the common characters that are present in all rows from a certain url (remove "- Wikipedia" from the first three, and "- Gamestop" from the other two)? This is just a minor example. I have many other rows that follow the same pattern (they have common characters or words that recur in all of the rows from a certain url). I wanted to add that I store these values from a JavaScript array.

If all of your strings are in the format shown above for the title column, I think the best approach may be to apply a regular expression to the title before inserting into the database table. This regular expression could capture all data preceding the "-" character and discard the "duplicate" data succeeding the "-".
Info on regular expressions on strings in PHP can be found here: http://php.net/manual/en/function.preg-match.php
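A small sketch of that regular expression, using preg_replace to drop the last "-" and whatever follows it. The helper name strip_site_suffix is hypothetical, and it assumes the site name itself never contains a hyphen.

```php
<?php
// Hypothetical helper: remove the final "- Something" suffix from a title.
function strip_site_suffix(string $title): string
{
    // \s*-\s*  the hyphen with any spaces around it
    // [^-]*$   everything after it up to the end, provided no further hyphen follows
    return preg_replace('/\s*-\s*[^-]*$/', '', $title);
}

echo strip_site_suffix('Galago - Wikipedia'), "\n";
echo strip_site_suffix('Myst- Gamestop'), "\n";
```

Because only the last hyphen is used, earlier hyphens inside the title are left alone.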

I think that most automated solutions to this risk removing data that you want to keep. A word or phrase that occurs on more than one row is not necessarily redundant. A couple of potential, but still unreliable, methods come to mind. These would work only if you are looking for whole words.
Read all the titles into an array, and create a wordlist array by splitting each title into words. You can then determine the frequency of each word, and use that information to remove the unwanted words from the titles. If you have a lot of data, this method could use a lot of memory...
Parse each URL, extract the hostname, split it using a period (.) as the delimiter, and then search for and remove occurrences of those strings from the title. You might choose to create a whitelist of strings to ignore, like www, com, co, uk, net, org, and so on. This method may work if the unwanted words are found in the domain name (as in your examples).
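The second method could be sketched roughly like this. The helper name strip_host_words is hypothetical, and the ignore list is an assumption (the "en" entry is added here only to cope with en.wikipedia.org).

```php
<?php
// Hypothetical helper: remove words from $title that also appear as labels
// in the hostname of $url.
function strip_host_words(string $title, string $url): string
{
    // Hostname labels that should never be removed from titles (assumed list).
    $ignore = ['www', 'com', 'co', 'uk', 'net', 'org', 'en'];
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return $title; // no usable hostname, leave the title untouched
    }
    foreach (explode('.', $host) as $label) {
        if (in_array(strtolower($label), $ignore, true)) {
            continue;
        }
        // Whole-word, case-insensitive removal of the hostname label.
        $title = preg_replace('/\b' . preg_quote($label, '/') . '\b/i', '', $title);
    }
    // Tidy up the separator and spaces left behind at either end.
    return trim($title, ' -');
}

echo strip_host_words('Galago - Wikipedia', 'http://en.wikipedia.org/wiki/Galago'), "\n";
```

As noted above, this is still unreliable: a title that legitimately contains the site's name would lose it.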

You could normalize the url info out into another table. Take the url column and make it url_id, then create a url table that provides a url column and a title column. The title there would be Wikipedia, Gamestop, etc. Then, in the original table, store just the page title without the url title.
Maybe that won't work very well with the queries you are trying to do, but in that way you could probably search by url, url title, or title or any combination of those pretty easily.

Best way to parse a text document

I'm trying to parse a plain text document in PHP but have no idea how to do it correctly.
I want to separate each word, assign them an ID and save the result in JSON format.
Sample text:
"Hello, how are you (today)"
This is what I'm doing at the moment:
$document_array = explode(' ', $document_text);
json_encode($document_array);
The resulting JSON is
[["Hello,"],["how"],["are"],["you"],["(today)"]]
How do I ensure that spaces are kept in-place and that symbols are not included along with the words...
[["Hello"],[", "],["how"],[" "],["are"],[" "],["you"],[" ("],["today"],[")"]]
I’m sure some sort of regex is required... but have no idea what kind of pattern to apply to deal with all cases... Any suggestions guys?

This is actually a really complex problem, and one that's subject to a fair amount of academic research. It sounds so simple (just split on whitespace, with maybe a few rules for punctuation...) but you quickly run into issues. Is "didn't" one word or two? What about hyphenated words? Some might be one word, some might be two. What about multiple successive punctuation characters? Possessives versus quotes? And so on. Even determining the end of a sentence is non-trivial. (It's just a full stop, right?!)
This problem is one of tokenisation and a topic that search engines take very seriously. To be honest you should really look at finding a tokeniser in your language of choice.

Maybe this?
array_filter(preg_split('/\b/', $document_text))
array_filter removes the empty values at the first and/or last index of the resulting array, which appear if your string starts or ends at a word boundary (\b, see: http://php.net/manual/en/regexp.reference.escape.php). Note that array_filter without a callback also drops any token equal to "0".
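A variant of the same split, sketched with the PREG_SPLIT_NO_EMPTY flag so that preg_split discards the empty pieces itself. The function name tokenise is hypothetical, and the array index of each token is used as its ID, which is an assumption about what the question needs.

```php
<?php
// Hypothetical helper: split text into word and non-word tokens.
function tokenise(string $text): array
{
    // Split at every word boundary. PREG_SPLIT_NO_EMPTY drops empty pieces
    // but, unlike array_filter(), keeps a literal "0" token.
    return preg_split('/\b/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$tokens = tokenise('Hello, how are you (today)');
echo json_encode($tokens), "\n";
```

This keeps spaces and punctuation as their own tokens, but it inherits the limits described in the other answer: "didn't" is split into three tokens, for example.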
