I'm developing a store which gets its product info from lots of XML feeds; I'll have maybe 3,000 products in my database. I'll import them using a cronjob.
What I'd like to do is write posts, let's say a general post about picking the best TV set for your family. Then I'd run a MySQL MATCH which should take the post's title and content, match it against the thousands of products in my database, and retrieve the closest matches to display on my post.
I'm thinking of this because having a lot of XML with different nodes and categories would make it very hard for me to properly filter them using PHP.
Now, do you think that's a good idea, content- and performance-wise?
Do you think MySQL MATCH could do it? Maybe I should use some other method?
Should I store all the product info like price, description and reviews in a single table field and use it for the MySQL MATCH?
Is there a better way I can do this?
Any idea is very appreciated; I need to sort this out and make a plan before I start coding and wasting time.
What you are trying to do would be awful to handle with pure XML.
I strongly suggest you leave this task to your database, in this case MySQL; that's basically your third point.
With a MyISAM table you can set up full-text search if you need a slightly more complex, relevance-based query.
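As a rough sketch of what that could look like in MySQL (the products table and column names here are assumptions, not your actual schema):
-- Assumed products table, MyISAM so the FULLTEXT index works on older MySQL versions
CREATE TABLE products (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    description TEXT,
    FULLTEXT KEY ft_product (title, description)
) ENGINE=MyISAM;

-- Match the post's title and content against the products, closest matches first
SELECT id, title,
       MATCH (title, description) AGAINST ('picking the best TV set for your family') AS relevance
FROM products
WHERE MATCH (title, description) AGAINST ('picking the best TV set for your family')
ORDER BY relevance DESC
LIMIT 5;
In natural-language mode the WHERE clause already filters out rows with zero relevance, so this only returns products that actually share words with the post.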
I would like to make full use of MySQL for a (web) application I have developed for a chiropractor.
So far I have been storing a single row per year for what are called progress notes. The table structure looks something like this: (progress_note_id, patient_id, date (Y-0-0), progress_note). When the client wishes to append to the current year's progress notes, he simply clicks at the top of a textarea (HTML), for which I use the TinyMCE JavaScript library, to add a new entry date along with the shorthand notes at the beginning of the column (progress_note). So far it's been working OK. With 900+ clients (est.) there could potentially be 1,300+ progress notes for each year since the beginning of the application (2018).
Now the client wishes to be able to see previous progress notes (history), but be unable to modify any previous notes, while still being able to write new ones. The solution I have come up with is to use XML inside the textarea and use PHP to separate the new notes from the old ones.
My problem, however, is that if I have to convert my entire table from yearly to daily rows, it could take a lot of time and energy to split the multiple notes into single rows (est. 10x), which could end up being 13,000+ rows. I realize that no matter which method I choose, it is going to be a lot of work. Another way around this that I found would be to store XML in a single column in MySQL to hold multiple records; if I wish to append to it, all I would need is for PHP to parse the entire XML and add a new child node at the beginning. Each progress note is 255-500 characters, and in the worst-case scenario, if the patient were to visit 52 times a year (once every week), there shouldn't be too large an overhead.
Is this the correct way to solve this problem? I do wish to stick with a MySQL DB, and I realize that MySQL is not intended for XML. For some clarification, what I hope to accomplish is the same thing I currently do with progress notes, but with XML, ordered from newest to oldest.
<xml_result>
  <progress_note>
    <date>2020-08-16</date>
    <content></content>
  </progress_note>
</xml_result>
Thank you for your time and for any suggestions.
Firstly, 13,000+ rows is not a problem for MySQL. In most web-application cases, MySQL can handle 10 million+ records on a single instance with good performance.
Secondly, you can store either XML or JSON in a text field and handle the decoding in your application.
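For example, a minimal sketch of the JSON route (table and column names are made up; it assumes MySQL 5.7+ for the JSON type and JSON functions, otherwise a plain TEXT column with json_encode/json_decode in PHP works the same way):
-- One row per patient per year, with the notes kept as a JSON array
CREATE TABLE progress_notes (
    progress_note_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    patient_id INT UNSIGNED NOT NULL,
    note_year YEAR NOT NULL,
    notes JSON NOT NULL
);

-- Prepend a new note so the newest entry always comes first (patient 42 is just an example)
UPDATE progress_notes
SET notes = JSON_ARRAY_INSERT(notes, '$[0]',
        JSON_OBJECT('date', '2020-08-16', 'content', 'New shorthand note'))
WHERE patient_id = 42 AND note_year = 2020;
Because the existing entries are never touched by the UPDATE, the "previous notes are read-only" rule can then be enforced in the PHP layer.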
I'm working on a tool in PHP that scans Instagram to gather analytics on a bunch of hashtags. The aim is to monitor the evolution / growth of certain hashtags and provide a search engine for people to get up to date statistics on each hashtag.
So far I've got a fairly simple search engine in place, and I run a SQL query that uses LIKE '%travel%'. So if someone types "#travel", they'll get anything that contains the word "travel", such as "travelagent", "iliketotravel", etc.
The issue I'm facing is that I'd like to broaden the search results to include things that are related to #travel, much like websites such as http://displaypurposes.com or http://best-hashtags.com/ do, and I'm trying to figure out just HOW they do it.
I'm especially fascinated by the first one, and the Graph function: https://displaypurposes.com/graph?tag=travel
It looks like they've effectively mapped all the links between a huge number of hashtags and provide results based on that.
I have about 45,000 hashtags in my database. How would I go about linking them together to enable a "relevancy search" like the two websites I mentioned above? How does one go about building something similar? I've spent ages looking online and can't find the answer to my question.
Thanks for your help! :)
This isn't really a programming question, but I'll try to answer it in a way that addresses the programming side of it.
It's possible to have multiple tags on a single Instagram post. For example, you might have someone posting a picture of Rome with the hashtags #rome #travel. This now associates #rome with #travel and counts this as a connection between the two.
As long as we have a table structure with the following attributes:
PostNumber
Hashtag
We can find the top relations by running something like the following code:
SELECT b.Hashtag,
       COUNT(*) AS `Relation Occurrences`
FROM Posts a
JOIN Posts b
  ON a.PostNumber = b.PostNumber
WHERE a.Hashtag = '#travel'
  AND b.Hashtag != '#travel'
GROUP BY b.Hashtag
ORDER BY `Relation Occurrences` DESC
You can refine the query further, for example by adding LIMIT 100 to keep only the top 100 relations, and so on as required.
To further expand on this, the key is splitting the posts out into a table with one row per post per hashtag. If you do wildcard searches on large text instead, you will end up with long processing times and an inefficient query.
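A minimal sketch of that one-row-per-post-per-hashtag table (names are placeholders):
CREATE TABLE Posts (
    PostNumber BIGINT UNSIGNED NOT NULL,   -- identifier of the Instagram post
    Hashtag VARCHAR(150) NOT NULL,         -- one hashtag used on that post
    PRIMARY KEY (PostNumber, Hashtag),
    KEY idx_hashtag (Hashtag)              -- keeps the a.Hashtag = '#travel' filter fast
);
The composite primary key stops the same hashtag being counted twice for one post, and the secondary index is what keeps the self-join usable once you are joining across 45,000+ hashtags.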
I am looking for a way to store an article in a database in the same format as it would appear on the website. Could anyone here kindly explain how to store the entire article in a database and later display it on the web? I suppose I need some mechanism to accomplish this, but I couldn't find any solution despite searching extensively all over for the past four days. Please help.
Storing HTML in a database is easy enough, but trying to store whole web pages is not the right way to do it...
Many WYSIWYG editors send HTML through for you to store; depending on your requirements, you would typically store this in a VARCHAR or TEXT column in your (MySQL) database.
Simply insert the data into your database with a normal INSERT statement.
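For example (table and column names are just an illustration):
CREATE TABLE articles (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body TEXT NOT NULL               -- the HTML exactly as the editor produced it
);

INSERT INTO articles (title, body)
VALUES ('My first article', '<p>This is the <strong>formatted</strong> article body.</p>');
Displaying it again is just as simple: SELECT the body column and echo it into your page template.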
I have been doing a bit of searching round StackOverflow and the Interweb and I have not had much luck.
I have a URL which looks like this...
nr/online-marketing/week-in-review-mobile-google-and-facebook-grab-headlines
I am getting the article name from the URL and replacing the '-' with ' ' to give me:
week in review mobile google and facebook grab headlines
At this point this is all the information I have on the article, so I need to use it to query the database to get the rest of the article information. The problem is that this string does not match the actual headline of the article; in this instance the actual headline is:
Week in review: Mobile, Google+ and Facebook grab headlines
As you can see, it includes extra punctuation, so I need to find a way of using MySQL LIKE to match the article.
I hope someone can help. A standard SELECT * FROM table WHERE field LIKE $name does not work. I'm hoping to find a way of doing it without splitting up each individual word, but if that's what it comes down to then so be it!
Thanks.
Try the MySQL MyISAM engine's full-text search. In your case the query would be:
SELECT * FROM table
WHERE MATCH (title) AGAINST ('week in review mobile google and facebook grab headlines');
That requires you to convert the table to MyISAM. Also depending on the size of the table, test the performance of the query.
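If the table is not already set up for it, the conversion and the index might look something like this (using the placeholder table and column names from the query above):
-- Switch the storage engine and add a full-text index on the title column
ALTER TABLE `table` ENGINE = MyISAM;
ALTER TABLE `table` ADD FULLTEXT INDEX ft_title (title);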
See more info under:
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
This really seems more like a database design issue... If you're using large text fields as a form of primary key, it could lead to duplicates or synchronization problems.
One potential solution is to give each entry a unique identifier (perhaps an int or uniqueidentifier field, if MySQL supports that) and use that field to map the actual headline to the URL.
Another potential solution is to create a table that associates each headline with its URL and use that table for lookups. This incurs a little extra overhead, but ensures that special characters in the title will never affect the lookup process.
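A rough sketch of that lookup-table idea (all names here are invented for illustration, including the articles table it points at):
CREATE TABLE article_slugs (
    article_id INT UNSIGNED NOT NULL,   -- references the id of the articles table
    slug VARCHAR(255) NOT NULL,         -- e.g. 'week-in-review-mobile-google-and-facebook-grab-headlines'
    PRIMARY KEY (slug)
);

-- Look the article up straight from the URL fragment; punctuation in the headline no longer matters
SELECT a.*
FROM articles a
JOIN article_slugs s ON s.article_id = a.id
WHERE s.slug = 'week-in-review-mobile-google-and-facebook-grab-headlines';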
As for a way to do this with your current design, you may be able to do some kind of regular-expression search by tokenizing each word individually and then searching for an entry that includes all the tokens, but I'm fairly certain that MySQL doesn't provide this functionality in a basic command.
I was wondering if there was any sort of way to detect a page's genre/category.
Possibly there is a way to find keywords or something?
Unfortunately I don't have any idea so far, so I don't have any code to show you.
But if anybody has any ideas at all, let me know.
Thanks!
EDIT #Nican
Perhaps there is a way to set, let's say, 10 categories (Entertainment, Funny, Tech).
Then creating keywords for these categories (Funny = Laughter, Funny, Joke, etc.).
Then searching through a webpage (maybe using cURL) for these keywords and assigning it to the right category.
Hope that makes sense.
What you are talking about is basically what Google Adsense and similar services do, and it's based on analyzing the content of a page and matching it to topics. Generally, this kind of stuff is beyond what you would call simple programming / development and would require significant resources to be invested to get it to work "right".
A basic system might work along the following lines:
Get page content
Get X most commonly used words (omitting stuff like "and" "or" etc.)
Get words used in headings
Assign weights to different words according to a set of factors (is used in heading, is used in more than one paragraph, is used in link anchors)
Match the filtered words against a database of words related to a specific "category"
If the cumulative score > threshold, classify the site as belonging to the category (a rough SQL sketch of this step follows the list)
Rinse and repeat
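A very stripped-down SQL sketch of the matching and threshold steps (all table names, the weights, and the threshold value are made up to illustrate the idea):
-- Words extracted from the page, with the weight assigned in the earlier steps
CREATE TABLE page_words (
    page_id INT UNSIGNED NOT NULL,
    word VARCHAR(100) NOT NULL,
    weight DECIMAL(6,2) NOT NULL,
    PRIMARY KEY (page_id, word)
);

-- Dictionary of words associated with each category
CREATE TABLE category_keywords (
    category VARCHAR(50) NOT NULL,
    word VARCHAR(100) NOT NULL,
    PRIMARY KEY (category, word)
);

-- Cumulative score per category for one page; anything over the threshold counts as a match
SELECT ck.category, SUM(pw.weight) AS score
FROM page_words pw
JOIN category_keywords ck ON ck.word = pw.word
WHERE pw.page_id = 1
GROUP BY ck.category
HAVING SUM(pw.weight) > 10.0    -- arbitrary threshold
ORDER BY score DESC;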
Folksonomy may be a way of accomplishing what you're looking for:
http://en.wikipedia.org/wiki/Folksonomy
For instance, in Drupal they have a Folksonomy module:
http://drupal.org/node/19697 (Note this module appears to be dead, see http://drupal.org/taxonomy/term/71)
Couple that with a tag cloud generator, and you may get somewhere:
http://drupal.org/project/searchcloud
Plus, a little more complexity may be able to derive mapped relationships to other terms, especially if you control the structure of the tagging options.
http://intranetblog.blogware.com/blog/_archives/2008/5/22/3707044.html
EDIT
In general, the type of system you're trying to build relies on unique word values on a page. So you would need to...
Get unique word values from your content (index values or create a bot to crawl your site)
Remove all words and symbols you can't use (at, the, or, and, etc...)
Count the number of times the unique words appear on the page
Add them to some type of datastore so you can call them based on the relationships you're mapping (a minimal sketch follows this list)
If you have a root label system in place, associate those values with the word counts on the page (such as a query or derived table)
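A minimal sketch of the counting and datastore steps (the table is invented; it assumes your crawler calls this once per word it finds on a page):
CREATE TABLE page_word_counts (
    page_id INT UNSIGNED NOT NULL,
    word VARCHAR(100) NOT NULL,
    occurrences INT UNSIGNED NOT NULL DEFAULT 1,
    PRIMARY KEY (page_id, word)
);

-- Bumps the counter if the word has already been seen on this page
INSERT INTO page_word_counts (page_id, word)
VALUES (1, 'travel')
ON DUPLICATE KEY UPDATE occurrences = occurrences + 1;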
This is very general, and there are a number of ways this can be implemented/interpreted. Folksonomies are meant to "crowdsource" much of the effort for you, in a "natural way", as long as you have a user base that will contribute.