Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am in secondary school learning web development, and our latest school project is to come up with our own company based around a website.
My website is going to display aspiring animators' videos. There will be a place where other users can leave feedback on these videos, plus other resources for the animators to use.
I have already created the base of the website. I have placeholder YouTube videos on the home page (where the users' videos would go), a contact page and a resource page.
My teacher told me that if I want the website to actually function, my best option is to use a CMS, e.g. Drupal. By "function" I mean a login system so users can post their own videos for others to see (most likely by submitting a YouTube link, which would then be displayed on the home page) and a comment system so users can leave feedback on each other's videos. I am unsure whether this is my best option because, as far as my research goes, a CMS is built around its own templates and doesn't work well for those who have already coded a website. (I am not certain of this.)
I am new to making websites but quite capable with a bit of learning. All I need to know is which method I should use to add a login system so users can post and comment, and a way for an admin to manage the content easily without changing any code. Since I have already coded my website, I am unsure if this is possible, and I do not have the time to start again.
Thanks for your help.
Actually, I believe it would be a lot easier to take your coded website and convert it to a template for one of the most popular CMS platforms (Joomla, for example). That would let you use thousands of free plugins (including ones for video uploading and galleries) and would make your site a LOT safer than a login system you write yourself. It's also a lot faster than coding your own CMS: if the design is not very complicated and you don't need many functions, I believe it would take you a few days at most to install Joomla, find, add and configure the few necessary plugins, and follow one of the hundreds of tutorials on converting your HTML to a Joomla template.
If you insist on coding your own CMS, start with this tutorial:
https://css-tricks.com/php-for-beginners-building-your-first-simple-cms/
It's old, from 2009, but it covers most of the basics of working with simple databases, user login sessions, etc.
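If you do write the login part yourself, the one thing not to get wrong is password storage. Below is a minimal sketch of salted password hashing, written in Python rather than the tutorial's PHP (in PHP, the built-in password_hash and password_verify functions do the same job). The function names and the iteration count are assumptions for illustration, not something taken from the tutorial.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for your own server.
ITERATIONS = 200_000


def hash_password(password):
    """Return (salt, digest) to store in the users table; never store the raw password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest from the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected_digest)
```

At login you would look up the stored salt and digest for the username and call verify_password with what the user typed.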
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am trying to secure my WordPress site from hackers - more specifically, individual non-content pages that appear to be getting more hits. I am using Siteground and installed WordPress a few months ago. Checking the website statistics I was taken aback by what I saw. I have briefly summarised the page hits below.
https:// ... /wp-admin/admin-ajax.php --> this page has been viewed over 16k times each month. Considering my site has been live for only 2 months, contains no SEO, and no-one knows about its existence this is odd!
https:// ... /index.php/wp-json/wp/v2/users/ --> this page is giving away my usernames.
https:// ... /index.php/wp-json/wp/v2/pages --> appears to display code from one of my main pages.
And a whole load of pages that appear very odd to be accessing:
https:// ... /index.php/wp-json/wp/v2/taxonomies
https:// ... /index.php/wp-json/wp/v2/categories
https:// ... /index.php/wp-json/wp/v2/taxonomies/post_tag
https:// ... /index.php/wp-json/wp/v2/taxonomies/category
https:// ... /index.php/wp-json/wp/v2/tags
https:// ... /wp-admin/load-styles.php --> shows blank screen
There's a whole load more: some redirect to the WordPress login page and others show a blank screen. There are also some URLs that allow any user to download a *.woff file (whatever that is?!).
The point is, I thought WordPress would be secure enough not to expose these pages, or at the very least not to show these details.
Is there anything I can do? As I pointed out, I'm using Siteground which doesn't use cPanel.
I thought the most difficult part of a blog site is the content creation and overall web design. I'm not sure now.
Any help and/or advice would be greatly appreciated.
Thank you.
As for accessing login pages, that's to be expected with a WordPress site - bots love them.
You should have a strong password and use a nonstandard username for your admin-rights accounts. Bots will always hit the default login page with default credentials to try their luck. You could go a step further and move the login page, too, which will massively cut accesses to the real login page. There's a plugin for it if you aren't comfortable coding that yourself: WPS Hide Login.
As for the wp-json URLs, you can require a login for them or disable them using the answers provided here, for example with a plugin such as Disable REST API.
Concerning the .woff files, those are just font files. Either a bot is scraping them or a user's browser is fetching them to display the web page as it was designed, so they are not really a concern.
WordPress also has a decent article here on additional things you can do to secure your website.
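To check what the users endpoint gives away before and after locking it down, you can look at the JSON it returns. The following is a rough Python sketch that pulls the user slugs (usually derived from the login names) out of a response body. The function name is hypothetical, and the sample body in the usage note is made up for illustration.

```python
import json


def exposed_usernames(users_json):
    """Return the user slugs found in a /wp-json/wp/v2/users response body."""
    data = json.loads(users_json)
    if not isinstance(data, list):
        # A locked-down endpoint typically answers with an error object, not a list.
        return []
    return [user["slug"] for user in data if isinstance(user, dict) and "slug" in user]
```

Feeding it a body such as '[{"id": 1, "name": "Site Owner", "slug": "siteowner"}]' yields ["siteowner"]; an empty list after installing the plugin suggests the usernames are no longer exposed.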
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am about to create an online shopping site for one of my clients. The site has to be SEO-friendly, so I need to understand a few things before I start building a custom CMS-based website.
As I said, the website will be built on a custom CMS so that my client can add new content through it, but there are a few things I don't understand.
For example: I have an index.php page with many links to different products, and all of these links are generated from the database using PHP. A link on the site looks like
http://www.def.com/shoes/Men-Shoes
My Questions:
1) When GoogleBot crawls my site, will it also open my dynamically created links and index them? Will GoogleBot also index the content behind my dynamic links?
2) Do I have to create separate pages for all of the products on the site and store them on my server? Or is a single page that serves every product dynamically according to the user's query enough?
I read this
"It functions much like your web browser, by sending a request to a web server for a web page, downloading the entire page, then handing it off to Google’s indexer."
Is that right?
My URL above originally looked like this, and I used an .htaccess file to make it pretty:
http://www.def.com/shoes.php?type=Men-Shoes
So is that right, and will Google crawl it for indexing?
SEO is a complex science in itself and Google is always changing the goal posts and modifying their algorithm.
While you don't need to create separate pages for each product, creating friendly URLs using the .htaccess file can make them look better and easier to navigate. Creating a sitemap and submitting it to Google via their webmaster tools will also help them know which pages to index.
GoogleBot will follow the links in your site, including dynamically created ones, but it is important not to try to game the system using black-hat methods if long-term success is your aim.
Also, use social media (Twitter, Facebook, Google+) to help promote your brand, and make sure you follow Google's guidelines on SEO and on-page optimisation.
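As a rough illustration of the sitemap idea, here is a minimal Python sketch that turns a list of friendly URL paths into a sitemap in the sitemaps.org XML format. On a PHP site the same loop would run over the product rows in the database. The function name and the second path in the usage note are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths):
    """Return sitemap XML with one <url><loc> entry per friendly path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + path.lstrip("/")
    return ET.tostring(urlset, encoding="unicode")
```

For example, build_sitemap("http://www.def.com", ["shoes/Men-Shoes", "shoes/Women-Shoes"]) produces a document you could save as sitemap.xml and submit.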
There is a huge amount of information on the internet on this subject, but be careful what advice you follow.
Google and other search engines index dynamic links too, so one way to avoid duplicate content is the "Crawl" -> "URL Parameters" tool in Google Webmasters. You can read more about how it works here: https://support.google.com/webmasters/answer/6080548?rd=1. Set the "Crawl" field to "No URLs". This hides dynamic links from search, but you need a list of all the dynamic links on your website/CMS so that you don't hide important content accidentally. The "URL Parameters" feature is also available in Bing Webmaster Tools: http://searchenginewatch.com/sew/how-to/2195777/bing-webmaster-tools-an-overview#.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am new to search engines, and I find Google News very interesting.
I would like to write a simple crawler which:
Parses only the article links of three different news sites.
Saves the links in a database (MySQL) with the timestamp at which the link was published on the website (not the time at which the crawler detected it).
As you know, news websites generate links on a daily basis, and I would like to parse all their links: not just those published today, but also all the links that were generated earlier and are kept in the news website's database.
I don't know which database is used by the news websites that I want to crawl, and I also don't have permission to access it.
So how is Google News able to parse all the article links of all news sites, including links that were generated a long time ago? Does Google News have access to all those websites' databases?
How does a crawler know that a NEW link has been added to a website? If, for example, a news site posts a new article and I want my crawler to parse the link immediately, how can the crawler know about it (Google News is able to do this, so how)? Does the crawler learn about the new article link immediately, or does Google just crawl the website at a fixed interval (every hour, etc.)?
How does the Google News crawler know when a new website has been launched?
Does the crawler look for new websites automatically, or do Google engineers keep a fixed list of news websites to crawl?
The same question can be asked about the Google search crawler, i.e. the crawler should be aware that a new domain has been launched so it can crawl it and make sure Google's database reflects the current state of the World Wide Web.
So is there an open worldwide database that lists every domain ever launched, which Google simply crawls?
What will be the best tool to implement my news website crawler?
Apache Lucene, Nutch, Solr, ElasticSearch?
Maybe http://phpcrawl.cuab.de/?
I am REALLY curious about the answers to the above four questions.
Please assist.
Thanks in advance.
You have some key questions here, which I'll answer, but first you should understand what a crawler is.
What is a crawler?
The crawler's job is to scan the internet by reading a page, getting all the links it contains and then reading those pages as well. The main purpose is to find new content automatically. A good crawler starts with a few big, well-known websites that update often; this way it can keep those sites indexed and also discover new content and new sites quickly (because big websites often contain links to other sites).
Regarding your questions:
Does Google News have access to all those websites' databases?
No. If you had access to the database, there would be no need for a crawler.
How does a crawler know that a NEW link has been added to the website?
Google crawls every site once in a while and searches for new links inside it. Usually a new page or article is linked from the main page, which is already stored in Google's database.
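In a small crawler of your own, "searching for new links" comes down to comparing the links on the page now with the links you have already stored. A minimal Python sketch, assuming the stored links and the freshly scraped links are both plain lists of URLs (the function name is hypothetical):

```python
def find_new_links(known_links, current_links):
    """Return links present on the page now but not seen in earlier crawls, in page order."""
    seen = set(known_links)
    fresh = []
    for link in current_links:
        if link not in seen:
            seen.add(link)  # also drops duplicates within the same page
            fresh.append(link)
    return fresh
```

Running this against the main page at whatever interval you choose gives you the newly published articles since the last run.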
How does the Google News crawler know when a new website has been launched?
The simple answer is: the crawler finds a link to the new website, checks whether the website is already in the system and, if not, adds it.
How do they get the links to old articles?
Easy: they save those links in a huge database. Google started crawling the internet years ago. Old links probably wouldn't show up if Google started crawling the internet all over again today.
How do I get the time at which the site posted the article?
That depends on the site you're crawling. If each article has a date, you need to parse the page and extract it. This article has a date at the top, and it's easy to find in the HTML DOM by searching for the date class: <span class="date">6 June 2014</span>.
If the date does not appear, you have no way of knowing when it was published.
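Here is a minimal Python sketch of that date extraction, using only the standard library. It assumes the markup shown above (a span with class "date" holding text like "6 June 2014"); other sites will need a different selector and date format. The class and function names are hypothetical.

```python
from datetime import datetime
from html.parser import HTMLParser


class DateSpanParser(HTMLParser):
    """Remember the text of the first <span class="date"> element."""

    def __init__(self):
        super().__init__()
        self._in_date = False
        self.date_text = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "date" in classes and self.date_text is None:
            self._in_date = True

    def handle_data(self, data):
        if self._in_date:
            self.date_text = data.strip()
            self._in_date = False

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_date = False


def extract_published(html):
    """Return the article date as a datetime, or None when the page shows no date."""
    parser = DateSpanParser()
    parser.feed(html)
    parser.close()
    if not parser.date_text:
        return None
    return datetime.strptime(parser.date_text, "%d %B %Y")
```

The returned datetime is what you would store in the MySQL timestamp column alongside the link.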
As a developer you can make Google's life easier by asking it to crawl your new website via Google Webmaster Tools.
While crawling the web, Google also counts how many links lead to a page, and this affects the page's ranking. Many links to your site indicate that you have valuable content and should appear higher in the search results.
Writing a simple crawler is easy. You get a page's content with PHP cURL or file_get_contents, parse it, select and save the data you want, extract all the links in the page and then recursively crawl the links you found.
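The steps above can be sketched as follows. This is a Python sketch rather than PHP, and it uses a queue instead of literal recursion so that deep sites cannot exhaust the call stack. The page limit, the same-host restriction and all names are assumptions added for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page and return its text (the equivalent of file_get_contents)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl of one host; returns {url: html} for the pages fetched."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]  # resolve relative links, drop fragments
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

The fetch_page parameter exists so the crawl logic can be tried against canned pages before pointing it at a real site. A real crawler should also respect robots.txt and pause between requests.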
I have no code to show for this; it is just a basic question.
I know quite a lot of PHP and have begun writing web crawlers for certain projects, and I have wondered whether there is a way to crawl only the data in a certain area of a page.
I am thinking about creating a sports-score web app, and I know some websites that keep the scores in a box on the right-hand side. Is there a way I could crawl just the data from that specific area and not the whole web page?
It was just a question
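One common approach is still to download the whole page but keep only the text inside the element that holds the scores. Below is a minimal Python sketch of that idea using the standard library (PHP can do the same with DOMDocument and DOMXPath). It assumes the box has an id attribute, here the hypothetical id "scores", and that its tags are properly closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RegionTextParser(HTMLParser):
    """Collect text found inside the element with the given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0  # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_region(html, element_id):
    """Return the text fragments inside the element with id=element_id."""
    parser = RegionTextParser(element_id)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Everything outside the chosen element is ignored, so changes to the rest of the page layout do not affect the scraped data.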
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
An application I am involved with is in dire need of a restructuring of the reports section... I am open to suggestions. All the development is currently in PHP (nginx/php/linux/mysql/redis environment), although other suggestions which fit in to the environment are welcome.
The current system already logs continuously into MySQL tables. All tables have basically the same structure, with different things logged under different logtypes.
There are a couple of different metrics/actions we'd like to report on, and be able to have the users drill down, by date or other filters.
Example metrics:
User searches a topic. I log his user id, search keyword, each result ID.
User accesses an item on the system (either from a search result above, or from my main page; I have separate logs for both). I currently log the "ID" of the page, the unique ID of the user (all users have an ID), the time and the Category of the page.
User submits a request for an item. I log the ID of the request (new id), the unique ID of the user, the Category of the report.
List all users who clicked on item X.
etc
Can someone give me some opinions on whether I would be able to leverage the existing functionality in Piwik (www.piwik.org) or Open Web Analytics (http://demo.openwebanalytics.com) to build an easy-to-use dashboard and reporting tool? The idea is that we already have most, if not all, of the queries to insert and select the data for the metrics above. What we need is a uniform way of displaying the data, where the user can view different reports in a consistent format, etc.
Filtering by category, where we have a category ID, would also be necessary. Category is a hierarchical tree, and picking a parent node means we basically list all child nodes and build an IN (x,x,x) with all of the child IDs (we are investigating changing to linear tree traversal, but that's for another discussion...)
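To make the parent-node filter concrete, here is a minimal Python sketch of the approach described: walk the tree from the picked node, collect the IDs, and build a parameterised IN clause. The adjacency-map shape, the column name category_id, the choice to include the picked node itself, and the %s placeholders (the style used by Python MySQL drivers; PDO would use ?) are all assumptions for illustration.

```python
def descendant_ids(children_by_parent, root_id):
    """Return root_id plus every category beneath it (iterative depth-first walk)."""
    result = []
    stack = [root_id]
    while stack:
        current = stack.pop()
        result.append(current)
        stack.extend(children_by_parent.get(current, []))
    return result


def category_filter(children_by_parent, root_id, column="category_id"):
    """Return (sql_fragment, params) for filtering on a node and all of its children."""
    ids = sorted(descendant_ids(children_by_parent, root_id))
    placeholders = ", ".join(["%s"] * len(ids))
    return f"{column} IN ({placeholders})", ids
```

Passing the IDs as parameters rather than pasting them into the SQL string keeps the report queries safe from injection, whichever reporting front end ends up running them.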
Once again, sorry if this has become confusing. To those who have experience with Piwik/OWA/other web analytics frameworks: have you used them to deliver custom metrics from custom applications, not directly related to web page viewing?
If so, could you share examples?
Also, any reasons to favor piwik or owa? OWA seems to have some nice things which we could maybe add in the future like heatmaps and recordings, but the main focus right now is the custom metrics so the web metrics stuff would be disabled at first...
Thanks for the help...
Using Piwik with a mix of Custom Variables & Segmentation should allow for your requirements.
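As a rough sketch of what feeding a custom event into Piwik could look like, the Python snippet below builds a request URL for Piwik's HTTP Tracking API with page-scoped custom variables. The parameter names (idsite, rec, url, uid, cvar) are as I recall them from the Tracking API documentation and should be checked against your Piwik version. The host, site ID and variable names are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_tracking_url(piwik_base, site_id, page_url, user_id, custom_vars):
    """Build a piwik.php tracking request carrying page-scoped custom variables.

    custom_vars is a list of (name, value) pairs; Piwik expects them as a
    JSON object keyed by slot index, e.g. {"1": ["category", "reports"]}.
    """
    cvar = {str(i): [name, value] for i, (name, value) in enumerate(custom_vars, start=1)}
    params = {
        "idsite": site_id,
        "rec": 1,
        "url": page_url,
        "uid": user_id,
        "cvar": json.dumps(cvar),
    }
    return piwik_base.rstrip("/") + "/piwik.php?" + urlencode(params)
```

Requesting the resulting URL from your application (for instance, wherever you currently write the log row) records the action; Segmentation can then filter reports on the custom variable names and values.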