
What Is Search Engine Optimization

Search Engine Optimization is the process of choosing the most appropriate targeted keyword phrases for your site and ensuring that the site ranks highly in search engines for those phrases, so that when someone searches for them your site appears near the top of the results. It involves fine-tuning the content of your site, along with its HTML and meta tags, as well as an appropriate link-building process. The most popular search engines are Google, Yahoo, MSN Search, AOL and Ask Jeeves. Search engines keep their methods and ranking algorithms secret, both to get credit for finding the most valuable search results and to deter spam pages from clogging those results. A search engine may use hundreds of factors when ranking listings, and the factors themselves and the weight each carries may change continually. Algorithms can differ so widely that a web page ranking #1 in one search engine could rank #200 in another. New sites need not be "submitted" to search engines to be listed: a simple link from a well-established site will prompt the search engines to visit the new site and begin to spider its contents. It can take from a few days to several weeks after such a link appears for all the main search engine spiders to begin visiting and indexing the new site.

If you are unable to research and choose keywords and work on your own search engine ranking, you may want to hire someone to work with you on these issues.

Search engine marketing and promotion companies will look at the plan for your site and make recommendations to increase your search engine ranking and website traffic. If you wish, they will also provide ongoing consultation and reporting to monitor your website and recommend edits and improvements to keep your traffic flow and search engine ranking high. Normally, search engine optimization experts work with your web designer from the start to build an integrated plan, so that all aspects of design are considered at the same time.

The Keyword Density of Non-Sense – by Dr. E. Garcia

On March 24 a FINDALL search in Google for keywords density optimization returned 240,000 documents. Many of these documents belong to search engine marketing and optimization (SEM, SEO) specialists. Some of them promote keyword density (KD) analysis tools, while others talk about things like "right density weighting", "excellent keyword density", KD as a "concentration" or "strength" ratio and the like. Some even take KD for the weight of term i in document j, while others propose localized KD ranges for titles, descriptions, paragraphs, tables, links, URLs, etc. One can even find specialists going after the latest KD "trick" and claiming that optimizing KD values up to a certain range for a given search engine affects the way that search engine scores relevancy and ranks documents.

Given the fact that there are so many KD theories flying around, my good friend Mike Grehan approached me after the Jupitermedia’s 2005 Search Engine Strategies Conference held in New York and invited me to do something about it. I felt the “something” should be a balanced article mixed with a bit of IR, semantics and math elements but with no conclusion so readers could draw their own. So, here we go.

Background.

In the search engine marketing literature, keyword density is defined as

Equation 1:  KD_{i,j} = tf_{i,j} / l_j

where tf_{i,j} is the number of times term i appears in document j and l_j is the total number of terms in the document. Equation 1 is a legacy idea found intermingled in the old literature on readability theory, where word frequency ratios are calculated for passages and text windows – phrases, sentences, paragraphs or entire documents – and combined with other readability tests.

The notion of keyword density values predates all commercial search engines and the Internet and can hardly be considered an IR concept. What is worse, KD plays no role in how commercial search engines process text, index documents or assign weights to terms. Why, then, do many optimizers still believe in KD values? The answer is simple: misinformation.

If two documents, D1 and D2, consist of 1000 terms (l = 1000) and repeat a term 20 times (tf = 20), then for both documents KD = 20/1000 = 0.020 (or 2%) for that term. Identical values are obtained if tf = 10 and l = 500.
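As a quick illustration, the following minimal Python sketch (added here, not part of Garcia's article) applies Equation 1 to the two hypothetical documents above; both produce the same value, which is exactly the problem discussed next.

```python
def keyword_density(term_count, doc_length):
    """Keyword density as defined in Equation 1: tf / l."""
    return term_count / doc_length

# The two hypothetical documents from the example above
d1 = {"tf": 20, "length": 1000}   # term repeated 20 times in 1000 terms
d2 = {"tf": 10, "length": 500}    # term repeated 10 times in 500 terms

for name, doc in [("D1", d1), ("D2", d2)]:
    kd = keyword_density(doc["tf"], doc["length"])
    print(f"{name}: KD = {kd:.3f} ({kd:.1%})")   # both print 0.020 (2.0%)
```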

Evidently, this overall ratio tells us nothing about:

1. the relative distance between keywords in documents (proximity)

2. where in a document the terms occur (distribution)

3. the co-citation frequency between terms (co-occurrence)

4. the main theme, topic, and sub-topics (on-topic issues) of the documents

Thus, KD is divorced from content quality, semantics and relevancy. Under these circumstances one can hardly talk about optimizing term weights for ranking purposes. Add to this copy style issues and you get a good idea of why this article’s title is The Keyword Density of Non-Sense.

The following five processes, as implemented by search engines, illustrate the point:

1. Linearization

2. Tokenization

3. Filtration

4. Stemming

5. Weighting

Linearization.

Linearization is the process of stripping markup tags from a web document so that its content is reinterpreted as a string of characters to be scored. This process is carried out tag by tag, as tags are declared and found in the source code. As illustrated in Figure 1, linearization affects the way search engines "see", "read" and "judge" Web content, so to speak. Here the content of a website is rendered using two nested HTML tables, each consisting of one large cell at the top and the common 3-column cell format. We assume that no other text or HTML tags are present in the source code. The numbers at the top-right corner of the cells indicate the order in which a search engine finds and interprets the content of the cells.

The box at the bottom of Figure 1 illustrates how a search engine probably “sees”, “reads” and “interprets” the content of this document after linearization. Note the lack of coherence and theming. Two term sequences illustrate the point: “Find Information About Food on sale!” and “Clients Visit our Partners”. This state of the content is probably hidden from the untrained eyes of average users. Clearly, linearization has a detrimental effect on keyword positioning, proximity, distribution and on the effective content to be “judged” and scored. The effect worsens as more nested tables and html tags are used, to the point that after linearization content perceived as meritorious by a human can be interpreted as plain garbage by a search engine. Thus, computing localized KD values is a futile exercise.
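To see roughly what linearization does, here is a minimal Python sketch (added for illustration, not from the original article) that strips tags with the standard-library HTMLParser and prints cell text in source order; the sample nested-table markup is an assumption standing in for Figure 1.

```python
from html.parser import HTMLParser

class Linearizer(HTMLParser):
    """Collects visible text in source order, ignoring markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

# Hypothetical nested-table layout, loosely mimicking Figure 1
html_doc = """
<table><tr><td>Find Information About</td></tr>
  <tr><td>Products</td>
      <td><table><tr><td>Food on sale!</td></tr>
                 <tr><td>Clients</td><td>Visit our Partners</td></tr></table></td>
      <td>Contact</td></tr></table>
"""

parser = Linearizer()
parser.feed(html_doc)
print(" ".join(parser.chunks))
# e.g. "Find Information About Products Food on sale! Clients Visit our Partners Contact"
```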

Burning the Trees and Keyword Weight Fights.

In the best-case scenario, linearization shows whether words, phrases and passages end up competing for relevancy in a distorted lexicographical tree. I call this phenomenon "burning the trees". It is one of the most overlooked web design and optimization problems.

Constructing a lexicographical tree out of linearized content reveals the actual state of, and relationships between, nouns, adjectives, verbs and phrases as they are actually embedded in documents. It shows the effective data structure that is being used. In many cases, linearization identifies local document concepts (noun groups) and hidden grammatical patterns. Mandelbrot has used the patterned nature of languages observed in lexicographical trees to propose a measure he calls the "temperature of discourse". He writes: "The 'hotter' the discourse, the higher the probability of use of rare words." (1). However, from the semantics standpoint, word rarity is a context-dependent state. Thus, in my view "burning the trees" is a natural consequence of misplacing terms.

In Fractals and Sentence Production, Chapter 9 of From Complexity to Creativity (2, 3), Ben Goertzel uses an L-System model to explain that the beginning of early childhood grammar is the two-word sentence, in which the iterative pattern involving nouns (N) and verbs (V) is driven by a rule in which V is replaced by V N (V >> V N). This can be illustrated with the following two iteration stages:

0 N V (as in Stevie byebye)

1 N V N (as in Stevie byebye car)

Goertzel explains, "The reason N V is a more natural combination is because it occurs at an earlier step in the derivation process." (3). It is now comprehensible why many Web documents do not deliver any appealing message to search engines. After linearization, it becomes apparent that they may be "speaking" like babies. [By the way, L-System algorithms, named after A. Lindenmayer, have been used for many years in the study of tree-like patterns (4).]
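The rewrite rule is easy to simulate. The following short Python sketch (an illustration added here, not Goertzel's code) iterates the rule V >> V N starting from the axiom N V:

```python
def iterate(axiom, rules, steps):
    """Apply L-system rewrite rules in parallel to every symbol."""
    seq = axiom.split()
    for _ in range(steps):
        seq = [sym for s in seq for sym in rules.get(s, s).split()]
    return " ".join(seq)

rules = {"V": "V N"}              # the rule V >> V N
print(iterate("N V", rules, 0))   # N V      ("Stevie byebye")
print(iterate("N V", rules, 1))   # N V N    ("Stevie byebye car")
print(iterate("N V", rules, 2))   # N V N N
```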

"Burning the trees" explains why repeating terms in a document, moving on-page factors around or invoking link strategies does not necessarily improve relevancy. In many instances one can get the opposite result. I recommend that SEOs start incorporating lexicographical/word-pattern techniques, linearization strategies and local context analysis (LCA) into their optimization mix. (5)

In Figure 1, "burning the trees" was the result of improper positioning of text. However, in many cases the effect is a byproduct of sloppy Web design, poor usability or improper use of the HTML DOM structure (another kind of tree). This underscores an important W3C recommendation: HTML tables should be used for presenting tabular data, not for laying out Web documents. In most cases, professional web designers can do better by replacing tables with cascading style sheets (CSS).

"Burning the trees" often leads to another phenomenon I call "keyword weight fights". It is a recurrent problem encountered during topic identification (topic spotting), text segmentation (based on topic changes) and on-topic analysis (6). Considering that co-occurrence patterns of words and word classes provide important information about how a language is used, misplaced keywords and text without clear topic transitions complicate the work of text summarization editors (human or machine-based) that need to generate representative headings and outlines from documents.

Thus, the "fight" unnecessarily complicates topic disambiguation and the work of human abstractors who, during document classification, need to answer questions like "What is this document or passage about?", "What is the theme or category of this document, section or paragraph?", "How does this block of links relate to the content?", etc.

While linearization renders localized KD values useless, document indexing makes a myth out of this metric. Let's see why.

Tokenization, Filtration and Stemming.

Document indexing is the process of transforming document text into a representation of the text, and it consists of three steps: tokenization, filtration and stemming.

During tokenization, terms are lowercased and punctuation is removed. Rules must be in place so that digits, hyphens and other symbols can be parsed properly. Tokenization is followed by filtration, during which commonly used terms and terms that do not add any semantic meaning (stopwords) are removed. In most IR systems the surviving terms are further reduced to common stems or roots; this is known as stemming. Thus, the initial content of length l is reduced to a list of terms (stems and words) of length l' (i.e., l' < l). These processes are described in Figure 2. Evidently, if linearization shows that you have already "burned the trees", a search engine will be indexing just that.
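A minimal, self-contained Python sketch of the three steps might look like the following; the stopword list and the crude suffix-stripping "stemmer" are assumptions made for illustration and are far simpler than what production IR systems use.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "on", "our", "about"}  # toy list
SUFFIXES = ("ing", "ed", "es", "s")                                # toy stemmer

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def filtrate(tokens):
    """Drop stopwords."""
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Strip the first matching suffix (crude stand-in for a real stemmer)."""
    for suf in SUFFIXES:
        if token.endswith(suf) and len(token) > len(suf) + 2:
            return token[: -len(suf)]
    return token

text = "Find information about foods on sale! Clients visiting our partners."
tokens = tokenize(text)                       # length l
terms = [stem(t) for t in filtrate(tokens)]   # length l' < l
print(tokens)
print(terms)
```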

Similar lists can be extracted from individual documents and merged to form an index of terms. This index can be used for different purposes; for instance, to compute term weights and to represent documents and queries as term vectors in a term space.

Weighting.

The weight of a term in a document is the product of three components: a local weight, a global weight, and a normalization factor. The term weight is given by

Equation 2:  w_{i,j} = L_{i,j} * G_i * N_j

where L_{i,j} is the local weight for term i in document j, G_i is the global weight for term i and N_j is the normalization factor for document j. Local weights are functions of how many times each term occurs in a document, global weights are functions of how many documents in the collection contain each term, and the normalization factor corrects for discrepancies in the lengths of the documents.

In the classic Term Vector Space model

Equation 3:  L_{i,j} = tf_{i,j}
Equation 4:  G_i = log(D/d_i)
Equation 5:  N_j = 1

which reduces Equation 2 to the well-known tf*IDF weighting scheme

Equation 6:  w_{i,j} = tf_{i,j} * log(D/d_i)

where log(D/d_i) is the Inverse Document Frequency (IDF), D is the number of documents in the collection (the database size) and d_i is the number of documents containing term i.

Equation 6 is just one of many term weighting schemes found in the term vector literature. Depending on how L, G and N are defined, different weighting schemes can be proposed for documents and queries.
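As an illustration of Equations 2 through 6, here is a minimal Python sketch (added for this discussion; the three-document collection and the base-10 logarithm are assumptions made for the example) that computes tf*IDF weights:

```python
import math
from collections import Counter

# A tiny made-up collection of already-indexed documents (lists of terms)
docs = [
    ["food", "sale", "client", "partner"],
    ["food", "food", "recipe"],
    ["client", "contact", "partner"],
]

D = len(docs)                                      # collection size
df = Counter(t for doc in docs for t in set(doc))  # d_i: docs containing term i

def tfidf(term, doc):
    """w_ij = tf_ij * log(D / d_i)  (Equation 6, base-10 log)."""
    tf = doc.count(term)                # local weight L_ij
    idf = math.log10(D / df[term])      # global weight G_i
    return tf * idf                     # N_j = 1 in the classic model

for term in ("food", "client", "recipe"):
    print(term, [round(tfidf(term, d), 3) for d in docs])
```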

KD values as estimators of term weights?

The only way that KD values could be taken for term weights

Equation 7:  w_{i,j} = tf_{i,j} / l_j = KD_{i,j}

is if global weights are ignored (G_i = 1) and the normalization factor N_j is redefined in terms of document length

Equation 8:  N_j = 1/l_j

However, G_i = IDF = 1 constrains the collection size D to be equal to ten times the number of documents containing the term (D = 10*d), and N_j = 1/l_j implies no stopword filtration. These conditions are not observed in commercial search systems.

Using a probabilistic term vector scheme in which IDF is defined as

Equation 9:  IDF_i = log((D - d_i)/d_i)

does not help either, since the condition G_i = IDF = 1 implies that D = 11*d. Additional unrealistic constraints can be derived for other weighting schemes when G_i = 1.
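The two constraints follow directly from setting the global weight to 1; a short derivation (added here for clarity, assuming base-10 logarithms as in the IDF definitions above) is:

```latex
% Classic IDF (Equation 4):
\log_{10}\!\left(\frac{D}{d_i}\right) = 1
  \;\Longrightarrow\; \frac{D}{d_i} = 10
  \;\Longrightarrow\; D = 10\,d_i

% Probabilistic IDF (Equation 9):
\log_{10}\!\left(\frac{D - d_i}{d_i}\right) = 1
  \;\Longrightarrow\; \frac{D - d_i}{d_i} = 10
  \;\Longrightarrow\; D = 11\,d_i
```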

To sum up, the assumption that KD values could be taken for estimates of term weights or that these values could be used for optimization purposes amounts to the Keyword Density of Non-Sense.

References

1. Benoit B. Mandelbrot, The Fractal Geometry of Nature, Chapter 38, W. H. Freeman, 1983.

2. Ben Goertzel, From Complexity to Creativity: Computational Models of Evolutionary, Autopoietic and Cognitive Dynamics, Plenum Press, 1997.

3. Ben Goertzel, "Fractals and Sentence Production", Chapter 9 of Ref. 2, Plenum Press, 1997.

4. P. Prusinkiewicz and A. Lindenmayer, The Algorithmic Beauty of Plants, Springer-Verlag, New York, 1990.

5. Jinxi Xu and W. Bruce Croft, "Improving the Effectiveness of Information Retrieval with Local Context Analysis".

6. Hang Li and Kenji Yamanishi, "Topic Analysis Using a Finite Mixture Model".


© Dr. E. Garcia. 2005

http://www.e-marketing-news.co.uk

Does Google Page-Rank Count Anymore? by Titus Hoskins

Being a full-time SEM (Search Engine Marketer), I have been conditioned like Pavlov's dog (not a pretty picture) to jump every time Google twitches. Lately Google has been doing a lot of twitching. Specifically, there was the rather startling news from Google Webmaster Trends Analyst Susan Moskwa that Google has ditched Page-Rank from Webmaster Tools.

"We've been telling people for a long time that they shouldn't focus on Page-Rank so much; many site owners seem to think it's the most important metric for them to track, which is simply not true," states Moskwa. "We removed it because we felt it was silly to tell people not to think about it, but then to show them the data, implying that they should look at it." (Source: WebProNews) Now, for SEO reasons or for ranking in Google's index, Page-Rank has long been eunuchified by Google. However, even missing a few dangling bits, history has shown us that eunuchs still wield tremendous power. Page-Rank is no different.

Regardless of what Google wants to happen, Page-Rank is still extremely important to anyone marketing on the web, especially if you're selling SEO services or operating a web business. Try selling SEO services when that little green bar on your site is pointing to PR0 or, worse yet, pointing to a solid gray bar. Obtaining a high PR7 or PR8 simply means more business and revenues… regardless of how Google is or is not using Page-Rank. People know how to count, and they learned long ago that a ten is a lot more than a big fat zero. Placed against a PR1 site, a PR8 will win more respect in the eyes of potential clients and can produce enormous profits for the site owner. And we won't even mention the still widely practiced habit of selling links, which Google is desperately trying to stop. Total and full elimination of Page-Rank would be an honest start, but it will still be an uphill, if not unwinnable, battle for Google to fully eliminate link selling.

Even with my modest sites, I have turned down a small fortune by not selling text links on any of them. When I had a PR6 site instead of a PR4, those link requests nearly doubled. So one can easily understand Google's position and the need to downplay Page-Rank if it wants to put even a small dent in all this link selling and buying, which is still running rampant on today's web.

Page-Rank is Google's creation, and unless they remove it fully from their system and the Google toolbar, Page-Rank still counts. Actually, in the whole scheme of marketing your website on the net, Page-Rank counts big time, and in more ways than one.

There are several reasons why you shouldn't count Page-Rank out. For years Google has been downplaying the importance of Page-Rank, stating that it's only one of about 200 ranking factors that determine how Google ranks its index for keywords. Obtaining top organic rankings for popular, lucrative keywords in Google simply means money in the bank. Actually, even a movement of only one or two places on those first-page SERPs (Search Engine Results Pages) can make a major difference to any online marketer's bottom line.

Now, you can have a lower PR number and still rank above other, higher-PR pages for your chosen keywords; I have even had many occasions when my PR dropped but my actual SERP rankings in Google went up, mainly due to building related, relevant backlinks. So Page-Rank counts for little towards your keyword rankings, yet it can't be totally dismissed.

Mainly because, even if PR is just one ranking factor, in close competitive keyword battles (I am presently fighting tooth and nail for some very choice keywords) just one ranking factor such as a high PR can make the difference between getting the top spot or not. Big dogs are still jumping, and for those of us who know how to count, getting a number one spot in Google makes all the difference in the world.

Not only because Google controls roughly 80% of all search engine traffic, but more importantly Google has established unmatched credibility and brand recognition in the eyes of potential customers visiting your site. Web users trust Google. Web users look to Google for guidance and direction. Web users believe what Google is telling them. In the online world, rightly or wrongly, perception is everything.

As an online marketer, I am completely amazed each day at the marketing power Google now commands with web surfers and with the general population. Google is king of online search and no other search engine even comes close to Google.

Page-Rank is Google's ranking system, and in the eyes of those who notice these things, it still wields tremendous influence and power. By default, Page-Rank is Google's opinion of your site, and web users can count (at least to 10). If Google believes people are no longer counting when it comes to Page-Rank, it is fully mistaken.

Titus Hoskins is a full-time professional online marketer who has numerous niche web sites. For the latest web marketing tools try: Internet Marketing Tools or here: Free Guides. © 2009 Titus Hoskins.

Protecting Your Search Engine Rankings

Your website’s ranking on search engines is a vital element of your overall marketing campaign, and there are ways to improve your link popularity through legitimate methods. Unfortunately, the Internet is populated by bands of dishonest web masters seeking to improve their link popularity by faking out search engines.

The good news is that search engines have figured this out and are now on guard for "spam" pages and sites that have increased their rankings by artificial methods. When a search engine tracks down such a site, that site is demoted in ranking or completely removed from the search engine's index.

The bad news is that some high quality, completely above-board sites are being mistaken for these web page criminals. Your page may be in danger of being caught up in the “spam” net and tossed from a search engine’s index, even though you have done nothing to deserve such harsh treatment. But there are things you can do – and things you should be sure NOT to do – which will prevent this kind of misperception.

Link popularity is mostly based on the quality of the sites you are linked to. Google pioneered this criterion for assigning website ranking, and virtually all search engines on the Internet now use it. There are legitimate ways to go about increasing your link popularity, but at the same time, you must be scrupulously careful about which sites you choose to link to. Google frequently imposes penalties on sites that have linked to other sites solely for the purpose of artificially boosting their link popularity. It has actually labelled such sites "bad neighbourhoods."

You can raise a toast to the fact that you cannot be penalized when a bad neighbourhood links to your site; penalty happens only when you are the one sending out the link to a bad neighbourhood. But you must check, and double-check, all the links that are active on your links page to make sure you haven’t linked to a bad neighbourhood.

The first thing to check out is whether or not the pages you have linked to have been penalized. The most direct way to do this is to download the Google toolbar at http://toolbar.google.com. You will then see that most pages are given a “Page rank” which is represented by a sliding green scale on the Google toolbar.

Do not link to any site that shows no green at all on the scale. This is especially important when the scale is completely gray. It is more than likely that these pages have been penalized. If you are linked to these pages, you may catch their penalty, and like the flu, it may be difficult to recover from the infection.

There is no need to be afraid of linking to sites whose scale shows only a tiny sliver of green. These sites have not been penalized, and their links may grow in value and popularity. However, do make sure that you closely monitor these kinds of links to ascertain that they do not sustain a penalty at some point after you have linked to them from your links page.

Another evil trick that illicit web masters use to artificially boost their link popularity is the use of hidden text. Search engines usually use the words on web pages as a factor in forming their rankings, which means that if the text on your page contains your keywords, you have more of an opportunity to increase your search engine ranking than a page that does not contain text inclusive of keywords.

Some web masters have gotten around this formula by hiding their keywords in such a way that they are invisible to any visitors to their site. For example, they have used the keywords but made them the same colour as the background colour of the page, such as a plethora of white keywords on a white background. You cannot see these words with the human eye, but the eye of a search engine spider can spot them easily! A spider is the program search engines use to index web pages, and when it sees these invisible words, it goes back and boosts that page's link ranking.

Web masters may be brilliant and sometimes devious, but search engines have figured these tricks out. As soon as a search engine perceives the use of hidden text – splat! – the page is penalized.

The downside of this is that sometimes the spider is a bit overzealous and will penalize a page by mistake. For example, if the background colour of your page is gray, and you have placed gray text inside a black box, the spider will only take note of the gray text and assume you are employing hidden text. To avoid any risk of false penalty, simply direct your webmaster not to assign the same colour to text as the background colour of the page – ever!

Another potential problem that can result in a penalty is called "keyword stuffing." It is important to have your keywords appear in the text on your page, but sometimes you can go a little overboard in your enthusiasm to please those spiders. A search engine uses what is called "key phrase density" to determine if a site is trying to artificially boost its ranking. This is the ratio of keywords to the rest of the words on the page. Search engines assign a limit to the number of times you can use a keyword before they decide you have overdone it and penalize your site.

This ratio is quite high, so it is difficult to surpass without sounding as if you are stuttering – unless your keyword is part of your company name. If this is the case, it is easy for keyword density to soar. So, if your keyword is “renters insurance,” be sure you don’t use this phrase in every sentence. Carefully edit the text on your site so that the copy flows naturally and the keyword is not repeated incessantly. A good rule of thumb is your keyword should never appear in more than half the sentences on the page.
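As a rough self-check, a short Python sketch (an illustrative tool added here, not any search engine's actual formula) can estimate key phrase density and the share of sentences containing the phrase; the 50% sentence threshold follows the rule of thumb above, while the 3% density threshold is an assumed example value.

```python
import re

def keyword_stats(text, phrase, density_limit=0.03, sentence_limit=0.5):
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase_words = phrase.lower().split()
    # Count occurrences of the phrase in the word stream
    hits = sum(
        words[i:i + len(phrase_words)] == phrase_words
        for i in range(len(words) - len(phrase_words) + 1)
    )
    density = hits * len(phrase_words) / max(len(words), 1)

    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    with_phrase = sum(phrase.lower() in s.lower() for s in sentences)
    sentence_share = with_phrase / max(len(sentences), 1)

    return {
        "density": density,
        "sentence_share": sentence_share,
        "too_dense": density > density_limit,
        "too_many_sentences": sentence_share > sentence_limit,
    }

sample = ("Renters insurance protects your belongings. Compare renters insurance "
          "quotes online. Our agents explain coverage options in plain language.")
print(keyword_stats(sample, "renters insurance"))
```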

The final potential risk factor is known as "cloaking." To those of you who are diligent Trekkies, this concept should be easy to understand. For the rest of you: cloaking is when the server directs a visitor to one page and a search engine spider to a different page. The page the spider sees is "cloaked" because it is invisible to regular traffic, and it is deliberately set up to raise the site's search engine ranking. A cloaked page tries to feed the spider everything it needs to rocket that page's ranking to the top of the list.

It is natural that search engines have responded to this act of deception with extreme enmity, imposing steep penalties on these sites. The problem on your end is that sometimes pages are cloaked for legitimate reasons, such as prevention against the theft of code, often referred to as “page jacking.” This kind of shielding is unnecessary these days due to the use of “off page” elements, such as link popularity, that cannot be stolen.

To be on the safe side, be sure that your webmaster is aware that absolutely no cloaking is acceptable. Make sure the webmaster understands that cloaking of any kind will put your website at great risk.

Just as you must be diligent in increasing your link popularity and your ranking, you must be equally diligent to avoid being unfairly penalized. So be sure to monitor your site closely and avoid any appearance of artificially boosting your rankings.

The Importance of Referrer Logs

Referrer logging is used to allow web servers and websites to identify where their visitors are coming from, whether for promotional or security purposes. You can find out which search engine visitors used to find your site and whether a customer has come from a 'linked site'. The referrer is basically the URL of the previous web page from which your link was followed.

By default, most hosting accounts don't include referrer logs, but they can often be added for an extra monthly fee. If your web host does not provide a graphic report of your log files, you can still view the referrer logs for your website by logging into the host server using free or low-cost FTP software, like these:
(links have been checked and are OK)

FTP Explorer: http://www.ftpx.com/ (costs around $40, but a free trial is available)
LogMeIn: https://secure.logmein.com/dmcq/103/support.asp (free trial available)
SmartFTP: http://www.smartftp.com/
FTP Voyager: http://www.ftpvoyager.com/
Filezilla: http://filezilla-project.org/ (the free one I use, which is very good)
Ipswitch: http://www.ipswitch.com/ (a professional-standard FTP solution which I can also recommend)

The log file is stored on your web server and can be downloaded to your computer later. You can use a log analysis tool, like those mentioned below, to create a graphic report from your log files so that they are easier to understand.
(links have been checked and are OK)
Abacre Advanced Log Analyzer http://www.abacre.com/ala/
Referrer Soft http://www.softplatz.com/software/referrer/
Log Analyzer http://www.loganalyzer.net/

Even if you don't have a dedicated tool, you can view the files in Word, WordPerfect, a plain-text editor or WordPad. This information is crucial to your business and marketing plans, and it is not advisable to neglect it.

In addition to identifying the search engine or linked site from where your visitor arrived, referrer logs can also tell you what keywords or keyword phrases your client used for searching.
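As a simple illustration, the following Python sketch (an example added here; the combined-style log line format and the file name are assumptions, so adjust them to your server's actual log layout) extracts referrers and search keywords from a raw access log:

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed "combined" log format: the referrer is the second-to-last quoted field,
# e.g. ... "http://www.google.com/search?q=renters+insurance" "Mozilla/5.0 ..."
REFERRER_RE = re.compile(r'"([^"]*)" "[^"]*"$')

def referrer_keywords(log_path):
    """Yield (referring host, search keywords) pairs found in the log."""
    with open(log_path) as log:
        for line in log:
            match = REFERRER_RE.search(line.strip())
            if not match or match.group(1) in ("", "-"):
                continue                      # direct visit or no referrer sent
            ref = urlparse(match.group(1))
            query = parse_qs(ref.query)
            # Common query-string keys used by search engines for the search terms
            terms = query.get("q") or query.get("p") or query.get("query") or []
            yield ref.netloc, " ".join(terms)

# Example usage (file name is hypothetical):
# for host, terms in referrer_keywords("access.log"):
#     print(host, "->", terms or "(no keywords)")
```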

As referrer information can sometimes violate privacy, some browsers allow the user to disable the sending of referrer information. Proxy and firewall software can also filter out referrer information, to avoid leaking the location of private websites. This can result in other problems, as some servers block parts of their site to browsers that don't send the right referrer information, in an attempt to prevent deep linking or unauthorized use of bandwidth. Some proxy software gives the top-level address of the target site itself as the referrer, which prevents these problems while still not divulging the user's last visited site.

Since the referrer can easily be spoofed or faked, however, it is of limited use in this regard except on a casual basis.