Crawl Depth, Crawl Budget, Sitemaps, & Interlinking

Ryuzaki

Money Over Women
Staff member
BuSo Pro
Digital Strategist
Joined
Sep 3, 2014
Messages
2,880
Likes
5,269
Degree
7
#1
I've been spending a chunk of time this month getting some lingering site build things out of the way. This should mark my site moving to its final form and I'll never have to think about these things again. But of course, opening one can of worms leads to another.

As I've been doing some 301 redirects and extending the length of some content, I began to think about interlinking since I was changing some links from old URLs to new ones. I ended up reading this post about Pagination Tunnels by Portent that was interesting if not necessarily useful.

You don't have to click the link to know what's in it. They set up a site with enough content to have 200 levels of pagination in a category. Then they tested several forms of pagination to cut the number of crawl steps down from 200 to 100, and ultimately to 7 using two types of pagination. One was the kind you see on some forums like this:


The other had the idea of a "mid point" number like this:

Where 12 and 113 are the mid points. These cut the crawl depth down to 7 leaps. That ends up looking like this:

But I'm guessing that Google's spiders aren't real thrilled about even 7 steps.

I don't plan on doing any crazy pagination tricks. I don't necessarily plan on changing much of anything other than interlinking.

The reason for that is our sitemaps act as an entry point, putting every page at most 2 steps away from the first point of crawling. Would you agree with this? If you put your sitemap in your robots.txt, Bing and everyone else should find it.
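For reference, pointing crawlers at the sitemap is just a single directive in robots.txt, something like this (example.com and the sitemap filename are placeholders; a sitemap index URL works the same way):

```
# Hypothetical robots.txt — the Sitemap line is what Bing, Google, etc. pick up
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap_index.xml
```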

But, for the sake of discussion and possibly enhancing our sites, let's say that the sitemap doesn't exist, that there are no external backlinks, and you want to solve this the best you can with interlinking. Without interlinking, some if not most pages are basically going to end up as orphans.

Do you think there's any benefit to ensuring every single page is interlinked contextually to another one at least one time, or is that just anal retentive thinking? Assuming every single post is optimized for at least one term if not a topic and could stand to bring in even the tiniest bit of traffic per month, is this even worth the bother?

Of course we intend to interlink more to the pages that earn the money. Are we harming the page rank flow by linking to every post once, or enhancing it? Assuming that once Google gets a post in its index (we're pretending sitemaps don't exist here), it'll crawl those once in a blue moon. Interlinking should ensure each page is discovered and crawled again and again.

I'm not suggesting we make some absurd intertwined net of links. We'd still do it based on relevancy, or link to the odd post from another post where there's no real relevancy.

The possible benefit would be ensuring Google indexes the maximum number of pages possible, which will have sitewide benefits related to the size of the domain and the internal page rank generated. The downside is flowing page rank to less important pages a bit more.

Also, what do you suppose Google's crawl depth really is? How many leaps will they take from the starting point?

And finally, do you know of any spidering software that can crawl while ignoring specified sitewide links (navigation, sidebar, footer, and category links) and then, after crawling, tell you which posts are essentially orphaned?

This is a pretty scattered post, not well thought out, but it should be worth talking about, especially for eCommerce and giant database sites.
 

turbin3

BuSo Pro
Digital Strategist
Joined
Oct 9, 2014
Messages
574
Likes
1,124
Degree
2
#2
You know, it's funny how the human mind works. Seemingly small things, like a smell, can transport your mind instantaneously back to some past experience as if it happened yesterday. In that same vein, this post just made me relive, in seconds, the past several years of work on a few massive sites. Funny how that works! :wink:

This is a subject close to my heart, and one I've spent the vast majority of my time trying to master over the past several years. In that time, I've waged a ruthless war against duplicate content. I've been battling crawl budgets like there's no tomorrow. I've also been struggling to maintain my own sanity.

One of the things I've put a significant amount of effort into over the past few years, is modeling bot crawl behavior across several of my sites. I'm talking big data scale, billions (literally) of rows of traffic log data. It's actually brought to light some REALLY interesting learning points. Mostly, it's just bolstered my paranoia about locking down site structure mercilessly, until there is only one true way for search bots to crawl. That's the dream at least. LOL


Not Everything is Relative
Dealing with extremely large sites is just a whole other ball game, the best practices of which often fall contrary to popular "Moz" fare. That's something most people just don't get. Some of what works on a 100 page blog, might actually be a TERRIBLE thing to implement on a 1M+ page site that has near-infinite possible crawl paths.

Anyways, I'll try to be concise, since I feel your pain. First off, I feel there's probably some sort of threshold at which certain on-page or domain-level factors have their ranking coefficients significantly reduced or elevated.

Take blog / WP taxonomy pages, for example. On a small site, it might make sense to allow the first page of your taxonomy pages to be indexed, while robots noindex/following the rest. It's trivial to make some template tweaks and even throw some supplemental content on those archive pages to enhance them.

I can see how people justify that. Even though the possible downsides may not matter a whole lot on a small site, I can say that it's something I will NEVER be doing again on any of my small sites. I just wouldn't be able to sleep at night, knowing there is actually a single, non-canonical, crawlable page on one of my small sites. Maybe this is what PTSD from big site SEO looks like. :confused:


Stripping Site Structure to the BONE
On large sites, I've pretty much come to the conclusion that it's probably best to physically remove ALL possible low quality crawl paths. This could be actual pages. In some cases, URL variations like query strings. Maybe from facets or other things.

Trying to "game" the blocking and get creative, like using JS links, doesn't work either. They'll still be crawled, trust me. If the entire <a> tag is generated with JS, it's probably gonna get crawled. If the <a> tag is hard-coded, and just the href is JS-generated, you can hardcode nofollow and that will help. Though, best to simply not have that pathway at all, if it's avoidable.

Think about the nature of stuff like that at scale. Like say I have a widget with a couple dozen links I don't want crawled. Now say that widget is sitewide or on a significant number of pages. Sure, you can do like the above, hardcode nofollow, etc. At scale it still means a percentage of on-page, nofollowed (link-juice-losing) links on a huge number of pages. This stuff can get to be a parasitic drain, in my opinion, at least on large sites where every little bit counts.

I'd really like to be able to define terms, to make more sense of this for everyone, but it's extremely tough. What is a large site? It's tough to say. I mean, at the point a site has 100K to 1M+ pages, yeah, that's a large site in my opinion. 10K pages? I'd still call that "large", if you think about the relative page volume compared to the average site (blogs, SMB sites, etc.). Some might not.

Concerns For Crawl Budget
As far as the "budget" goes, I think it's important to try and establish trends for the niche, to figure out what type of crawl behavior is desired or should be prioritized. Here's some of the major factors:

Crawl Frequency
  • Check competitors' indexed equivalent pages over time. Use Google's cache dates to get a sense of how frequently their core pages might be getting crawled.
  • Do some analysis on your own traffic logs. Take a subset of pages, like maybe a subdirectory, if that makes sense for your site. Notice any trend in how frequently the pages are getting crawled? (A quick log-parsing sketch follows this list.)
  • In my case, in at least one niche I noticed frequency was extremely important. Certain subsets of deep pages (like 5-7+ click depth) were only getting crawled once every 2 or 3 months. This was horribly affecting rankings, when some competitors' pages had content updates almost daily, and crawl frequency every day or few days.
  • SERPWoo is a lifesaver for correlating crawl frequency behavior with ranking trends. It lets you actually identify ballpark behavior for the niche. It also helps you understand just how frequently a set of pages needs to be crawled to rank in a stable manner. I've been able to correlate this across multiple SERPs, and see some very interesting behavior. Stuff like infrequent crawls leading to rankings that look like the EKG of someone about to flatline. Like declining, a page crawl, blip (YAY! Our page is catching its breath.....oh wait, now it's dying again!).
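To make the log-analysis point concrete, here's a minimal Python sketch for the second bullet above. It assumes combined-format access logs and a made-up /guides/ subdirectory as the page subset, and it matches on the user agent string only, so spoofed bots will inflate the counts unless you also verify Googlebot via reverse DNS.

```python
import re
from collections import defaultdict

# Minimal sketch: count Googlebot hits per URL from a combined-format access log.
# The log filename, the /guides/ subdirectory, and UA-only matching are assumptions;
# adapt them to your own setup.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)

hits_per_url = defaultdict(list)

with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LOG_LINE.match(line)
        if not match:
            continue
        if "Googlebot" not in match.group("ua"):
            continue
        path = match.group("path").split("?")[0]
        if not path.startswith("/guides/"):   # the subset of pages you care about
            continue
        hits_per_url[path].append(match.group("time"))

# Crude frequency report, least-crawled pages first. Parsing the timestamps with
# datetime would let you measure the actual gaps between crawls per URL.
for path, times in sorted(hits_per_url.items(), key=lambda kv: len(kv[1])):
    print(f"{len(times):>5} crawls  {path}")
```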

Crawl Depth

  • URL parameters, filters, facets, archives, date archives, protocol and subdomain variations, ALL need to be considered and accounted for on a large site. For example, spammers, affiliates, negative SEO'ers and others can just link indiscriminately to your site. Does your server setup account for these factors, or might there possibly be some gaps?
  • Imagine page sets generated based on URL structure or query string. A site might use this to try dynamically building pages to capture maximum users, offering default logic as a fallback. In essence, always providing some sort of default content at a minimum, to try and capture and retain the user. Search engine-type stuff.
  • Now imagine someone sees your site is set up this way. If they're particularly mean, maybe they build a bunch of links to made-up URLs on your site, with payday loan or porn keywords. And whadda' 'ya know?! Google crawls them, and maybe even indexes some of them! Worse yet, say there's unaccounted-for link control on the page...
  • Maybe those generated SERP pages (that's effectively what they are) happen to have logic that generates supplemental links on page, based on the query. And maybe those link placements were overlooked before, and happen to not be nofollowed and not restricted by robots.txt (a simple guard against this is sketched after the list). A day or two later, and maybe now you have 30-40K+ payday loan and porn SERP pages indexed, because they happened to find links with logic that generates:
    1. Original query string
    2. Query + US city == a brand new page!
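A hypothetical guard against that kind of page minting, with made-up names and a stand-in for the DB table: only generate a page when the requested keyword + location combo actually exists in your own controlled data, and hard-404 everything else so arbitrary inbound links can't spawn new crawlable URLs.

```python
# Hypothetical sketch — KNOWN_COMBOS stands in for a rigidly controlled DB table.
# Anything not in it never becomes a page, no matter what someone links to.
KNOWN_COMBOS = {("emergency plumber", "austin"), ("emergency plumber", "denver")}

def generated_page_response(keyword: str, city: str) -> dict:
    if (keyword.strip().lower(), city.strip().lower()) not in KNOWN_COMBOS:
        return {"status": 404}  # don't mint a page just because a made-up URL got linked
    return {"status": 200, "robots": "index,follow"}
```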
Click Depth
  • On a few of my larger sites, I noticed at least 2 distinct crawl behaviors.
    1. High click depth pages ignored
      • There is logic behind Googlebot that prioritizes certain page types or site sections, while reducing the importance of others. Exactly how that behavior is determined is anyone's guess. Maybe it's a combination of on-page factors, user traffic and engagement metrics, or who knows what else. I'd bet other factors are history of page updates, as gathered by crawl history (maybe related to the tech behind the rank transition platform?).
      • What I haven't heard many people talk about is educated guesses about what site/page characteristics may or may not contribute to page sets falling into the lower importance category. I'm not even sure myself. Though, I want to say page quality + lack of traffic and/or lack of engagement are probably significant factors.
      • It's that thinking that's pushed me down the path, at least with certain large sites, towards consolidating and reducing page volume, to attempt boosting page quality and focusing traffic + UX on a smaller set of pages in the hopes of positively affecting crawl behavior.

    2. High click depth pages crawled to their natural extreme
      • Surprisingly, I noticed this quite a bit from Bingbot. Like 10X crawl rate over Googlebot (Bill G. probably playing "catch up" hahaha).
      • Over the past 2-3 quarters, surprising and substantial increase in crawl rate by Yahoo's slurp bot. Didn't see a lot of logic to their crawl behavior. Guess they're trying to figure out what they're even doing now.
      • In several cases, massive drill down by Googlebot. Seemed to occur under some set of circumstances, though I could never quite figure out what. I'm guessing it was the appearance of the right, followable links, on the right page placement, such that they got prioritized and crawled to extreme depth.
  • Robots meta tag logic, use of internal nofollow, robots.txt, and other blocking methods are absolutely needed on, I would say, most large sites. Most people that say "you shouldn't" have likely never dealt with a large site before. It's just a different ballgame.
  • A major factor in reducing click depth is killing it with your category page game! :wink: I think about "categories" differently from the standard WP / blogger fare. If, for example, you have a huge number of pages under a category or sub-category, I would say you might look at creating MORE cat/sub-cat pages!
  • For example, say I have 200 pages in 1 cat. Maybe it makes sense to break that into 2+ sub-cats with ~100 pages each? The thing with this is, it's far easier figuring out creative ways to get high-level internal linking, close to the homepage, from the cat/sub-cat level. It's way harder to do at the page level. So maybe that helps take things from 7 levels deep, to 6, or whatever. I definitely believe reducing click depth on large sites is a significantly important factor to help improve crawl behavior.
TL;DR: Focus on the purple and green areas below, to reduce click depth ↓

Sitemaps
In my experience, there appears to be a "threshold" of sorts in behavior and usage surrounding sitemaps as well. I have some very real data behind this, and it's been interesting to say the least.

In one site optimization campaign, I was testing indexing and ranking of a large set of pages, to try and establish a baseline trend for a niche. I started with around 500K pages, so a bit over 10 sitemaps at the max of 50K URLs in each. This was unfortunately a volatile page set. For various reasons, these pages were not HTTP 200 and indexable 100% of the time. Some would lose content for various reasons, like user-generated content being removed. Based on the page logic, they had several conditionals when a page fell under a certain level of content (a rough sketch of that decision logic follows the list):
  • Meta robots noindex/follow
  • Canonical to a more complete page on the topic
  • Different redirects under different conditions (301, 302, or 307)
  • Straight 404 in extreme cases
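A rough sketch of that kind of decision logic in Python (illustrative only — the thresholds, field names, and the Page type are made up, not the production code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    word_count: int
    has_ugc: bool                 # is the user-generated content still present?
    parent_url: Optional[str]     # a more complete page on the same topic, if one exists
    permanently_gone: bool

def crawl_directives(page: Page) -> dict:
    """Map a page's content state to the response the crawler should see."""
    if page.permanently_gone:
        return {"status": 404}                                   # extreme cases
    if page.word_count >= 300 and page.has_ugc:
        return {"status": 200, "robots": "index,follow"}         # healthy page
    if page.parent_url and page.word_count < 100:
        return {"status": 301, "location": page.parent_url}      # or 302/307 if it may come back
    if page.parent_url:
        return {"status": 200, "robots": "noindex,follow",
                "canonical": page.parent_url}
    return {"status": 200, "robots": "noindex,follow"}           # thin, no better target
```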
So if you think about that, we're already fighting crawl budget and frequency. We want to get our most valuable pages crawled consistently, and within at least a minimal frequency range normal for the niche. Maybe a volume like 500K pages well-exceeds those parameters?

In this case it did. So the result was massive volumes of 404s and noindexed pages being crawled, and a smaller number of redirects. Whole 'lotta non-200 stuff going on! That meant sporadic rankings and poor crawl budget usage. Lots of budget burned up on non-existent stuff.

So I switched up the game. Chose a better quality set of pages that were more stable. Took a single sitemap of 50K or less pages and submitted those. What I noticed is, it seemed to take a bit of time for Google to "trust" the site again. A few weeks, maybe a little over a month. Though, the crawl rate and frequency definitely started increasing, and overall budget usage was MUCH more consistent. A nice, progressive, upward trend. Hell of a lot better than 50K pages crawled one day....then 100 pages crawled the next because there were 30K 404's the day before. LOL

So I say all that to say, don't rely on the sitemap alone. If we're talking large sites, like 100K+ pages, definitely don't rely on just the sitemap. Even at 10K+ I'd say this still applies, but that's just me. I do believe it's important to reduce click depth on the site as well, and I don't think a sitemap alone is a complete solution for that. I'd suspect that part of the algorithm weighs internal links versus other discovery methods, and is a mix.

Internal Linking
  • This is EXTREMELY tough on large sites. You have multiple factors to consider:
    • Click depth
    • Internal anchor text distribution (definitely a thing, and so is anchor over-optimization for internal links)
    • Keyword over-optimization. I suspect, on large enough sites, this even gets down to the URL level. On at least one site, I've seen behavior where duplicated words within the URL seemed to be detrimental. Probably something related to TF*IDF and massive over-use of sets of words.
    • Expanding on the KW angles, I recently responded to a thread about possible over-optimization compounded by partial and exact match domains. Definitely a big factor to watch for on large sites.
  • Also, consider the volume of internal nofollow and its implications. For example, some sites might have no choice but to have certain blocked pages (internal SERPs) still accessible to the user but not to bots.
  • Now imagine crawlable pages having a significant percentage of nofollowed links to these blocked pages. You might find this on some ecommerce sites that are blocking parameterized URLs, but still using them as links for page content items.
  • Sure, you may be restricting crawling. But we know link juice is still lost from nofollow. So how much juice are those pages losing? In the ecommerce example, one option might be creating a new set of pages that can be crawled, and swap out the nofollowed links for those, so you get rid of a good percentage of those nofollow links across a large set of pages. New landing pages in essence.
  • Extremely large and complex sites absolutely should be using the "URL Parameter" tool within Google Search Console, to specify all their possible params. It can take a bit of setup if you have a ton of params, but it definitely helps. Bing Webmaster also has this feature ("Ignore URL Parameters" under Configure My Site).
 

turbin3

BuSo Pro
Digital Strategist
Joined
Oct 9, 2014
Messages
574
Likes
1,124
Degree
2
#3
So that was a lot, and a bit of a jumbled mess. :wink: I'm passionate about the subject, to be sure. Figured I'd do a recap and address your questions directly, so it's a bit easier.

But I'm guessing that Google's spiders aren't real thrilled about even 7 steps.
I would say probably not. On at least a few large sites, I've consistently seen that even 5 steps, for a significant percentage of pages, may be too much. I've tried to push things closer to 3-4 where possible, to help mitigate any possible effect.

The reason for that is our sitemaps act as an entry point, putting every page at most 2 steps away from the first point of crawling. Would you agree with this? If you put your sitemap in your robots.txt, Bing and everyone else should find it.
For a large site, I would disagree that the sitemap alone is sufficient for serving as a signal of a "low click depth" starting point. It can help, but I suspect the bias is still towards internal link click depth. Meaning, maybe you get it all in the sitemap. But if half the site is still 7+ levels deep, that might be enough that they still consider the deep stuff low priority regardless.

I've seen this reflected in my own efforts, with several million URLs' worth of sitemaps submitted at any given time. Based on my traffic logs, I get the feeling there's a sort of threshold or point of no return where submitting past a certain volume of URLs is kind of an exercise in futility.

Do you think there's any benefit to ensuring every single page is interlinked contextually to another one at least one time, or is that just anal retentive thinking? Assuming every single post is optimized for at least one term if not a topic and could stand to bring in even the tiniest bit of traffic per month, is this even worth the bother?
I think there is, but I also think there's at least one other consideration. Namely, reducing page volume by consolidating pages, if it makes sense for the use case. I don't have any memorable data I can pinpoint, since I've been analyzing way too much at this point.

Though, I've analyzed enough in my own logs to suspect that, past a certain point, those few internal links you might build on a real deep and/or orphaned page, might not matter much if at all. So in some cases, reducing the depth, or the volume of pages at that depth, might be more effective.

Also, think about it programmatically, if we're talking about thousands of pages. I'd look at maybe creating a relational table of seed keywords. Maybe come up with some groupings, prioritized in some way. I'm still figuring out effective ways to do this at scale, while still targeting down to the topic-level. The idea is, take some consistent and identifiable element per page, and generate your internal linking with that. Stuff like "related searches", "related products", "Users also searched for", etc.

For example, with PHP, use a seed list of keywords and/or anchor variations combined with URLs. Then create a template for some set of pages, with a preg_replace against the list, and some parameters to randomize it. Like, if those phrases are found, generate a random number of links within a range.

I forget off-hand, but I remember doing this before based on a hash function or some damn thing upon the first generation. I think I was storing the hash in a table or something, maybe along with some other data. Then each additional page load would consistently generate those same exact links + anchors. Idea being, stability. One and done. :wink: That was a few years ago, so I don't remember unfortunately.
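If I were sketching the same idea in Python today (the seed map, link counts, and URLs below are hypothetical), it would look roughly like this — seed the RNG with a hash of the page URL so every render of the same page produces the same links and anchors:

```python
import hashlib
import random
import re

# Hypothetical seed list: phrase -> internal URL. In practice this would come
# from a keyword/URL table, grouped and prioritized however makes sense.
SEED_LINKS = {
    "crawl budget": "/guides/crawl-budget/",
    "internal linking": "/guides/internal-linking/",
    "xml sitemap": "/guides/xml-sitemaps/",
}

def inject_internal_links(page_url: str, html: str,
                          min_links: int = 1, max_links: int = 3) -> str:
    """Deterministically link a subset of seed phrases found in the page copy.

    Seeding the RNG with a hash of the page URL means the same page always gets
    the same links and anchors -- the "one and done" stability idea.
    """
    rng = random.Random(hashlib.md5(page_url.encode("utf-8")).hexdigest())
    found = [kw for kw in SEED_LINKS if re.search(re.escape(kw), html, re.IGNORECASE)]
    if not found:
        return html
    rng.shuffle(found)
    upper = min(max_links, len(found))
    lower = min(min_links, upper)
    for kw in found[: rng.randint(lower, upper)]:
        # Naive text replace (first occurrence only); a real version would avoid
        # matching inside existing tags, attributes, or anchors.
        html = re.sub(
            re.escape(kw),
            lambda m: f'<a href="{SEED_LINKS[kw]}">{m.group(0)}</a>',
            html,
            count=1,
            flags=re.IGNORECASE,
        )
    return html
```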

Of course we intend to interlink more to the pages that earn the money. Are we harming the page rank flow by linking to every post once, or enhancing it? Assuming that once Google gets a post in its index (we're pretending sitemaps don't exist here), it'll crawl those once in a blue moon. Interlinking should ensure each page is discovered and crawled again and again.
As far as page rank flow, I'm honestly not sure.

With Googlebot, as far as I can tell there are at least 2 forms of it. One that crawls based on priority, high frequency change pages. Another that crawls much less frequently, like monthly.

If you think about the logic behind the rank transition platform, they've made fairly clear that parts of their systems do monitor page history, changes, and prioritize various functions and responses from it. So it stands to reason that, if they've determined a page is low priority, just because they crawl another page that links to it, they probably still prioritize the links on that page against some index of priority.

On at least one of my sites, I have seen some crawl behavior that would seem to support this. Stuff like pages that are crawled frequently, that I know for a fact have followed links to other pages that get crawled more on a monthly basis.... And regardless of the first page being crawled every day, or every few days, the second one is still only getting crawled every few weeks or once a month.

The possible benefit would be ensuring Google indexes the maximum number of pages possible, which will have sitewide benefits related to the size of the domain and the internal page rank generated. The downside is flowing page rank to less important pages a bit more.
I question whether that's even a thing anymore, how much longer it will be if it is, and/or whether there's a threshold beyond which it's no longer a thing. Let me explain.

I've actually been seeing evidence with several sites, large and small, where consolidating pages, reducing duplicate/thin content, and overall reducing page volume has been a net benefit. In one case, reducing pages on one site by several million. In others, reducing sub 1K page sites by 20-30%.

I think multiple factors are involved with some of the results I've been seeing. In some cases, I think aggregating the traffic and engagement metrics to fewer pages may be part of it. In others, maybe more technical like improvements in TF*IDF based on the consolidation, and maybe that leads to some increased trust, value, priority, or whatever.

Also, what do you suppose Google's crawl depth really is? How many leaps will they take from the starting point?
In the right conditions, whatever the logical extreme is. In one case, with a major sitewide change, TLS migration, and a few other things, it clearly triggered a priority with Googlebot. This resulted in a ~1,600% increase in crawl rate within a short period of time, which was sustained for a little while.

Well, that's all well and good.....but it meant they found more internal link deficiencies FASTER, and just drilled down into them like a GBU-57 Massive Ordnance Penetrator. You think duplicate content sucks? Just wait until you see your index grow by several million pages within a few weeks. :wink: This coupled with page logic like keyword + city, or keyword + zipcode, is self-propagating and can quickly grow out of control if the page logic or DB table isn't rigidly controlled.

Under normal crawl behavior, however, I've seen enough to suggest that much beyond 3 clicks is probably not great. 5-7+ is probably a terrible long-term proposition, though I haven't been in every niche, so it might be totally different in others.

And finally, do you know of any spidering software that can crawl while ignoring specified sitewide links (navigation, sidebar, footer, and category links) and then, after crawling, tell you which posts are essentially orphaned?
With extremely large sites, I haven't seen much that I've been truly impressed with, at least not to the degree that left me fully satisfied. I've used Deepcrawl, Ryte (formerly Onpage.org), Authoritas (formerly AnalyticsSEO), SEMRush, Screaming Frog, Xenu Link Sleuth, and countless other services or apps.

I'd say, for 100K pages or less, many of those could serve most sites just fine. Stuff like 1M+ sites honestly demand a custom solution. No matter what I try, I always revert to building Scrapy bots. For the example you mentioned, you could combine Scrapy with the Beautiful Soup library and quickly do some cool stuff that gets the job done. In fact, I have a simple example here, in relation to crawling sites to get their content for purposes of determining TF-IDF weighting.

So in the case you mentioned, working with your own site, that could totally work. Use the logic in the example to drop all the structural elements you mentioned. Then BS4 has several ways to grab all the remaining hrefs in the body, or wherever you prefer.

From there, I can't think of the next step off-hand. You'd basically have those URLs thrown in a list, dict, tuple, or whatever, and then put them through other functions, middleware, or something. There might actually be an easier way purely with Scrapy, or a slightly different combination of that example.

Sorry about the lack of specifics. Normally, I tend to use source lists for scraping, to keep things controlled and consistent. Though I know there are some ways with the framework, to just set the entry point and let it go until some predefined "stop" point is reached (depth, path, volume, etc.).
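That said, a stripped-down version of the orphan-finding idea, without Scrapy at all, might look like the sketch below (requests + BeautifulSoup; START_URL and the structural selectors are assumptions you'd swap for your own templates). Crawl from the homepage while throwing away nav, sidebar, footer, and category blocks, then diff whatever you reached against your full URL list — anything never reached through in-content links is effectively orphaned.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Sketch: discover pages reachable through in-content links only.
# START_URL and the selectors below are placeholders for your own site/theme.
START_URL = "https://www.example.com/"
STRUCTURAL = ["nav", "header", "footer", "aside",
              ".sidebar", ".site-footer", ".main-navigation", ".category-links"]

def contextual_links(page_url: str, html: str) -> set:
    soup = BeautifulSoup(html, "html.parser")
    for selector in STRUCTURAL:
        for node in soup.select(selector):
            node.decompose()                      # drop nav/sidebar/footer/category blocks
    links = set()
    for a in soup.find_all("a", href=True):
        absolute = urljoin(page_url, a["href"]).split("#")[0]
        if urlparse(absolute).netloc == urlparse(START_URL).netloc:
            links.add(absolute)
    return links

def crawl_contextual(start: str, limit: int = 5000) -> set:
    seen, queue = set(), deque([start])
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue
        queue.extend(contextual_links(url, resp.text) - seen)
    return seen

# ALL_URLS would come from your XML sitemap or CMS database; the difference is
# your list of contextually orphaned posts.
# orphans = set(ALL_URLS) - crawl_contextual(START_URL)
```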
 

Ryuzaki

Money Over Women
Staff member
BuSo Pro
Digital Strategist
Joined
Sep 3, 2014
Messages
2,880
Likes
5,269
Degree
7
#4
I just wouldn't be able to sleep at night, knowing there is actually a single, non-canonical, crawlable page on one of my small sites.
This was your comment concerning noindex/follow sub-pages on categories, which I do, and I've spruced up Page 1 on them. Is this just because you don't like there being pages that aren't completely under your thumb and controlled, or because you'd rather index them? I can see having them indexed creating another point of entry for the crawlers that are referencing the index themselves. But I'm more worried about Panda eventually than I am with creating those entry points.

Though, I want to say page quality + lack of traffic and/or lack of engagement are probably significant factors.
This comment was about keeping sections crawled regularly. I'd guess that Google measures the things you've mentioned, but I'd guess that they also care about impressions in the SERPs. Because the size of the net is exploding way faster than the size of the population. Most pages won't ever see any reasonable amount of traffic, but Google wants them available just in case. It's how they eat. But for sure, I'm working on adding content to these pages I'm concerned with, which should boost traffic and engagement and impressions.

Most people that say "you shouldn't" have likely never dealt with a large site before. It's just a different ballgame.
I'd say it's important on every site unless it's a small 5-10 page business site. These naive webmasters using 500 different tags on 30 posts are creating a serious Panda problem that's really no different from auto-generated pages with boilerplate duplicate content.

On at least one site, I've seen behavior where duplicated words within the URL seemed to be detrimental.
Yeah, @ddasilva was talking about this recently. Another reason to not get a PMD or EMD. But you have to be careful with the names of your categories too. In the site I'm talking about, I have one instance like this that doesn't seem to be hurting the ranking abilities so far. That category has some #1 slots for some competitive info and buyer terms.
 

turbin3

BuSo Pro
Digital Strategist
Joined
Oct 9, 2014
Messages
574
Likes
1,124
Degree
2
#5
This was your comment concerning noindex/follow sub-pages on categories, which I do, and I've spruced up Page 1 on them. Is this just because you don't like there being pages that aren't completely under your thumb and controlled, or because you'd rather index them? I can see having them indexed creating another point of entry for the crawlers that are referencing the index themselves. But I'm more worried about Panda eventually than I am with creating those entry points.
To put some context behind it, I basically see category pages differently from how they're often configured on many CMSes by default. Pagination, for example. Why have potentially dozens, hundreds, or more pages that, if they get any traffic at all, only see the odd click and a 5-second visit before the user hits another page?

There are often some creative ways with UI to display all on a single page. The real trick then becomes, organizing it, making it perform well (lazy load images for example), and providing useful nav, filter, facet functions for people to get to what they need. All those 5 second pageviews across all those other pages might then become minutes on one single page or a small set of similar pages.

I guess what I'm getting at is, I effectively want to turn my category pages into little "apps" unto themselves, instead of fragmenting that traffic and crawl budget across tons of low value pages. Those would then become pages more worth indexing in my opinion.
 
Joined
Sep 17, 2014
Messages
355
Likes
233
Degree
1
#6
I've built sites that maxed out around 100 pages and had this same concern, because I didn't have that many categories so posts were getting tucked back in the 4th and 5th level of pagination. I had done decent interlinking, but I still wanted to make sure the click depth wasn't too crazy.

So I used a HTML sitemap plugin and linked to it from the footer. This way every post was one click away from the homepage or any other page and the link power was spread around better.

I don't know if it helped or hurt, but that's how I solved it. That wouldn't work for a site with too many pages though. You'd have to create multiple HTML sitemaps and you'd end up creating a trap for spiders.