Faster Deployment Testing

Over the last year, we’ve occasionally been posting about page deployments from the old Lane site to the new Lane Drupal site. Today, we dramatically improved our testing infrastructure, and took a step towards a better, more accessible website.

Before we can explain that step, you need at least a grossly simplified idea of what it looks like for us to move pages. Essentially, all we’re doing is manually copying and pasting content from the old site to pages on the new site, then properly formatting and linking everything. That’s easy with one or two pages, but when doing a chunk of a hundred pages, it’s a tedious and time-consuming process. And, as with any tedious process, it’s prone to errors.

So how do we ensure that everything worked right after a deployment? Our principal tool is a Link Checker, which is a type of Web Spider. A spider, once given a page to start on, follows all the links on that page, and then all the links on those pages, and so on. Eventually, it establishes a kind of web of a website.
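To make the spider idea concrete, here’s a minimal sketch of that “follow every link once” loop. It isn’t our actual link checker: the `crawl` function, the `SITE` dictionary, and its three pages are made up for illustration, and a real spider would fetch pages over HTTP instead of reading a dictionary.

```python
from collections import deque


def crawl(start, get_links):
    """Breadth-first crawl: visit each page once, following links outward."""
    seen = {start}
    queue = deque([start])
    edges = []  # (source page, linked page) pairs: the "web" of the site
    while queue:
        page = queue.popleft()
        for target in get_links(page):
            edges.append((page, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, edges


# Hypothetical three-page site standing in for real HTTP fetches.
SITE = {
    "/": ["/about", "/classes"],
    "/about": ["/", "/classes"],
    "/classes": ["/missing"],
}

pages, edges = crawl("/", lambda page: SITE.get(page, []))

# Any page that was linked to but doesn't exist is a broken link.
broken = sorted(page for page in pages if page not in SITE)
print(broken)
```

The `seen` set is what keeps the spider from looping forever when two pages link to each other.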

When we first tried to find a spider, we found several that almost worked. The one we got furthest with was, appropriately enough, called LinkChecker. But it wasn’t quite right for our needs, and we had a lot of difficulty trying to extend it. So we did what any self-respecting, overly confident programmer would do, and wrote our own.

Our first pass worked pretty well – checking some 13,000 pages and all their hundreds of thousands of links in about half an hour. But like any tool, before long, you want more power!

Tim Taylor, from Home Improvement, who often said 'More Power!'

We experimented with adding some rudimentary spell checking, but found that doing a complete check of our site could take as long as 4 hours! And there were so many other things to add.

Over the last couple days, we’ve done a complete rewrite and realized significant speed gains. Instead of 30 minutes to check the site, it now takes 7. And adding in spell checking only makes it take 12. We also added a few other features:

  • Hotlinking Checks:
    It’s possible to include an image on your page that’s technically stored on another server. This practice is called “hotlinking” and is generally discouraged (some things, like Facebook icons or Google Docs embedded images, are ok), since it can lead to awkward situations where the hosting server either removes or changes the image – effectively controlling content that’s displayed on your site. We’re now checking to make sure that all the images you include are local (or are part of a list of allowable sites).
  • Alt Tag Checking:
    According to accessibility rules, images are supposed to have alt tags to help visually impaired people identify what’s in the image. They’re really easy to add, so we really have no excuse not to include them every time we put an image on one of our pages. We’re now logging images that are missing alt tags.
  • External Link Checking:
    Previously, we were only checking links within the Lane website. We’ve now expanded that to check links off site.
  • Page Title Checking:
    It’s still in experimental mode, but we’re adding page title checking to make sure that none of our pages have redundant titles for SEO reasons.
  • Phone number formatting:
    We added a few checks to make sure that all phone numbers are formatted appropriately, so that you can click them to make a phone call.
  • Email address formatting:
    Similarly, we’re now making sure that all mailto: links have a properly formatted email address after them.
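For the curious, the two image checks from the list above can be sketched with nothing but Python’s standard library. This is an illustration under assumptions, not our production code: the `ImageChecker` class and the `ALLOWED_HOSTS` allow-list are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allow-list: hosts we'd accept images from.
ALLOWED_HOSTS = {"www.lanecc.edu", "docs.google.com"}


class ImageChecker(HTMLParser):
    """Collects hotlinked images and images without an alt attribute."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        host = urlparse(src).netloc  # empty for relative (local) URLs
        if host and host not in ALLOWED_HOSTS:
            self.problems.append(("hotlinked", src))
        if "alt" not in attrs:
            self.problems.append(("missing alt", src))


checker = ImageChecker()
checker.feed(
    '<img src="/images/logo.png" alt="Lane logo">'
    '<img src="http://example.com/cat.jpg">'
)
print(checker.problems)
```

Note that the sketch only flags a missing alt attribute. An empty `alt=""` is left alone, since that’s a legitimate way to mark a purely decorative image.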

The broader, more important thing we did was to make sure the framework for our link checker is easier to extend – simplifying new tests in the future. And, because it’s so much faster, it’s “cheap” to run a test, meaning we can do them more often to catch problem areas sooner.
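One way to get that kind of extensibility is to make every test a small function that registers itself, so adding a test never means touching the crawler. The sketch below shows the idea with the email and phone checks. The `check` decorator, the function names, the regular expressions, and the assumption that clickable phone numbers use `tel:` links are all illustrative guesses, not a description of our real code.

```python
import re

CHECKS = []  # every registered check runs against every link


def check(func):
    """Decorator that registers a link check."""
    CHECKS.append(func)
    return func


@check
def mailto_format(href):
    if href.startswith("mailto:"):
        address = href[len("mailto:"):].split("?")[0]
        # Deliberately loose pattern: something@something.something
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", address):
            return "malformed email address"


@check
def tel_format(href):
    if href.startswith("tel:"):
        # Assumed convention: digits and dashes, optional leading +
        if not re.fullmatch(r"\+?[0-9-]{7,}", href[len("tel:"):]):
            return "malformed phone number"


def run_checks(href):
    """Run every registered check and collect the complaints."""
    results = []
    for func in CHECKS:
        message = func(href)
        if message:
            results.append(f"{func.__name__}: {message}")
    return results


print(run_checks("mailto:someone at example.com"))
print(run_checks("tel:541-555-0100"))
```

Adding a new test is then just a matter of writing one more function with `@check` on top of it.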

Now, on to the broader problem of actually fixing all the new problems on the site we’ve uncovered….

What types of Computers Visit?

Recently, the Webteam was tasked with formalizing the procedures we use for testing websites and applications for cross browser compatibility. When we put together our initial design, we tested against 17 different browser/operating system configurations in person, then looked at another few dozen configurations through browsershots.org. Thanks to all our testing, so far we’ve only had one browser issue – which is only in one version of Chrome, on one version of Windows, on one particular screen resolution. I’m pretty happy with that.

But browser popularity changes, and people are regularly upgrading their devices. It’s been over a year since we last launched a design. Charged with formalizing procedures or not, as we get ready to develop a new theme for the Lane website, it’s time to take a look at what devices are accessing our website.

For this data set, we’re actually only going to look at data from the last month – Oct 6, 2012 to Nov 5, 2012. We won’t be getting as complete a picture, but we also won’t see traffic from earlier this year, before some of the more recent browsers were even launched. We’ll start with a super high overview, then dig in deeper.

Browsers & Operating Systems

Popularity of different browsers at Lane

Popularity of different operating systems of people visiting the Lane website

So from this data we can conclude that the typical (most common) person visiting the Lane website is a Windows user with Firefox. Of course, we know that there’s a ton of different browser versions out there, so the story might be much more complex. But this data is still useful. Doing the same sort of extensive testing we do with the main website (millions of visits a year) with all of our sites (example: this blog, which sees about 4,000 visits a year) would be prohibitively time-consuming and expensive. Testing on the latest versions of Firefox, Chrome, and IE on Windows, and then Safari on a Mac is probably good enough for most of our smaller sites.

But let’s go deeper.

Browser popularity broken down by version

Whoa! Sorry you can’t click the little down arrow and see all three pages of the key. I’m only looking at the top 100 browser versions, and I’m combining minor version numbers (treating Firefox 3.6.3 and Firefox 3.6.5 as just Firefox 3.6.x). That still leaves us with 39 different browser/version combinations.
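If you want to do the same kind of roll-up on your own export, the version combining looks roughly like this. It’s a sketch: the `collapse` function is invented for the example and the visit counts in `rows` are made-up numbers, not our Analytics data.

```python
from collections import Counter


def collapse(browser, version):
    """Keep major.minor and replace the rest with 'x' (3.6.3 -> 3.6.x)."""
    parts = version.split(".")
    if len(parts) > 2:
        parts = parts[:2] + ["x"]
    return f"{browser} {'.'.join(parts)}"


# (browser, version, visits) rows with made-up visit counts.
rows = [
    ("Firefox", "3.6.3", 10),
    ("Firefox", "3.6.5", 5),
    ("Internet Explorer", "9.0", 20),
]

totals = Counter()
for browser, version, visits in rows:
    totals[collapse(browser, version)] += visits

print(totals.most_common())
```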

Turns out that the average person is actually an Internet Explorer 9 user. And their operating system:

Operating Systems broken down by Browsers

As expected, most people use Windows. There’s no way to generate a graph of browser, version, and operating system (and it might be too complex anyway!), so we’ll have to conclude our analysis here. But we’ve learned a few things. For one, Safari on Windows, Chrome Frame, and Linux all aren’t important to us. And we’re probably ok to stick with our high level testing plan for basic sites – it’ll still cover a majority of users.

Screen Resolutions

Screen resolutions are usually listed as number of pixels wide by number of pixels tall. Pixels tall can be useful to know, as it helps us determine where the “fold” is. But at this stage we’re mostly concerned with width, since that more directly impacts layout. This used to be simple to do, but with the advent of tablets that rotate their orientation, screen width has become a little more fluid: 1024×768 and 768×1024 might actually be the same device. For the purposes of our analysis, we’ll just assume that tablets don’t rotate. As long as we’ve set our media queries appropriately, things should still work.

Screen widths of computers that visit Lane

There’s something else to consider here. A device that’s 1024 pixels wide will have trouble displaying a webpage that’s 1600 pixels wide. But there’s no issue the other way. So we’re really looking for a lowest common denominator. In this case, I’d submit that our lowest common screen resolution is 1024×768. It’s ancient, yes, but it’s also the standard resolution on most tablets.

In our case, we set our media queries up as:

  • Global: <740px
  • Narrow: 740px to 979px
  • Normal: 980px to 1219px
  • Wide: 1220px and up

Is this a good fit? I think so. If a content area gets to be too wide, it gets to be hard to read. So though it’d be totally justified to have a media query for browsers that are greater than 1400 px wide, I’d argue that it’d be hard to maintain the same look and feel without decreasing readability. And we should be ok with the tablet assumptions we made earlier – most should jump between narrow and normal view depending on orientation, but shouldn’t fall to a cell phone optimized view.
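To sanity check the tablet claim, here are the same breakpoints written as a tiny function. The function name is made up, and treating exactly 1220px as Wide is my assumption about where that boundary falls.

```python
def layout_for(width_px):
    """Map a viewport width in pixels to one of the four layouts."""
    if width_px < 740:
        return "Global"
    if width_px < 980:
        return "Narrow"
    if width_px < 1220:  # assumption: 1220px itself counts as Wide
        return "Normal"
    return "Wide"


# A 1024x768 tablet in portrait and landscape orientation.
print(layout_for(768), layout_for(1024))
```

A 768px wide portrait tablet lands in Narrow and the same tablet at 1024px in landscape lands in Normal, so neither orientation drops to the phone layout.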

Java Support

28.38% of visitors reported no Java support. Although it’s a free plugin, we should probably seek alternatives whenever possible.

Flash Support

Although Google Analytics doesn’t provide us with a simple way to check Flash support, we can assume that “(Not Set)” is a reasonable guess at “Not Supported”. In this case, about 13% of our visitors don’t support Flash. Since 7-8% of our traffic is mobile, and we know that iOS devices don’t support Flash, this sounds about right. The lack of support we’re seeing here is a good reason for us to try to move away from Flash on our pages wherever possible.

Mobile Devices

Compared to this time last year, mobile traffic is up 50%. We’ve got to keep thinking mobile. What devices do people use? iPhones. Followed by iPads. In fact, the next 10 devices put together only add up to 1/3 the traffic we get from iPhones. For every Android device visiting a page, there were 2 iOS devices visiting that same page.

But Androids aren’t insignificant. In fact, more people visit via Android than people do via IE7. So let’s have a look:

Android versions at Lane

It looks like Android 2.x hasn’t quite gone away, although we seem to have skipped Android 3. When testing phones, we should probably be sure to find a mix of Android 2.2, 2.3, and 4.0 devices.

What Now?

Now that we’ve drawn together all our data, it’s time to make some decisions about testing procedures. We’ll convene as a web team, go through all our data, and try to come up with a check-list. Expect to see some further details soon!

How do visitors find us?

Last time, we analyzed where people go on the Lane website. This time, we’re going to wonder how they found the Lane website in the first place.

Sources of traffic to Lane's website

That’s right. This post has pictures.

We’ve analyzed our Search Traffic to death, and right now our Campaigns (targeted emails and such) aren’t a significant source of traffic. Direct traffic isn’t very interesting, since it’s just people that type our URL into the address bar of their browser (or are referred from an evil website that prevents us from seeing referral information – more on that later). But we can learn a lot from Referral Traffic, which we can define as visits from people who found us by clicking on a link on another website.

Of course, not all referral traffic is equally interesting. I exported the top 500 referral sites (anyone who has sent at least 52 visitors last year) to a spreadsheet, and went to work.

Search Traffic graph to Lane

The vast majority of our referral traffic to www.lanecc.edu is actually traffic from another one of our Lane websites – Moodle, MyLane, etc. So let’s ignore those for now – although unifying navigation is a goal, we can’t also do that in this blog post. While we’re at it, let’s ignore the 7% of referral traffic that came from search that Google Analytics didn’t identify. I’m being a little generous about what constitutes search – people that are squatting on a domain and putting a search bar on it to collect ad revenue should really count as “Bad People”, but we’ll pretend they’re legitimate search engines here.

We’ll also ignore AskLane, our Knowledge Base about Lane. It’s a useful tool, but people generally find it from a Lane website, so we shouldn’t count it as a distinct referrer.
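I did this sorting by hand in a spreadsheet, but the bucketing could be roughed out in code along these lines. Treat it as a sketch: the suffix rules are crude, `asklane.example` and `search.example.com` are placeholder hostnames (not real ones), and anything the rules get wrong needs a manual entry in `OVERRIDES`.

```python
from collections import Counter

# Placeholder hostnames for illustration only.
SEARCH_HOSTS = {"search.example.com"}
OVERRIDES = {
    "asklane.example": "asklane",  # stand-in for the AskLane hostname
    "ltd.org": "government",       # suffix rules alone would miss this one
}


def categorize(host):
    """Put a referring hostname into a rough bucket."""
    host = host.lower()
    if host in OVERRIDES:
        return OVERRIDES[host]
    if host == "lanecc.edu" or host.endswith(".lanecc.edu"):
        return "lane"
    if host in SEARCH_HOSTS:
        return "search"
    if host.endswith(".gov") or host.endswith(".state.or.us"):
        return "government"
    if host.endswith(".edu"):
        return "education"
    return "other"


referrers = ["www.lanecc.edu", "oregon.gov", "oregonstate.edu",
             "ltd.org", "en.wikipedia.org", "jobs.lanecc.edu"]
print(Counter(categorize(host) for host in referrers))
```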

Referral Traffic to Lane without traffic from Lane, Search, or AskLane

That’s better. It’s also not entirely accurate. The number of people who come to the Lane website via Email is likely much higher, but some email providers make it impossible to see that they were the referrer – even Google’s own email service. If you’re a department here on campus, and you’re thinking about sending out a mass email, come talk to us first – there’s things we can do to make it super easy to identify who’s clicking your email links.
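One common way to make email clicks identifiable is to add Google Analytics campaign parameters (`utm_source`, `utm_medium`, `utm_campaign`) to every link in the message. Here’s a sketch of tagging a link that way. The `tag_link` helper and the source and campaign names are made up for the example, and this is only one possible approach, so do still come talk to us.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_link(url, source, campaign, medium="email"):
    """Append campaign parameters while keeping any existing query string."""
    parts = urlparse(url)
    query = parse_qsl(parts.query)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical newsletter and campaign names.
print(tag_link("http://www.lanecc.edu/", "newsletter", "fall-registration"))
```

Visits that arrive through a tagged link show up under Campaigns rather than as Direct traffic, even when the email provider hides the referrer.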

Let’s look at each of the other categories one by one. Keep comparing each category against the overall averages – referral traffic averages 2.98 pages/visitor, about 4 minutes on site per visitor, with a new traffic rate of 23% and a bounce rate of 55%.

Social Referrals

Google Analytics actually provides a much more comprehensive interface for analyzing Social traffic. So we’ll augment our data set.

Social Traffic to Lane Graph.

I think that graph kind of speaks for itself. And forget about the other 13 Social Networks (including Tumblr, MySpace and Pinterest), who send so little traffic that they don’t even show up. But is there a difference in the quality of traffic from Social Networks?

Pages per Visit

I think it’s clear that yes, there is a difference. For pages per visit, we want the bar to be as high as possible – an indicator that the average visitor not only came to the site, but they then explored some other pages. In this case, the clear winner is Naver – a Korean Search and Social Networking Site. The biggest losers are Delicious and Facebook. So even though Facebook sends us 92% of our social referral traffic, most of those visitors only view one page and then leave.

Bounce Rate for Social Traffic

We see a similar pattern in Bounce Rates (percent of people who view a page then leave). Facebook and Delicious have terrible Bounce Rates. Twitter (shown as t.co, since t.co is Twitter’s URL shortening service, and thus is the referrer) fares only a little better. But both Google Plus and LinkedIn tend to send us visitors that visit at least one other page. They also tend to send us more visitors that are new – 16% and 19%, respectively, compared to 9% for Facebook and 4% for Twitter.

Lessons? Facebook is so big that it’s impossible to ignore, but it’s also some of the worst traffic. Don’t ignore the other networks.

Government Referrals

Visitors from government websites have better than average statistics. Here’s the complete list:

Source                    Visits  Pages / Visit  Avg. Visit Duration  % New Visits  Bounce Rate
ltd.org                   5642    2.24           0:02:48              8.53%         63.58%
oregon.gov                3354    4.24           0:03:37              58.86%        36.49%
stateoforegon.com         1780    3.65           0:02:53              78.71%        43.54%
eugene-or.gov             750     3.87           0:03:22              52.93%        47.07%
cms.oregon.gov            616     4.39           0:06:38              43.18%        29.87%
public.health.oregon.gov  539     4.02           0:05:20              63.45%        33.58%
ode.state.or.us           133     2.98           0:02:25              33.08%        51.13%
www1.eere.energy.gov      111     3.27           0:02:54              50.45%        63.06%
ci.florence.or.us         107     4.27           0:04:25              50.47%        38.32%
odccwd.state.or.us        103     2.73           0:02:35              59.22%        57.28%
egov.oregon.gov           94      6.2            0:05:09              24.47%        19.15%
nces.ed.gov               85      6.13           0:07:30              61.18%        23.53%
ci.corvallis.or.us        75      2.83           0:01:54              52.00%        48.00%
boli.state.or.us          63      3              0:02:35              26.98%        49.21%
ci.springfield.or.us      53      2.47           0:03:59              11.32%        47.17%
Averages                  900.3   3.75267        0:03:52              44.99%        43.40%

In the interest of not writing the world’s longest blog post, I won’t go into too much detail, but there are a lot of questions to ask. Are visitors from www1.eere.energy.gov looking for info on our Energy Management program? Are the visitors from public.health.oregon.gov looking for Health Professions information? Knowing that, can we do anything to help those visitors?
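If you’d rather not eyeball every row, a few lines of code can flag which sources fall short of the overall referral averages quoted earlier (2.98 pages per visit, 55% bounce rate). This is just a sketch: the `lags_baseline` helper is made up, and only four rows from the government table are included to keep it short.

```python
# Overall referral averages quoted earlier in this post.
BASELINE_PAGES = 2.98
BASELINE_BOUNCE = 55.0

# (source, pages per visit, bounce rate %) copied from the government table.
rows = [
    ("ltd.org", 2.24, 63.58),
    ("oregon.gov", 4.24, 36.49),
    ("www1.eere.energy.gov", 3.27, 63.06),
    ("egov.oregon.gov", 6.2, 19.15),
]


def lags_baseline(pages, bounce):
    """True if a source is worse than the overall average on either metric."""
    return pages < BASELINE_PAGES or bounce > BASELINE_BOUNCE


needs_a_look = [source for source, pages, bounce in rows
                if lags_baseline(pages, bounce)]
print(needs_a_look)
```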

Educational Referrals

Once again, here’s some data:

Source                  Visits  Pages / Visit  Avg. Visit Duration  % New Visits  Bounce Rate
oregonstate.edu         846     3.02           0:05:52              10.05%        50.00%
utexas.edu              801     2.86           0:02:10              78.28%        55.43%
lcsc.edu                786     2.72           0:02:14              49.49%        57.76%
osba.org                567     3.52           0:02:18              73.37%        45.15%
ohsu.edu                387     2.34           0:02:56              23.77%        69.51%
umpqua.edu              333     3.47           0:03:59              22.52%        52.85%
nac.uoregon.edu         331     1.17           0:00:57              0.00%         91.54%
oregoncis.uoregon.edu   330     3.86           0:03:54              36.67%        47.88%
blogs.bethel.k12.or.us  327     3.46           0:05:34              44.34%        33.64%
uodos.uoregon.edu       306     3.78           0:14:27              3.27%         37.25%
Averages                247     2.8339         0:03:50              32.59%        53.51%

The averages don’t add up right, but that’s because I’m only showing the top ten here.
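You can check that for yourself. The sketch below averages just the ten rows shown in the table, and also computes a visit-weighted average for comparison. The variable names are mine, and nothing here reproduces the full data set behind the table’s Averages row.

```python
from statistics import mean

# (visits, pages per visit) for the ten educational referrers shown above.
top_ten = [
    (846, 3.02), (801, 2.86), (786, 2.72), (567, 3.52), (387, 2.34),
    (333, 3.47), (331, 1.17), (330, 3.86), (327, 3.46), (306, 3.78),
]

mean_visits = mean(visits for visits, _ in top_ten)
mean_pages = mean(pages for _, pages in top_ten)

# Weighting by visits lets busy referrers count for more than quiet ones.
total_visits = sum(visits for visits, _ in top_ten)
weighted_pages = sum(visits * pages for visits, pages in top_ten) / total_visits

print(round(mean_visits, 1), round(mean_pages, 2), round(weighted_pages, 2))
```

The ten rows average about 501 visits and 3.02 pages per visit, versus 247 and 2.8339 in the Averages row, which is what you’d expect when the full list includes many smaller referrers.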

Why do some of these referrers have exceptionally high (78%) new visitor rates? Let’s dig deeper into traffic from utexas.edu. In this case, it turns out that the University of Texas maintains a list of Community Colleges around the country, and this is the source of almost all of the referrals. If we follow the visitor flow of those visitors, we get this:

Traffic from utexas.edu


So there’s people in Texas, looking at a list of Community Colleges, who click a link to our website. They look at some basic informational pages, and then go away. Do they come back later? (We get 29,000 visits a year from Texas, either as direct links or Search). Digging even deeper into this visitor flow, I can see that Texans are looking at many of our pages in Spanish. Should we further develop our translated pages?

Employment Referrals

Source                 Visits  Pages / Visit  Avg. Visit Duration  % New Visits  Bounce Rate
qualityinfo.org        1236    3.05           0:03:49              44.09%        58.33%
worksourceoregon.org   869     2.61           0:02:46              17.38%        64.67%
eugene.craigslist.org  794     2.46           0:03:01              19.90%        59.32%
univjobs.com           556     2.42           0:00:49              69.78%        15.47%
academic360.com        368     2.46           0:00:50              65.22%        11.68%
higheredjobs.com       170     5.04           0:02:44              83.53%        37.06%
chronicle.com          55      3.55           0:04:47              80.00%        54.55%
Averages               578.2   2.981          0:04:07              23.08%        55.00%

Having dealt with more than a couple hiring committees, I’ve always been curious to see what sites with job listings send traffic our way. Strictly speaking, the above is a pretty poor list. Most of our job postings link directly to the job post at jobs.lanecc.edu, so we may not see all the traffic in the above table.

College Guide Referrals

These are websites that attempt to guide you into picking a college, or provide listings of colleges with certain programs.

Source                          Visits  Pages / Visit  Avg. Visit Duration  % New Visits  Bounce Rate
oregoncollegesonline.com        945     3.05           0:03:44              40.21%        52.59%
technical-schools-guide.com     433     4.01           0:02:51              62.59%        37.18%
medicalassistantschools.com     245     3.84           0:03:04              47.76%        46.53%
flightschoollist.com            235     5.55           0:04:39              66.38%        35.74%
bestaviation.net                167     7.38           0:04:55              67.07%        14.97%
communitycollegereview.com      156     2.96           0:02:30              54.49%        48.08%
nursingschools.com              142     3.47           0:03:39              78.17%        35.92%
studyusa.com                    129     5.81           0:06:46              70.54%        31.78%
a2zcolleges.com                 123     3.93           0:02:50              75.61%        46.34%
justflightschools.com           108     6.55           0:04:16              76.85%        22.22%
communitycollegesusa.com        106     6.09           0:07:01              44.34%        31.13%
educationatlas.com              92      2.78           0:01:35              84.78%        50.00%
artschools.com                  90      4.77           0:03:06              70.00%        21.11%
braintrack.com                  90      3.58           0:03:19              51.11%        32.22%
universities.com                89      3.19           0:07:07              8.99%         48.31%
collegestats.org                87      3.98           0:03:42              81.61%        37.93%
collegeview.com                 83      4.23           0:04:07              54.22%        26.51%
aviationschoolsonline.com       69      5.71           0:04:10              79.71%        20.29%
campusexplorer.com              63      5              0:05:43              55.56%        22.22%
uscollegesearch.org             63      4.95           0:02:58              80.95%        34.92%
community-college.org           62      3.82           0:02:08              87.10%        46.77%
collegesearch.collegeboard.com  55      4.58           0:03:03              74.55%        29.09%
Averages                        165.09  4.51           0:03:57              64.21%        35.08%

If this wasn’t already a record long post, we could try to compare programs – do referrals for flight programs differ from referrals for health professions programs? Is there something either of these programs could learn from each other to try to drive more traffic to their pages?

News Website Referrals

Source                     Visits  Pages / Visit  Avg. Visit Duration  % New Visits  Bounce Rate
coder.targetednews.com     359     1.37           0:00:36              9.19%         72.14%
special.registerguard.com  155     2              0:03:44              8.39%         67.10%
eugeneagogo.com            80      2.22           0:01:48              21.25%        62.50%
eugenedailynews.com        67      2.78           0:04:07              10.45%        56.72%
oregonlive.com             57      1.98           0:00:46              45.61%        59.65%
myeugene.org               55      7.62           0:21:53              9.09%         34.55%
Averages                   128.8   2.995          0:05:29              17.33%        58.78%

Here we can ask questions like: why do visitors from myeugene.org spend so long on the website?

Everybody Else

The last 18% is the hardest to figure out. Some are sites like tiki-toki.com, where we host some timelines, but which aren’t really Lane webpages. Others simply don’t resolve to anything any more – it’s like that website isn’t on the Internet any more. And others appear to have been proxies that purposely disguise the referrer. But if I cut out ones that really seem to be irrelevant, and we look at the top 30, here’s what we get:

Source                       Visits  Pages / Visit  Avg. Visit Duration  % New Visits  Bounce Rate
nwaacc.org                   1366    1.83           0:01:41              51.54%        63.98%
planeteugene.com             1179    2.61           0:01:57              39.78%        32.99%
en.wikipedia.org             1167    3.8            0:02:42              68.98%        43.70%
oregonchildcare.org          712     3.09           0:05:40              19.94%        38.06%
nweei.org                    497     3.71           0:03:28              32.39%        44.27%
osaa.org                     494     1.75           0:01:30              75.10%        74.90%
windustry.org                456     2.69           0:01:58              87.94%        58.99%
artshow.com                  437     2.28           0:01:03              72.77%        55.61%
ocne.org                     434     4.35           0:04:32              38.48%        34.10%
oregon.ctepathways.org       397     8.16           0:11:31              12.59%        40.55%
mypathcareers.org            372     4.95           0:03:52              47.58%        23.66%
capteonline.org              343     5.14           0:03:43              74.93%        11.08%
racc.org                     332     2.57           0:01:14              57.23%        31.33%
league.org                   263     3.09           0:04:12              37.26%        50.57%
peacehealth.org              263     2.19           0:02:20              15.59%        69.58%
adha.org                     227     6.03           0:04:21              68.28%        29.96%
apta.org                     214     5.78           0:04:58              71.96%        10.28%
maps.google.com              183     4.58           0:03:30              59.02%        39.89%
mtai.org                     182     2.15           0:02:35              61.54%        63.74%
flashalerteugene.net         178     2.55           0:03:58              7.30%         67.42%
ratemyprofessors.com         176     2.91           0:02:10              28.41%        58.52%
florencechamber.com          161     4.27           0:04:35              44.72%        37.27%
lanecountyseniornetwork.com  156     1.93           0:02:07              25.00%        69.87%
4cnas.com                    154     1.66           0:02:40              11.04%        78.57%
degreedays.net               152     2.55           0:01:31              90.79%        53.95%
startatlane.com              136     2.63           0:03:11              13.97%        57.35%
healthguideusa.org           134     3.2            0:02:25              79.10%        46.27%
wadsworth.com                134     1.18           0:00:22              85.82%        90.30%
kpflight.com                 131     2.35           0:01:46              12.21%        41.98%

Lots of questions spring to mind. Our stats from Wikipedia are pretty good. But when’s the last time anyone cleaned and updated our page? (Answer: May 19th, 2011, when a paragraph was added about the Longhouse, if you don’t count the two pictures added a few weeks back.) Our stats from Google Maps are also pretty good, which I’m happy to see – last year I spent a couple days improving our campus map on Google Maps using their mapmaker tool. But since those maps are community edited, is our listing still ok? Are there things we’re missing?

Also interesting is the number of websites that simply list colleges offering a degree. For example, 90% of visits from degreedays.net are new, and are probably people interested in Energy Management. Can we do anything with this data? Are each of those sites linking directly to the program? (degreedays.net does.) Do we need to ask any of those sites to update their links, since we’re changing our URL structure? (In this case, no – and this is actually a complex question. Email me if you’re concerned.)

Conclusions

Phew. That was a lot of data. I’ve only been processing it for a couple hours and I already have dozens of questions to try to answer. But I think it’s all important stuff to keep in mind as we continue to refine our information architecture. Are we keeping the needs of those 300,000 referral visitors in mind? If we come back to this data in another year or two and look at our bounce rates, will they have gone down, because interesting information is easier to find due to our new information architecture? Are there any opportunities on any of those sites to increase the number of referrals?

As always, let us know if you’d like to sit down and look through some more complete data for your department.