Technical – Lane Community College Web Team

IE9 & 10 Support Discontinued

On January 12th, 2016, Microsoft will be discontinuing support for all versions of Internet Explorer earlier than version 11. You can read the announcement from Microsoft, as well as why and how to upgrade, here. If you run Windows at home, it’s very important that you keep your computer updated in order to help keep it safe.

Similar to when we stopped supporting Internet Explorer 8, if you choose to continue using an outdated version of Internet Explorer, it does not mean the Lane website (or most of the rest of the Internet) will suddenly stop working for you. It just means that out of the hundreds of different browser, operating system, and screen resolution combinations that we try to test, we’re no longer going to test with IE8, IE9, or IE10 – we’ll be doing our Internet Explorer testing strictly with IE 11. Since users of the all of the outdated versions of Internet Explorer combined represent less than 4% of our web traffic, most folks won’t even notice we’re doing anything different.

If you use Internet Explorer at work, you may not be able to upgrade due to administrative policy or because you use an application that requires an older version of Internet Explorer. In that case, consider using multiple browsers, where you use Internet Explorer strictly where you have to, and Firefox or Chrome everywhere else, in order to limit the number of sites Internet Explorer is exposed to.

Encrypt all the things!

Over the last few weeks, we’ve been battling a problem where the web server would sometimes forget its own name. Some days it would want to go by www.lanecc.edu, other days it would want to go by 163.41.113.43 (our public IP address), and other days it would use our internal server name. We gave it a stern talking to, but it refused to cooperate.

The solution is to specify Drupal’s base_url variable. Normally, Drupal tries to identify what server name to use and it does a pretty good job. But clearly our server isn’t so great at that anymore. Specifying the base_url forces Drupal to use what we tell it.

But the base_url needs to be a full URL, complete with protocol. So it needs to be “http://www.lanecc.edu” or “https://www.lanecc.edu” – we’re not allowed to just say “www.lanecc.edu”. Why does that matter? Because even though the difference is just one letter – an “s” – that turns out to be one of the most important letters on the Internet.

HTTP is the protocol that defines how a lot of content on the internet moves around. It’s part of how this page got to you. But it’s a completely unencrypted format. When you’re browsing the web in HTTP, you’re sending everything in clear text – anyone that can listen in (for example, on an unencrypted WiFi connection) can read whatever you’re sending. But if we add the “s”, and browse via HTTPS, then everything we do is encrypted, and no one can listen in*.

But there’s some gotchas with HTTPS pages. For instance, most webpages actually consist of multiple requests – the Lane homepage has 34. If even one of those requests is made over HTTP instead of HTTPS, then we have a “mixed mode content error”, and the browser hides that content.

And that’s kept us from specifying our base_url so far. If we set it to “http://www.lanecc.edu”, then on pages that are HTTPS, like webforms, then all the styles and javascript will break, since those would be sent over HTTP. And if we went the other way, and set the base_url to “https://www.lanecc.edu”, then our caching infrastructure, which is built assuming most connections are over HTTP, would break, significantly slowing down the site. So we’ve been stuck running a mixed-mode site – most people use HTTP, but authenticated people and webform users use HTTPS.

There’s a number of reasons that isn’t ideal, which are well outside the scope of this already too long blog post. And the wider Internet is moving forward with using HTTPS only everywhere. So yesterday, we deployed new caching infrastructure which will allow us to go with using HTTPS only. Going forward, all connections with www.lanecc.edu will be encrypted.

This should be a almost completely transparent transition, but if you notice any problems, email us at webmaster@lanecc.edu and let us know!

* strictly speaking, this isn’t true, and there’s a a whole category of attacks that can still work on HTTPS. But there’s a fix for that too, and we’re working on rolling that out too some time in the future.

A Static Server

As I may have blogged once or twice previously, making websites really fast is important to me. Recently, we’ve made a change that should help us improve the speed of not only www.lanecc.edu, but of many of Lane’s other websites, including this one.

When you request a webpage, in addition to fetching the page itself, you usually end up requesting each individual image, font, css sheet, and javascript sheet. Making those requests is expensive. There’s a sliding window, which limits how many requests you can make at once. So if the window size is 5, but you need to make 6 requests, you need to wait for one of the first 5 requests to finish before you can make the 6th. On the other hand, if you only need to make 5 requests, your browser can start rendering the page a lot sooner.

One way is to combine images into a css sprite. Here’s ours:

That’s 15 different images combined into one. And even though there’s empty space on the bottom right, it’s actually usually faster to fetch a slightly bigger image then it is to fetch each icon individually.

Another way is CSS and JavaScript aggregation. Most pages on our website have 35 CSS sheets and 35 JavaScript files – but after aggregation, you only end up requesting 7 of each (there’s reasons why 7 of each is better then just 1 CSS and 1 JS, but that’s outside the scope of what we’re doing here).

But the easiest way to speed up a request is to not make it at all. When you make a request, you send some request headers with it, that tell us things like if you’ll accept a compressed version and what language you prefer. Then we respond with a header that tells you how long you should keep our response cached on your computer. If you need that resource again before the expires time we gave you, you just use the local one, rather than fetching a new one.

In addition to sending headers with your request, and us sending different ones, we also send each other cookies.

A picture of some cookies — baked by https://www.flickr.com/photos/plutor/3646688/in/photostream/

Cookies are always set on a domain. When you log into Moodle, cookies are set on the domain classes.lanecc.edu, so Moodle knows on every page load that you’re logged in. But here, on the Lane website, we’re not so lucky, as you can actually use the website on either www.lanecc.edu, or on just lanecc.edu. So we set our cookies on .lanecc.edu. That little dot before lanecc.edu is critical. That tells your browser to send your cookies to all domains that end in lanecc.edu.

The downside is that those cookies are sent on every request to our domain – even requests that don’t need them, like the request your browser made to show you the picture of those cookies up there.

What does this have to do with static?

We’ve started moving relatively static (unchanging) resources, like the college logo and the megamenu onto the static asset server, which we’re putting on the domain static.lanecc.net. Since these resources are relatively unchanging, we can set a long expires time – maybe 30 or even 45 days in the future (for now, it’s set to 0minutes in the future, for testing). Once your browser has fetched them, you won’t need to fetch them again for a whole month. And because they’re not under the lanecc.edu domain, you won’t send any cookies (and we won’t send any back), making the few requests you do need to make even smaller.

If you’re really curious about the inner workings of our new static asset server, I’ve added some extra geeky content to the Our Tech Stack post.

In the months to come, we’ll keep migrating content onto the static asset server, trying to reuse resources between websites, so that the logo you see on myLane is served from the same URL as the logo you see in Moodle, reducing the number of requests you need to make, and making it simpler for us to update things in the future.

IE8 Support Discontinued

Last month, for the first time ever, Internet Explorer 8 usage on the Lane website (www.lanecc.edu) slipped below 3% overall. Normally, we discontinue support for a browser when it dips below 5%. But IE8, as the most recent version of the default web browser supplied with Windows XP, got a special exception.

But, unfortunately, supporting Internet Explorer 8 requires a disproportionate amount of time, and since that usage continues to fall, we’re making the decision to no longer actively develop webpages for IE8.

We will not be removing any existing support, so the Lane website should continue to look mostly ok in IE8 for some time. But, eventually, it will start to break. For that reason, we recommend you use a more recent browser that does work on Windows XP, such as Firefox, Chrome, or Opera.

Of course, if you are stuck using Windows XP, we also recommend that you consider replacing or upgrading your PC as soon as possible. Microsoft will be discontinuing support for Windows XP this year, meaning it will soon no longer be receiving security updates.

Lastly, there are a couple of applications on campus where the recommended browser is IE8 – notably Banner. If that sounds like you, have no fear! You can continue to use IE8 to access Banner. But you should use one of the other browsers – like Firefox or Chrome – for all of your other web browsing needs.

Link text standards

Take a look at these two sentences:

You can read the post on the Web Team Blog.
You can read the post at https://blogs.lanecc.edu/webteam/2013/12/13/search-engines/

Which is better?

Answer? the first one. There’s a couple reasons.

First, it turns out that descriptive text in links like this is actually really helpful for search engines to determine what’s on that page. So providing descriptive links can really help improve search.

Second, if you’re linking to another Lane page from within Drupal, if you use descriptive text you don’t need to worry about ever updating the link. We’ll take care of it for you. If you use the URL as your link text, then you might get a weird situation where the link text no longer matches the url you’re linking to – it says “lanecc.edu/science/bio” but you’re actually sent to “lanecc.edu/science/biology”.

Third, it simply looks better. No one wants to read long, ugly looking urls as part of their text.

“But wait,” you say, “what about when someone prints the page?”

We’ve thought of that. When you print a page from the Lane website, we automatically include the url next to the linked text. Of course, we’d rather you didn’t print web pages in the first place, but that’s a discussion for a different day.

Of course, there’s exceptions to this rule. Use common sense.

We’ve added a check to our linkchecker, so from here on out we’ll be actively hunting for these links. We’ll fix some of them for you, but we may also contact you for some help rewording your content.

As always, if you have any questions about best practices with content, send Lori an email.

Search Engines

Back when we first started the website redesign, we received a lot of feedback about how our search engine – https://search.lanecc.edu – didn’t work very well. Now most of our questions are about altering the behavior of the search engine to make it work differently. Over the next few posts, I’d like to explore what changed, as well as why not every request can be responded to, as well as dig a little bit into what you, as a Drupal editor, can do to improve how the search engine views your pages.

How does search work at Lane?

We use a Google Mini search engine, which allows us to index and search up to 50,000 documents. We have complete control over what pages are in our search (the “index”), and limited control over what’s shown in the results. Also, it’s blue, which is a nice contrast to the beige and black of most of the machines in the data center.

The first big change we made to search as part of the redesign was to upgrade our Google Mini, which we did in early 2012, switching to a brand new search server with upgraded hardware and software. We found a pretty immediate improvement – no longer did it feel like using an old search engine from ’05 or ’06, and instead it felt like using one from ’10 or ’11. Unfortunately, Google has discontinued the Mini, and there will be no further upgrades. We’ll need to find a new solution in the future (Apache Lucerne?).

Along came a migration

Then we started migrating pages to Drupal. This brought with it a bunch of new practices that we’ll get into some other time, but all of which dramatically increased the relevance of search. The down side is that we changed virtually all of the URLs for pages on lanecc.edu (Yes Sir Tim, I know it was a bad idea). While we’ll hopefully never need to do this again, it meant there was some confusion in the results for a while.

The migration also meant that we cut a lot of pages. More than 10,000 of them. Enough pages that cutting them significantly changed how the mini calculated page rank. We’re still removing these from the search index. It’s a slow process, since we don’t want to delete more than one or two folders worth of files each day, so that if someone was still depending on a page or image that didn’t get migrated, it won’t be as hard to get that person their missing files.

Reset

Since the mini wasn’t removing pages that had long since disappeared, we decided to reset our search index. This is pretty much what it sounds like. We tell the mini to forget about all the pages it knows, and start over from the beginning. When we did this last, around Thanksgiving, our document count in the index went from about 40,000 all the way down to 16,000. We think results improved quite a bit.

We’ll reset again around Christmas, which is traditionally one of the slowest days on the website. Hopefully that’ll bring the document count down even more, and make results even better.

Biasing

At the same time as our last reset, I figured out that I’d been using Results Biasing incorrectly. Results Biasing is a way that we an introduce rules into the search engine to influence the results. Our first rule tells the mini to significantly decrease the pagerank of urls that end with xls or xlsx, under the assumption that when you’re searching Lane, you’re probably not interested in some Excel sheet.

I thought that by simply entering rules in the Results Biasing page on the mini administrative interface, I was affecting the search results. Turns out this isn’t actually true. There’s a second radio button to hit on the Frontend Filter, where you actually enable Result Biasing on the collection that frontend is searching. What are containers and frontends? A container is a collections of urls that match certain patterns. For example, we have a container of just pages that match pages related to our COPPS pages, and another just for the Library. Frontends are the user interface to those collections, where you can customize what the search button says, what the background color of the results is, etc. You can use any front end with any collection, but in our case each frontend belongs to just one collection.

Feedback

The other big thing we did to our search was to add a feedback form in the bottom right hand corner of the search results page. To date we’ve had 40 people let us know about their searches that didn’t get them the results they needed. Many of those have resulted in us making a tweak to the search engine, either adding pages to the index, or adding a KeyMatch, or fixing something on the page to improve its visibility.

Feedback has started to taper some, while queries have stayed steady (if you adjust for changes in enrollment), so our assumption is that search is working pretty well. If it isn’t, please submit some feedback!

Next time

Now that we’ve got some basics out of the way, next time we can dig into how search engines calculate results (If you’d like some homework, here’s a bit of light reading), as well as more of what we can do to influence those results.

Spelling, Internet Explorer, and Easier Linking

Currently, we use a plugin called LinkIt to handle linking between our pages (for a discussion of how we use it, see https://blogs.lanecc.edu/webteam/2013/02/05/common-pitfall-links/). LinkIt is pretty neat, and helpful. But it was always a pain for people to link to internal pages. We’d often get a scenario where someone wanted to link to some page, say, their department contact page. So they dutifully use LinkIt, and start searching for pages with titles like ‘contact’. But since every department has a contact page, they’d get fifty results and spend forever hunting for the right page.

Not helpful.

In fact, it was so unhelpful that people were directly pasting links to their pages into the page, meaning that when page titles changed, things started to 404, and things just got nasty. So I spent some time investigating why LinkIt couldn’t just figure out if a given URL is internal, and if so look up the node number and properly insert the link like a good little plugin.

Turns out the developers had already thought of this. Right there on line 388 there was a little todo that explained our problem. Like any responsible web site, we’re forcing page edits to happen on HTTPS. But for non-administrative users, we’re serving up most pages via HTTP (so we can take advantage of the speed our Varnish server provides us). So when an editor copied a link into LinkIt, the server would check the link (for example, ‘http://www.lanecc.edu/contact’) and see if the start of that link matched the scheme and server name of the current page. Since we’re forcing https, that’d end up being something like https://www.lanecc.edu/node/1337/edit. The server name matches, but the scheme doesn’t, and LinkIt would report that this link was external.

One simple change later, and now when you paste in a link it decides not to check the scheme. So from now on, if you’re linking to an internal page, just paste your url into the top box, and you’ll see something like this:

When that little green popup appears, simply click on it and LinkIt will automatically identify your node number and insert it as the target path. Click Insert Link, like normal, and you’re good to go.

The second change is that we got spell check working on Internet Explorer 9. For others with a similar problem, our issue was on Drupal 7, using TinyMCE 3.something, with the wysiwyg_spellcheck module and wysiwyg modules enabled. Turns out that when you enable the module, you also need to make some changes to wysiwyg/editors/tinymce.inc. In the function wysiwyg_tinymce_settings, you need to add add another key to the settings array:
'spellchecker_rpc_url' => '/libraries/tinymce/jscripts/tiny_mce/plugins/spellchecker/rpc.php',

You should, of course, adjust that path to whatever is appropriate for your site. One final note, the problem described with LinkIt is actually fixed in the new 7.x-3.x version. But we’re using the older, 7.x-2.x version, which is no longer in active development (and which has no migration path to 7.x-3.x)

Google Mini Issues

We’ve mentioned a few times in meetings around Campus that we’re tracking search volume on our Google Mini. Our plan has been to use the last year’s data to see how the new website impacted visitor’s abilities to find different things. In other words, if for the last year everyone searched for “Schedule”, and now suddenly no one searches for “Schedule”, we can assume that we succeeded in making Schedule more visible. On the other hand, if no one used to search for “Moodle”, and now suddenly everyone is searching for “Moodle”, we know it’s hard to find Moodle.

We also tried to make the MegaMenu as small and as lightweight as possible, to make it load as quickly as possible. Right now the entire Megamenu uses about 30k – less than your average Internet Cat Picture.

Picture of a cat in a hollowed out computer monitor, with the caption 'I'm in your internet, clogging your tubes" (spelled wrong) — For example, this picture uses 31k.

But unfortunately, when we put these two things together – a super optimized mega menu and a desire to track search engine statistics – something broke.

The Google Mini has different collections. For example, we have a collection that searches just our COPPS documents. The biggest collection is the default collection, which searches all the pages in the index (all collections, plus pages not in any other collection). When you search the Mini, a bunch of parameters are sent in with your query, telling the Mini which collection to use, which set of styles to apply, etc. If you omit all of these parameters, search still works – the default collection is searched and you’re shown your results in the default front end. I assume that also meant that the searches were logged as belonging to the default collection.

Unfortunately, it turns out this is not the case. Here’s a streamgraph of our search traffic:

Search volume to Lane, showing varying ups and downs throughout the year, then a sudden dropoff for the last two weeks — 428/12 – 4/13/13

See that real small area on the right hand side, where traffic falls off a clif? For the last two weeks searches were correctly logged in the search engine, but are not given to the default collection for inclusion in search reports – effectively making them invisible to us.

We added the hidden parameters to the search bar in the MegaMenu, and verified today that we’re seeing an 800% increase in search traffic in reports over what we saw yesterday. Apparently those parameters are not as optional as I thought.

In a way, this error isn’t a big deal. Right after launch, we expected a surge in search traffic as people searched for things before getting comfortable with the new layout. So we’d always anticipated throwing out the last two weeks of data in our analysis. But as a bit of a data nerd, I’m always sad when any data is lost, and we’ve had a hard time interpreting how people are interacting with the new site.

I’d also like to point out that we added the MegaMenu to the search results page. If you search for something via the search engine and discover that you’re not getting the results you want, you can also try AskLane from the search box on the MegaMenu. Of course, we’d also encourage you to submit some search feedback to us (the form moved to the bottom right), so we can try to improve the search engine to give you better results.

Two final notes. If you ever find yourself managing a Google Mini, and pulling your hair out because the new XSLT you applied isn’t rendering your new styles, it’s because you need to add &proxyreload=1 to a search string, to clear the style cache. Second, this weekend we’ll swapping www and www2 (don’t worry – all links will continue to work regardless which one you use), and as part of that we’ll be resetting the search engine index. That should clear out a lot of old pages that aren’t really relevant any more, and are no longer linked, but are still out there on our old web server. It’ll take most the weekend to rebuild the index, but hopefully next week you’ll get even better search results.

Caching Changes

Once again, we’re on a quest to make the website as fast as possible. Drupal doesn’t make this easy – on an average page load, our request for that page passes through three servers, sometimes generating hundreds of database calls and touching dozens of files. But through caching, we’re able to skip most of those steps and serve you content that was already waiting for you.

Now that we’re a couple of weeks post launch, we’re going to start making our caching a little more aggressive. If you’re a web editor, this might impact you.

Previously, the cache time on all of the objects (pictures, fonts, stylesheets, etc) on our pages was set to six hours. This makes sense the first few days after a big launch – we were anticipating lots of changes. But now that things have stabilized some, we’re turning up the cache lifetime on our stylesheets and javascript files to one year, and turning the cache lifetime of images up to one month. For 99% of visitors to the Lane website, this will result in up to 17 fewer requests to our servers – a 70% savings – creating a significantly faster web experience.

Unfortunately, this also means that there’s certain circumstances when a picture you’ve added to your website remains in the cache, and you’re unable to force someone to use a new one. Let’s say that you have a picture, called “kitten.jpg”, of a kitten eating a hamburger. You save your page, and people flock to see your awesome image. Every one of their browsers sees that the picture has an expires header of 111600 seconds – 31 days. Next time they visit your page, their browser won’t attempt to download that picture again unless at least 111600 seconds has passed.

The bad situation happens when you want to replace your kitten picture with a better one (say, a picture of your kitten eating a cheeseburger). You upload your new file as “kitten.jpg”. But your visitors will never see the new picture – their browsers see the same filename, and they know that particular file hasn’t expired from their cache yet.

So how do you fix it? Well, for one, remember that the picture will automatically fix after 31 days. But if you need an immediate fix, there’s a simple trick. Just rename your picture as “kitten2.jpg” and add it to your page like a new image. Now when the browser visits the page it’ll realize that it doesn’t need kitten.jpg anymore, and grab kitten2.jpg instead. We’re setting the cache at 1 year for JavaScript and stylesheets because we’re able to depend on Drupal to create new stylesheet and JavaScript names automatically when they change, and effectively do the same thing as we did with kitten.jpg in the example.

Things are actually a bit more involved than what’s above, but it’s not worth the extra confusion. If you’re interested in how we’re actually implementing caching, please send me an email or leave a comment.

I know this extra renaming step might seem painful, but we can’t ignore page speed. Google is incorporating speed into their rankings, and for users on slow connections (3g, in particular), we want to make sure the page renders in less than 1 second. Caching will take us a long way towards where we want to be.

Faster Deployment Testing

Over the last year, we’ve occasionally been posting about page deployments from the old Lane site to the new Lane Drupal site. Today, we dramatically improved our testing infrastructure, and took a step towards a better, more accessible website.

Before we can explain that step, you need at least a grossly simplified idea of what it looks like for us to move pages. Essentially all we’re doing is manually copy and pasting over content from the old site to pages on the new site, then properly formatting and linking everything. Easy with one or two pages, but when doing a chunk of a hundred pages, it’s a tedious and time consuming process. And, as with any tedious process, it’s prone to errors.

So how to we ensure that everything worked right after a deployment? Our principle tool is a Link Checker, which is a type of Web Spider. A spider, once given a page to start on, follows all the links on that page, and then all the links on those pages, and so on. Eventually, it establishes a kind of web of a website.

When we first tried to find a spider, we found several that almost worked. The one we got furthest with was, appropriately enough, called LinkChecker. But it wasn’t quite right for our needs, and we had a lot of difficulty trying to extend it. So we did what any self respecting, overly confident programmer would do, and wrote our own.

Our first pass worked pretty well – checking some 13,000 pages and all their hundreds of thousands of links in about half an hour. But like any tool, before long, you want more power!

We experimented with adding some rudimentary spell checking, but found that doing a complete check of our site could take as long as 4 hours! And there were so many other things to add.

Over the last couple days, we’ve done a complete rewrite and realized significant speed gains. Instead of 30 minutes to check the site, it now takes 7. And adding in spell checking only makes it take 12. We also added a few other features:

Hotlinking Checks:
It’s possible to include an image on your page that’s technically stored on another server. This practice is called “hotlinking” and is generally discouraged (some things, like Facebook icons or Google Docs embedded images are ok), since it can lead to awkward situations where the hosting server either removes or changes the image – effectively controlling content that’s displayed on your site. We’re now checking to make sure that all the images you include are local (or are part of a list of allowable sites)
Alt Tag Checking:
According to accessibility rules, images are supposed to have alt tags to help visually impaired people identify what’s in the image. They’re really easy to add, so we really have no excuse not to include them every time we put an image on one of our pages. We’re now logging images that are missing alt tags.
External Link Checking:
Previously, we were only checking links within the Lane website. We’ve now expanded that to check links off site.
Page Title Checking:
It’s still in experimental mode, but we’re adding page title checking to make sure that none of our pages have redundant titles for SEO reasons.
Phone number formatting:
We added a few checks to make sure that all phone numbers are formatted appropriately, so that you can click them to make a phone call.
Email address formatting:
Similarly, we’re now making sure that all mailto: links have a properly formatted email address after them.

The broader, more important thing we did was to make sure the framework for our link checker is easier to extend – simplifying new tests in the future. And, because its so much faster, it’s “cheap” to run a test, meaning we can do them more often to catch problem areas sooner.

Now, on to the broader problem of actually fixing all the new problems on the site we’ve uncovered….