Deep Learning & Unintended Algorithm Bias

This was a 5 minute talk on deep learning for the very excellent @chesterdevs. Like others talking about deep learning, I took visuals and the face-learning example from the landmark 2012 Quoc Le/Google/Andrew Ng paper, “Building High-level Features Using Large Scale Unsupervised Learning.”

Only afterwards did I notice that the images which their system shows as “most like a face” from their test set were 90% male and 90% white, as is the prototypical face that the machine outputs.

And so we have a neat demonstration of unintended algorithm bias: their input was 10 million randomly-chosen YouTube videos; the output was white and male. I bet they didn't expect that.

A salutary reminder that—as the hard-working statistician will tell you—“random selection” does not mean “unbiased”.
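To see why, here is a minimal sketch in Python. It assumes a hypothetical pool of source images in which 90% of faces belong to one group; that figure and the function name `majority_share_in_sample` are illustrative assumptions, not measurements from the paper or from YouTube. Drawing uniformly at random from a skewed pool reproduces the skew rather than removing it.

```python
import random


def majority_share_in_sample(population_share, sample_size, seed=0):
    """Return the fraction of a uniform random sample that falls in the
    majority group, given that group's share of the whole population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each draw lands in the majority group with probability population_share.
    hits = sum(1 for _ in range(sample_size) if rng.random() < population_share)
    return hits / sample_size


# Hypothetical pool: 90% of faces belong to one group (illustrative only).
share = majority_share_in_sample(population_share=0.9, sample_size=100_000)
print(f"majority share in the random sample: {share:.3f}")
```

The sample is perfectly random and still comes out at roughly nine in ten, because that is what the pool looks like. Anything trained on it inherits the same proportions.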

AutoHotKey script for those who, being Mac users and also equipped with an Apple keyboard, yet still work at a Windows desktop

Surprising how much time you can spend on these little niggles…

  • Irritated that Windows doesn't have an ellipsis key?
  • Wondering how to do Print Screen from your Apple keyboard?
  • Really, really fed up with swapping between “Copy‘n’Paste is ⌘-C, ⌘-V” and “Copy‘n’Paste is Ctrl-C, Ctrl-V”?

Help is at hand.

AutoHotKey

When I first came across AutoHotKey I was a bit unsure about using it. But I have seen the light. It is the bee's knees. It is open source, free, has been widely used for years, has a small footprint, and is the ultimate customise-all-the-things tool for Windows. It is a combined scripting tool and keyboard/mouse hotkey manager.

My AutoHotKey Script for a Mac User with an Apple Keyboard on Windows

https://gist.github.com/chrisfcarroll/dddf32fea1f29e75f564

There you are. It's all you need. That, and a few hours to customise it yourself. Then a few more hours to… oh, never mind.

Disclaimer

When I said that AHK is the bee's knees, I didn't say that the language isn't arcane, unintuitive and bearing signs of organic growth over a decade or more…

Conway’s Law & Distributed Working. Some Comments & Experience

The eye-opener in my personal experience of Conway's law was this:

A company with an IT department on the 1st floor, and a marketing department on the 2nd floor, where the web servers were managed by the marketing department (really), and the back end by the IT department.

I was a developer in the marketing department. I could discuss and change web tier code in minutes. To get a change made to the back end would take me days of negotiation, explanation and release co-ordination.

Guess where I put most of my code?

Inevitably the architecture of the system became Webtier vs Backend. And inevitably, I put code on the webserver which, had we been organised differently, I would have put in a different place.

This is Conway's law: the communication structure (the low cost of working within my department versus the much higher cost of working across a department boundary) constrained my arrangement of code, and hence the structure of the system. The team "just downstairs" was just too far. What was that gap made of? Even that small physical gap raised the cost of communication, but so did the gaps and differences in priorities, release schedules, code ownership, and, perhaps most of all, personal acquaintance: I just didn't know the people, or know who to ask.

Conway's Law vs Distributed Working

Mark Seemann has recently argued that successful, globally distributed, OSS projects demonstrate that co-location isn't all it's claimed to be. Which set me thinking about communication in OSS projects.

In my example above, I had no ownership of the back end code (for instance, no commit rights) and I didn't know, and hence didn't communicate with, the people who did. The tools of OSS (a shared visible repository, the ability to 'see' who is working on what, public visibility of discussion threads, being able to get in touch and to raise pull requests) all serve to reduce the cost of communication.

In other words, the technology helps to re-create, at a distance, the benefits enjoyed by co-located workers.

When thinking of communication & co-location, I naturally think of talking. But @ploeh's comments have prodded me into thinking that code ownership is just as big a deal as talking. It's just something that we take for granted in a co-located team. I mean, if your co-located team didn't have access to each other's code, what would be the point of co-locating?

Another big deal with co-location is "tacit" knowledge, facilitated by what Alistair Cockburn calls osmotic communication. When two of my colleagues discuss something, I can overhear it and be aware of what's going on without having to be explicitly invited. What's more, I can quickly filter out what isn't relevant to me, or I can spontaneously join conversations and decisions that do concern me. Without even trying, everyone is involved when they need to be, in a way that someone working in a separate room, even one that's right next door, can't achieve.

But a distributed project can achieve this too. By forcing most communication through shared public channels—mailing lists, chatrooms, pull request conversations—a distributed team can achieve better osmotic communication than a team which has two adjacent rooms in a building.

The cost, I guess, is that typing & reading is more expensive (in time) than talking & listening. Then again, the time-cost of talking can be quite high too (though not nearly as high as the cost of failing to communicate).

I still suspect that twenty people in a room can work faster than twenty people across the globe. But the communication pathways of a distributed team can be less constrained than those of the same people in one building but separated by even a flimsy partition wall.

The Panama Pepers: a longtail SEO example

A Flemish friend commented recently on Facebook that after all the news items about the 'Panama pepers' there were still no hits for it in search engines. (For those of you not previously acquainted with the Flemish talent for multi-lingual puns, I should explain that 'peper' in Dutch is pronounced like the English 'paper' but means pepper. He went on to mention a secret recipe for Norwegian salmon with Panamanian peppers.)

And it was true! There were no hits on all the interwebs for Panama pepers.

Which brings us to the subject of longtail SEO. By writing pages on specific, not-widely-popular terms, websites attract the small number of visitors who search for those terms because they are interested in the topics they expect the terms to lead to. Except that 'small', when your potential audience is The World, may mean hundreds of thousands of visitors. Not so small after all.
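As a back-of-envelope sketch of that arithmetic in Python: both numbers below are made-up assumptions for illustration (a potential audience of 3 billion people online, of whom 100 in every million care about the niche term), and `niche_visitors` is a hypothetical helper, not a real SEO formula.

```python
def niche_visitors(audience, interested_per_million):
    """Estimate how many people in an audience care about a niche term,
    given the number interested per million people."""
    return audience * interested_per_million // 1_000_000


# Assumed figures, for illustration only.
world_audience = 3_000_000_000   # hypothetical number of people online
interested_per_million = 100     # i.e. 0.01% of them

print(niche_visitors(world_audience, interested_per_million))  # 300000
```

A one-in-ten-thousand interest is tiny as a proportion, but against a worldwide audience it is still a six-figure number of people.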

Although 'Pepers' is not an English word, by writing an article on the subject, and especially by discussing the Panama papers & Panama pepers (not to mention associated recipes), I expect to rise to search engine page 1, if not hit #1, for the term.
We'll see how it goes...

Update after 4 months

www.google.be/#q=panama+pepers now produces thousands of hits. This page made it to the first page, but only just. Clearly I still lack a recipe or two for Panamanian peppers.

The Panama Pepers: an example of longtail SEO.

A Flemish friend remarked recently on Facebook that, after all the news about the 'Panama pepers', there were still no hits for that term in the search engines. (For those of you not previously acquainted with the Flemish talent for multi-lingual puns, you should know that 'peper' is pronounced in Dutch just like the English 'paper'. He went on to mention a secret recipe for Norwegian salmon with Panamanian peppers.)

And so it was! There were no hits on the interwebs for 'Panama pepers'.

Which brings us to our subject: longtail SEO. By writing pages about specific, not-widely-popular terms, websites attract the small number of visitors who are interested in the topics concerned. Although 'small', when your potential audience is the world, can mean hundreds of thousands of visitors. Not so small, then.

Although 'Pepers' is not English, by writing an article on the subject, and in particular by discussing the Panama papers & Panama pepers (and the accompanying recipes), I expect eventually to take first place in the search engine rankings.

We'll see how it goes ...