Sorting text

An article, posted 13 days ago filed in sorting, ruby, programming, database, order, databases & sql.
There are a few hard problems in computing: correctly handling time, naming, preventing off-by-one errors… Sorting text may not be one of them, but recently we ran into a discussion where I couldn't make up my mind anymore. Hence this post's topic: sorting text.

The problem

How do you sort the following words:

  • cheese
  • Ape
  • Drums
  • dent
  • Beer

If you ask Ruby, you get:

 %w[cheese Ape Drums dent Beer].sort

Results in:

  1. Ape
  2. Beer
  3. Drums
  4. cheese
  5. dent

Which my useless and ramshackle programmer's brain translates to: well, why not, it is sorted, right?
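Why does Ruby put Drums before cheese? A minimal sketch of what is going on, assuming plain ASCII words: `String#<=>` compares the underlying bytes, and every uppercase letter has a lower byte value than any lowercase letter. Folding case before comparing is one alternative ordering; it is shown here only for contrast and is not necessarily what a database collation does.

```ruby
words = %w[cheese Ape Drums dent Beer]

# Default sort uses String#<=>, which compares bytes:
# "A".."Z" are 65..90 and "a".."z" are 97..122, so capitals come first.
default_order = words.sort

# Comparing on a downcased copy ignores case for plain ASCII words.
case_insensitive = words.sort_by(&:downcase)

puts default_order.inspect
puts case_insensitive.inspect
```

With these five words the second ordering interleaves the capitalised and lowercase entries instead of grouping the capitals at the top.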

But then we moved the data into a database that was correctly set up with a proper locale for 'collation', a term that I had seen before but that never meant anything to me until this problem. Collation is:

> the assembly of written information into a standard order.

(thanks Wikipedia - Collation)


Continue reading...

Blog concept: Sketchy optimisations

An article, posted 3 months ago filed in activerecord, database, optimization, orm, performance, query, rails, software & sql.

Recently a colleague was showing me a concept he was working on. He had drafted a change in the fight against so-called 1+n queries (for some reason unknown to me they're actually called n+1 queries, but my head isn't able to process the problem as just one more query after n queries…). In software development with ORMs like Active Record, it is quite easy to make a single database request for objects that, when presented in a view, each trigger another query because they have a relationship. Round trips to a database are generally bad, as they take time.
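To make the 1+n pattern concrete, here is a rough sketch in plain Ruby. It is not Active Record and not my colleague's code: the `authors` and `posts` data and the three lambdas are hypothetical stand-ins for database calls, and each lambda call counts as one round trip.

```ruby
# Hypothetical in-memory data standing in for two database tables.
authors = [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }, { id: 3, name: "Cy" }]
posts = [
  { author_id: 1, title: "A" },
  { author_id: 1, title: "B" },
  { author_id: 3, title: "C" }
]

# Round-trip counter per scenario.
queries = Hash.new(0)

# Each lambda simulates exactly one database query and records it.
fetch_authors = lambda do |scenario|
  queries[scenario] += 1
  authors
end

fetch_posts_for = lambda do |scenario, author_id|
  queries[scenario] += 1
  posts.select { |post| post[:author_id] == author_id }
end

fetch_posts_for_all = lambda do |scenario, author_ids|
  queries[scenario] += 1
  posts.select { |post| author_ids.include?(post[:author_id]) }
       .group_by { |post| post[:author_id] }
end

# 1 + n: one query for the authors, then one more query per author.
lazy_counts = fetch_authors.call(:lazy).to_h do |author|
  [author[:name], fetch_posts_for.call(:lazy, author[:id]).size]
end

# Preloaded: one query for the authors, one for all of their posts.
loaded_authors = fetch_authors.call(:eager)
posts_by_author = fetch_posts_for_all.call(:eager, loaded_authors.map { |a| a[:id] })
eager_counts = loaded_authors.to_h do |author|
  [author[:name], posts_by_author.fetch(author[:id], []).size]
end

puts "lazy:  #{queries[:lazy]} queries, #{lazy_counts}"
puts "eager: #{queries[:eager]} queries, #{eager_counts}"
```

Both variants produce the same counts, but the first needs a round trip per author while the second always needs two, however many authors there are. In Active Record, preloading associations with `includes` or `preload` serves the same purpose.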

For his change, he introduced a new class that we could seemingly reuse, with just another declaration (a bad code smell) of the relations between objects and whether these should be preloaded when retrieving the primary object. This was in response to an indeed quite bad part of our code that returned objects with counts of selected associations, but instead of counting these in the database, the current code was a…

Continue reading...

It's digital stupid

Notes by Luke Wroblewski on the Martin Belam (Guardian) talk at EuroIA:

> Up front, the team did not get their API model right. They tried to use ISBNs for books and did not heed advice that ISBNs are “evil”.

Sounds quite familiar :)

> They (ISBN numbers, ed.) are a physical system not a digital system. They don’t identify a unique work but a specific edition. They don’t cover anthologies, they are added to CDs, calendars and even card displays.

Lately I've been wanting to slam my head quite a few times for a similar reason: not choosing the right identifier. Much of the data I work with lately has multiple codes/numbers that look like unique identifiers usable in the digital environment I am building. None of them, however, fitted my desired digital world view. While I could have adopted the real world view underlying the existing identifiers, that view did not fit the …

Continue reading...

murb blog