see a puffin eat a fish

Archive for the ‘Code’ Category

Duplicate Records

without comments

Here’s a very cool thing I just discovered in MySQL. Duplicate records in a db are the bane of the software developer. If you have the opportunity to generate the original table schema and set up the primary keys beforehand, that’s great, but if you’re given a table after the fact and are asked to deal with duplicate records, it’s a real headache. You end up shuffling records around, trying to create new keys and if you’re really lucky, the table is huge with six indexes and any modifications you try to make hang everything. If you’re extra lucky, you’re dealing with realtime production data that’s constantly being written to as you’re trying to clean up the data. Check this out:

ALTER IGNORE TABLE mytable ADD UNIQUE (key1,key2)

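This builds a unique index on (key1, key2), and the IGNORE keyword tells MySQL not to abort when it hits a duplicate: it keeps the first row it finds for each composite key and deletes the rest. Pretty cool. Be careful, though: the discarded rows are gone for good, so back up the table first if you care which copy survives. Also note that IGNORE was removed from ALTER TABLE in MySQL 5.7.4, so this only works on older servers. It gets you out of an immediate crisis, and the new index keeps fresh duplicates from being written while you sort out the rest on your own time.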
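Since ALTER IGNORE ... ADD UNIQUE keeps one row per duplicate key and drops the others, it can help to preview what would survive before running it. Here is a minimal Ruby sketch that models the keep-first behaviour in memory. The rows and the note column are made up for illustration, it does not talk to MySQL, and which row MySQL treats as "first" depends on the order in which it scans the table.

```ruby
# Hypothetical rows; key1/key2 mirror the composite key in the ALTER statement.
rows = [
  { key1: 1, key2: "a", note: "first copy" },
  { key1: 1, key2: "a", note: "second copy" },
  { key1: 2, key2: "b", note: "only copy" }
]

# Group rows by their composite key, preserving the original order.
groups = rows.group_by { |r| [r[:key1], r[:key2]] }

# The first row in each group is the one that would be kept.
survivors = groups.values.map(&:first)

# Every later row in a group is what the IGNORE would throw away.
discarded = groups.values.flat_map { |g| g.drop(1) }

puts "kept:      #{survivors.map { |r| r[:note] }.inspect}"
puts "discarded: #{discarded.map { |r| r[:note] }.inspect}"
```

Running the same grouping over a real export of the table is a cheap way to see how many rows you are about to lose.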

Written by mb

March 28th, 2008 at 9:40 am

Posted in Code

Announcing: What’s Your 20?

with 6 comments

Introduction:

What’s Your 20? (WY20) leverages Fire Eagle and your mobile device to help you keep track of your location. By recording what are usually considered less interesting trips (a subway ride into the city as opposed to a flight to Florence), WY20 facilitates the mapping of your everyday travels. WY20 both lets you update your location via text message anytime you like and also periodically sends you a text reminder to update your location (just in case you don’t think your present location is that interesting and didn’t tell WY20 about it). Once you update your location via a text message, it becomes available to any other third party applications that use the Fire Eagle platform. Aggregate location data can be visualized to show one’s personal history.

Why did you write WY20?

I wanted to build a low-tech way for people to keep track of their daily location, and to remember that it is important. We all have photos from our vacations but we do not document the more mundane trips. Those take up the bulk of our time and, on reflection, may be the ones that are most interesting.

Fire Eagle is still in beta and the data from WY20 is currently only viewable on the Fire Eagle website. As third-party developers start using Fire Eagle, the data WY20 passes to it will be used to show your current location, as well as your travels over time, in places such as Facebook, your blog and your mobile device.

How do I use WY20?

It’s easy! Just follow the user my20 on Twitter, then send my20 a tweet (from the web or your phone) introducing yourself, like so:

@my20 hey!

In a little while (Twitter can get overloaded), WY20 will respond with a Fire Eagle authentication URL. Click on this link and authenticate Fire Eagle against WY20. This step is necessary so that WY20 is allowed to update your location information.

What’s next?

That’s it, you’re done! Anytime you want to update your location, just send a direct message to WY20 through Twitter’s shortcode (40404), prefacing each message with d my20 (Twitter’s shorthand for sending a direct message to the user my20). For example:

d my20 Brooklyn, NY

Be as specific as you want about your present location. Texting ‘USA’ will work but so will ‘123 Main St, Anytown, NY’ (if Anytown, NY were a real place). As long as Fire Eagle can resolve the address, it’ll be stored. What’s more, a few times a day, WY20 will remind you to update Fire Eagle with your present location.

If you don’t want to be bothered by WY20 at night, you can configure that within Twitter.

twitter config

Is putting my location information on Twitter secure? I don’t want the whole world to know my location.

Yes. Your location is kept secure because all location self reporting is sent via direct message to WY20. Your location will NOT appear on your public timeline. It is stored within Fire Eagle where you have full control over which applications can and cannot see your location information.

What else?

More commands are coming, but you can also tell WY20 to leave you alone by texting stop. Like this:

d my20 stop

You’ll still be able to report your location but you won’t receive any more text messages. To have WY20 send you messages again, just send a start command.

d my20 start

Enjoy!

Clarification

“What’s your 20” is something I learned from watching The Wire.

What does “What’s your 20?” mean?

To answer that question you need to understand ten-codes. Ten-codes, or ten signals, are code words used to represent common phrases in voice communication, particularly in CB radio transmissions.

Ten-codes were developed in the 1940s at a time when police radio channels were limited in order to reduce use of speech on the radio. Historically, the codes have been widely used by law enforcement officers in North America.

Ten-codes were later adapted for use by CB radio enthusiasts before its pop culture explosion in the late 1970s. Remember the song Convoy by C.W. McCall in the mid-70s? How about the movie Smokey and the Bandit? Because of these pop-culture gems, phrases such as 10-4 for “understood” and what’s your twenty? for “where are you?” were forever embedded into the American lexicon.

“What’s your 20” is a phrase meaning the same as the ten-code 10-20. It is the same as asking “what’s your location?” or “where are you?”.

— from http://whatsyour20.com/help/

Written by mb

March 23rd, 2008 at 12:03 pm

Posted in Code, Fun, WayMarkr, wy20

The hottest thing I found this week (and it’s only Monday)

without comments

Lately I’ve been doing data imports with Rails where I am running into a number of concurrency, race and timeout conditions. I have too many potential points of failure (network latency, database timeouts, etc.) that I cannot fully control but must account for. Worst case scenario, I end up with missing or duplicate records when an import job dies or overlaps with another one.

The Rails way to address the data duplication problem is something like this in the model:

validates_uniqueness_of :guid

This is a promise from Rails that it will not attempt to insert a record with the same guid as an existing record. What that really means is that if you are inserting 1,000 records, Rails will generate 1,000 selects to verify non-duplicate guids. That doesn’t really scale. I got around this by commenting out the model’s validation and creating a unique index straight up on the database.

add_index(:my_table, [:guid], :unique => true, :name => 'guids_must_be_unique')

Then, instead of doing a select/insert cycle for each record to test for duplication (as the model would have done), I just do an insert and wrap it in a begin/rescue clause. If the insert fails because the unique index is violated, MySQL raises an error, the rescue catches it and execution continues:

Mysql::Error: Duplicate entry '17-24484' for key 2: INSERT INTO my_table ('guid') VALUES (24484)
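The insert-and-rescue pattern can be sketched in plain Ruby without a database. FakeTable and DuplicateEntry below are hypothetical stand-ins for the real table with its unique index and for the Mysql::Error shown above; the guids are made up. The point is only that each record costs one call and a duplicate does not stop the loop.

```ruby
require "set"

# Stand-in for the database error; in the post this is Mysql::Error.
class DuplicateEntry < StandardError; end

# Hypothetical in-memory table that enforces a unique guid, like the index does.
class FakeTable
  attr_reader :rows, :calls

  def initialize
    @rows = []
    @guids = Set.new
    @calls = 0
  end

  def insert(guid)
    @calls += 1
    # Set#add? returns nil when the guid is already present.
    raise DuplicateEntry, "Duplicate entry '#{guid}'" unless @guids.add?(guid)
    @rows << guid
  end
end

table = FakeTable.new
skipped = []

[24484, 24485, 24484, 24486].each do |guid|
  begin
    table.insert(guid)
  rescue DuplicateEntry
    # The unique index rejected the row; note it and carry on with the import.
    skipped << guid
  end
end

puts "inserted #{table.rows.inspect}, skipped #{skipped.inspect}, calls: #{table.calls}"
```

In a real Rails app the error will most likely reach you wrapped in an ActiveRecord exception (which one depends on the Rails version), so check what your version raises and rescue only that, so that unrelated failures still surface.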

This reduces two calls to one (from 2,000 selects/inserts to 1,000 inserts) but doesn’t change the fact that 1,000 inserts take a really long time. My scaling issues haven’t gone away. Enter the hottest thing I found this week, ar-extensions. It’s a bunch of really useful extensions to ActiveRecord which I have yet to fully explore, but the one that has saved the day for me is the bulk import functionality. Ar-extensions lets me build up a giant array of arrays which it then proceeds to insert in bulk, as many as the underlying database will allow. My system went from 1,000 (or even 10,000) SQL calls down to one. Routines that would have run all day finish in seconds. Let me show you how easy it is to create a preferences record for all your users (special thanks to this post for the helpful examples):

require 'ar_extensions'
# columns you want to import
@columns = [:guid, :contact_me_via_email]
# assuming a User model; each row is an array of values in column order
@values = User.find(:all).map { |u| [u.guid, u.contact_me_via_email] }
# then just run an import
Preferences.import @columns, @values

Run the code, tail your log and watch the magic. What’s even more useful is the poorly documented :on_duplicate_key_update option which translates into a bulk update (as opposed to insert).

Preferences.import @columns, @values, :on_duplicate_key_update=>[:contact_me_via_email]

If an existing guid is found (assuming a unique index exists), a new record will not be inserted. Instead, the contact_me_via_email field will be updated.
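To make the bulk insert-or-update idea concrete, here is a rough Ruby sketch of the kind of single statement MySQL accepts for it: a multi-row INSERT with an ON DUPLICATE KEY UPDATE clause. The bulk_upsert_sql helper is hypothetical and is not how ar-extensions builds its SQL; the table name and values are made up, and the quoting is far too naive for real data.

```ruby
# Builds one multi-row INSERT with MySQL's ON DUPLICATE KEY UPDATE clause.
# Quoting is deliberately naive; a real adapter must escape values properly.
def bulk_upsert_sql(table, columns, rows, update_columns)
  quote = lambda do |v|
    case v
    when nil then "NULL"
    when true then "1"
    when false then "0"
    when Numeric then v.to_s
    else "'" + v.to_s.gsub("'", "''") + "'"
    end
  end

  # One parenthesised tuple per row, all in a single VALUES list.
  values = rows.map { |row| "(" + row.map(&quote).join(",") + ")" }.join(",")

  # On a unique-key collision, overwrite these columns with the incoming values.
  updates = update_columns.map { |c| "#{c}=VALUES(#{c})" }.join(",")

  "INSERT INTO #{table} (#{columns.join(',')}) VALUES #{values} " \
    "ON DUPLICATE KEY UPDATE #{updates}"
end

sql = bulk_upsert_sql(
  "preferences",
  [:guid, :contact_me_via_email],
  [[24484, true], [24485, false]],
  [:contact_me_via_email]
)
puts sql
```

However many rows go in, the database sees one statement, which is where the speedup over row-by-row inserts comes from.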

Written by mb

February 11th, 2008 at 6:06 pm

Posted in Code

Praise for Ubuntu

without comments

I’ve been trying to get Linux working on i386 since Sunday. Since Sunday! It’s now Thursday. My main problem is that my distro wouldn’t recognize my cheap-o Buffalo wireless card that I got for eight dollars at some mail order house. I started with Gentoo, but that was hopeless, so I moved on to Fedora because a friend told me they support everything.

Here is what I have to say about my Fedora experience: visualize my tumble into an endless spiral of kernel rebuilds (there is no Linux driver for my wireless card and the default Windows driver requires a larger stack), ndiswrapper (which doesn’t support Vista drivers, the only ones I have; the ndiswrapper site provides links to other drivers but good luck matching up your exact card, chipset and revision) and fwcutter version hell (you’ll need even more luck to confirm your eight dollar card’s firmware). I’m sure this stuff is obvious if you do it on a daily basis but since I don’t frequently build machines and install operating systems, this was a complete headache.

Last night Matt came over and, after messing around with Fedora for close to four hours, said, “Why don’t you try Ubuntu?” I installed Ubuntu; it immediately discovered my wireless card, let me know the driver was closed source but that it could install it anyway, and proceeded to deal with all the ndiswrapper/fwcutter rigmarole for me. It even figured out I could use a closed source NVIDIA driver. Not to mention it’s beautiful and _fast_. The Vista partition takes ten minutes to load; Ubuntu comes up in 20 seconds. I heart Ubuntu.

Written by mb

January 24th, 2008 at 11:14 am

Posted in Code, Notes

Who is brave enough to try SimpleDB?

without comments

I’ve been doing a lot of work with Amazon’s S3, the pay-as-you-go Simple Storage Service. I know that S3 has had problems but IMHO the trade-offs are absolutely worth it for storing large files remotely. If reliability is a concern, it’s pretty straightforward to write a local caching mechanism for large media files. I need to do this anyway for redundancy as well as to save money on frequently accessed files. Plus, if you’re lazy you can always try Squid or some other ready-made caching proxy.

I’ve seen some MySQL plugins that talk to S3 but Amazon has done one better and released SimpleDB, a service that lets you remotely store all of your persistent, structured data (it is an attribute store with its own query API rather than a relational SQL database). My question is, who is brave enough to try SimpleDB? It’s one thing to remotely store flat files; it’s a much more committed undertaking to offload your whole database to a third party. You are fully tethered. Any downtime SimpleDB has will kill your site, and caching database queries is a little more complicated than caching flat files. Not to mention the potential performance hit of doing all of your database calls over REST or some other XML-based transport.

I’m quite eager to see the early adopters of SimpleDB and the specifics of their implementations. If this works, and is fast and reliable, one of the biggest headaches of site management will go away. Looks like the Rails community is already thinking about integration.

Written by mb

December 18th, 2007 at 1:35 pm

Posted in Code