The Buckblog

assorted ramblings by Jamis Buck

Net::SSH and Thread-safety

17 March 2008 — 1-minute read

Net::SSH 1.x is thread-safe. Net::SSH v2, as currently implemented in its pre-release state, is not.

I’ve debated this long and hard with myself. Wrapping code in mutexes and doing all the other stuff that thread-safety requires adds a surprising amount of overhead, and it feels painful to me to have to add all that when the most common use-case of the library will be in single-threaded environments. However, without the mutexes and all that overhead, using Net::SSH in a multi-threaded environment (where multiple threads are hitting the same Net::SSH connection) will result in some rather ugly errors, and would require those programs to add their own mutexes and such to protect the integrity of the connection.
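To make "add their own mutexes" concrete, here is a minimal sketch of what client-side locking around a shared, non-thread-safe object might look like. `FakeConnection` and its `exec` method are hypothetical stand-ins invented for this illustration, not Net::SSH classes; only `Mutex` and `Thread` are real Ruby core classes.

```ruby
# FakeConnection is a hypothetical stand-in for a connection object that is
# not thread-safe. It is NOT part of Net::SSH.
class FakeConnection
  attr_reader :log

  def initialize
    @log = []
  end

  # Pretends to run a command. The two appends must stay adjacent, which is
  # the "integrity" the caller has to protect.
  def exec(command)
    @log << "begin #{command}"
    @log << "end #{command}"
  end
end

connection = FakeConnection.new
lock = Mutex.new

threads = 4.times.map do |i|
  Thread.new do
    10.times do |j|
      # The calling program, not the library, serializes access.
      lock.synchronize { connection.exec("job-#{i}-#{j}") }
    end
  end
end
threads.each(&:join)
```

Every thread that touches the shared object has to go through the same lock; a single unguarded call is enough to reintroduce the problem.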

I’m leaning towards leaving the library optimized for the single-thread experience, and requiring the people wanting multiple threads of execution to fend for themselves. However, in the interest of getting feedback from people who might actually use the library, I ask you: which would you prefer? A faster library in single-threaded programs? Or a simpler program in multi-threaded ones? Is there a general best-practice in this case?

Reader Comments

I believe any libraries targeted at a wide audience, like Net::SSH, should be thread-safe.

Documentation and sample code can go a long way.

Is it feasible to provide a sample thread-safe client app to help with the fancy case, and a simplified wrapper API that contains a giant lock to help with the basic case?

At the very least, the documentation should state that it’s not thread safe if that’s the case, with a sample of how to use it safely in the typical case.

One option is to make it threadable (as opposed to thread-safe): that is, if a person wants to use Net::SSH in a thread they can, but they still need to marshal all data in and out of that thread. This way, if someone wants to use Net::SSH in a threading environment, they don’t have to keep it in the main thread.

I prefer non multithreaded libraries.

“I believe any libraries targeted at a wide audience, like Net::SSH, should be thread-safe.”

I hope that’s not the case. Clients should decide whether the overhead of synchronization is needed or not.

Please make it thread-safe. Some platforms, like JRuby on the Java VM, can optimize away synchronization, and the cost is much less. Also, since JRuby uses native threads, you’d be getting more bang for your buck (sorry for the pun Jamis :).

Do you have any evidence that synchronization overhead is anywhere near significant when compared against connection establishment, encryption and transport time?

As long as there is no unsynchronized global state (the easiest way is not to have global state), I see no problem prohibiting threads from sharing Net::SSH connections without providing their own synchronization. In fact, I think it’s preferable.

Sometimes “thread-safety” can breed a false sense of security: a user might not be able to “corrupt” a threadsafe object by throwing multiple threads at it, but those threads can still sure confuse each other pretty well. I think this certainly applies to Net::SSH objects.

thread-safe please!

I don’t think it needs to be thread-safe at all. As teki said, it’s up to clients to decide if the overhead is worth it.

I’d go for speed, and document any perceived shortcomings. I have a feeling that the (over) use of threads is going to be seen as a premature optimisation in years to come and that proper process forking is going to come back into vogue as the machines get more powerful and the memory sharing capabilities of the OS get better.

You Ain’t Gonna Need It.

A clean library is always nice :) If you don’t feel the need at this point, I believe that is a good sign that it is not necessary.

To those who are scared of client-side synchronisation, I would say controlling the concurrency is better in your hands.

I agree with Neil. Docs for those cases which need thought.

I would follow MenTaLguY’s advice. Using threads in Ruby is often not a good idea. They don’t have the benefits of shared processing like pthreads, but they do have the complexity of locking. This leaves them in an organisational/conceptual role. People who want to use them should be knowledgeable and won’t have a problem wrapping your lib with some locks. Making it thread-safe would amount to letting newbies shoot themselves in the foot :-p

+1 MenTaLguY

With C++ I’ll sometimes turn classes into monitors, with the mutex a template parameter. If people don’t need synchronization they can supply a mutex whose every operation is a NOP. For a library such as this something similar may make sense.
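The mutex-as-a-parameter idea carries over to Ruby fairly directly. What follows is only a sketch under assumptions: `NullLock` and `PacketCounter` are hypothetical names made up for the example, and nothing here reflects how Net::SSH is actually built. The one thing relied on is that Ruby's core `Mutex` responds to `synchronize` with a block.

```ruby
# NullLock is a hypothetical no-op lock exposing the same #synchronize
# interface as Ruby's core Mutex.
class NullLock
  def synchronize
    yield
  end
end

# PacketCounter is a hypothetical class standing in for library code that
# guards its state with whatever lock it is handed.
class PacketCounter
  attr_reader :count

  def initialize(lock = NullLock.new)
    @lock = lock
    @count = 0
  end

  def record_packet
    @lock.synchronize { @count += 1 }
  end
end

# Single-threaded use: the default NullLock adds only a block call.
fast = PacketCounter.new
100.times { fast.record_packet }

# Multi-threaded use: the caller opts in by supplying a real Mutex.
safe = PacketCounter.new(Mutex.new)
4.times.map { Thread.new { 250.times { safe.record_packet } } }.each(&:join)
```

The caller chooses the cost: the default path pays for one extra method call and block, while threaded callers pass in a real lock.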

Of course, there really is no harm in leaving this all up to the client.

I think the 80/20 rule answers this question. Since 80% of people will want to use this in single threaded environments, stop there. I might be wrong, but I’d guess that the majority of net-ssh dependencies out there come from people running capistrano. :)

I suppose another question is, what’s the canonical use case for multiple threads accessing the same ssh connection? And as a simple workaround in those scenarios, why can’t each thread just establish their own connection?

I don’t see any need for a utility library to concern itself with more than its problem domain. As long as there is no global state it would actually be preferable to avoid any (costly or not) threading related code.

The application which uses the library has to be context sensitive, and if the context requires threading then it’s the responsibility of the application to deal with this. The application cannot rely on libraries deep in the stack to clean up the mistakes which happen at the top.

I have to admit that while I have been using Ruby for quite a long time, I rarely use threading. Consequently, I too tend to agree with MenTaLguY and Ryan. However, a library like Net::SSH is one that could benefit from multi-threading. So what about making the core of the library thread-friendly (by avoiding global state), but not necessarily thread-safe? Then create a thread-safe connection adapter for the core library that would make it “simple” for threaded usage.

I think here you want the term ‘reentrant’. Reentrant means that threads may call functions simultaneously. This definition implies, on different input data. Or, since this is an OO language it means different objects.

To be reentrant you need only not use global data.

‘Thread safety’ indicates, at the strictest level, that a function may be called on the same data safely. So this means the same object may be shared by multiple threads.

If you make your library reentrant then you make it no more or less thread safe than the objects you depend on. For instance, if you access files with non-thread safe operations, then you are not thread safe with regards to those files.

So, my suggestion is, you make your code reentrant. This most likely means you have an object that depends on one or more non thread safe objects such as a network connection or a file descriptor. This just means someone using threads must have his own mutex around your functions.

This is the same as if they were using the connection or file descriptor themselves—they would have to put a mutex around it. So, you are not giving threaded apps any more work than they would have to do.

Additionally, you can always provide that thread safety as an optional abstraction layer, which simply puts a mutex around every outside function call.
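As a rough illustration of such an optional layer, the sketch below wraps an arbitrary object in a proxy that holds one lock for the duration of every forwarded call. Both `Transcript` and `SynchronizedProxy` are hypothetical names invented here, not anything shipped with Net::SSH, and the proxy forwards positional arguments and blocks only.

```ruby
# Transcript is a hypothetical, deliberately unsynchronized stand-in for a
# library object. It is NOT a Net::SSH class.
class Transcript
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Two appends that must not be interleaved with another thread's.
  def record(text)
    @lines << "> #{text}"
    @lines << "< ok"
  end
end

# SynchronizedProxy (also hypothetical) forwards every public call to the
# wrapped object while holding a single lock.
class SynchronizedProxy < BasicObject
  def initialize(target)
    @target = target
    @lock = ::Mutex.new
  end

  def method_missing(name, *args, &block)
    @lock.synchronize { @target.public_send(name, *args, &block) }
  end
end

shared = SynchronizedProxy.new(Transcript.new)

workers = 3.times.map do |i|
  ::Thread.new { 20.times { |j| shared.record("cmd #{i}.#{j}") } }
end
workers.each(&:join)
```

One caveat with this design: `Mutex` is not reentrant, so a wrapped method that calls back through the proxy on the same thread would raise. Ruby's standard-library `Monitor` is the usual alternative if that matters. A per-call lock also does nothing for sequences of calls that need to be atomic as a group.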

So, no, don’t make it thread safe.

(Argument slightly simplified for sake of sanity).

-Adam

I agree with Mentalguy and Adam Luter. Make it reentrant, not thread-safe. It gives you most of the benefits and none of the performance drawbacks. Asking clients to do their own synchronization is not too much to ask.

@Adam Luter, I think I really do mean thread-safety and not reentrancy. (See Thread-Safe on Wikipedia for the definition I’m working under).

Thanks, all, for your input. We’ll see if I can get away with requiring the client to do the synchronization of critical sections. Capistrano uses Net::SSH, and threads, so if I can make it work with a non-thread-safe Net::SSH, chances are I’ll have done it right.

If the client needs to access the same SSH connection from more than one thread, it should be up to the client to synchronize access to the necessary objects.

Good programming practices such as avoiding global variables should allow the library to be used in multi-threaded applications. As with any shared data, if the client wants to access the same SSH objects in more than one thread, mutexes must be used in the client code.

This might be asking for too much, but would it be possible for you to have two versions of the library… using well thought out inheritance and probably a bit of metaprogramming, that if the code saw that it was being called by two different thread IDs it dispatched intelligently to a slower synchronized version whereas if it saw the same requester over and over again it went through the fast route?

@Jamis: Sorry, I didn’t mean to imply you didn’t know the difference and thus suggesting you were using the wrong term. I’ve reread my post and I worded it quite badly, sorry about that.

Also, I didn’t want to presume you knew what reentrant was since you didn’t mention it. (And for the sake of our viewing audience)! ;)

Good luck with your library! (Reentrant for the win?)

@Adam, no worries, no offense taken. I honestly appreciated your comment. I’m anything but an expert on threading and reentrancy, and your comment made me stop and read up a bit more to be sure that, yes, I really did mean thread safety. :)

So in plainer terms, Net::SSH is capable of being used in multiple threads, as long as the same instance isn’t used from more than one thread, right?

That’s all I, personally, would need. Having multiple, simultaneous connections to many servers is important to me. But I’ve no need to pass them among threads.

Paul

@Paul, exactly.
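A small sketch of the pattern Paul describes, with each thread creating and using its own object so that no lock is needed. `HostSession` is a hypothetical stand-in written for this example (it opens no real connection and is not a Net::SSH class), and the host names are placeholders.

```ruby
# HostSession is a hypothetical stand-in for a per-host connection object.
# It opens no real connection and is NOT a Net::SSH class.
class HostSession
  attr_reader :host

  def initialize(host)
    @host = host
    @owner = Thread.current # remember which thread created this instance
  end

  def run(command)
    # Guard against the one thing this pattern forbids: sharing an instance.
    raise "session used from a foreign thread" unless Thread.current.equal?(@owner)
    "#{host}: #{command}"
  end
end

hosts = %w[alpha.example beta.example gamma.example]

# One thread per host; each thread builds and uses its own session.
results = hosts.map do |host|
  Thread.new do
    session = HostSession.new(host)
    session.run("uptime")
  end
end.map(&:value) # Thread#value joins and returns the block's result
```

Because no instance ever crosses a thread boundary, the only shared data is the list of results collected after each thread finishes.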

My use case is the same as Paul’s: I need to have threads of execution, but each thread will have its own net/ssh object connected to a unique server. Is this available yet?