![]() |
I first met Jim “Decibel” Nasby at Nuclear Tacos. Jim was one of the early members of distributed.net. He was also the one who introduced Whurley and I to each other (”You guys need to trade phonebooks.”) Jim was an early adopter of, and is considered a top expert in, Postgres. Following his announcement of a Austin Postgres User Group, I interviewed him on the state of Postgres and the job outlook for Postgres pros. The Austin Postgres User Group will be meeting Tuesday, May 6th, at Sun Microsystems. More details here. |
Lynn Bender: A few years back, I searched Dice/Monster and found only three job postings in all of Texas which contained the keyword Postgres or PostreSQL. what is the job market for Postgres experts in Austin nowdays?
Jim Nasby: I see two distinct markets for people with knowledge in Postgres; there are companies that (generally) past the startup stage, have money to spend, and want someone that has a substantial amount of Postgres and database expertise. There are very few people in the world that have that skill set, so demand is very high relative to supply. This is a market that experts in other databases should pay attention to.
The other market is companies that are using Postgres, but don’t think they need an expert. This is where you will find people who don’t do database work for a living, but have experience with Postgres. In cases like this, having Postgres on your resume will give you a decided edge over a similar candidate that doesn’t have Postgres experience.
Something else that I’ve noticed is that a lot of Postgres jobs don’t get listed through channels like Dice or Monster. A lot more of this hiring seems to be via word-of-mouth. So I don’t think low numbers on Dice or Monster are very representative. One data point that I can throw out is that until recently I’ve averaged one serious inquiry from a recruiter per month. This has fallen off in the last 6 months or so, possible due to my reduced involvement with the community.
Bender: Almost every database has a certification program — Oracle, DB2, SQL Server, and even MySQL. Is there any equivalent yet for Postgres. What are some of the ways prospective Postgres DBAs can distinguish themselves?
Nasby: There are many benefits that come from being involved in the Postgres community, and distinguishing yourself as an expert is just one of them. So let me address all of them in one answer.
First, simply reading the mailing lists is a great way to learn about Postgres, as well as good database practices. pgsql-general, pgsql-novice and pgsql-sql are good places to start. Even better than just lurking is participating. Asking questions about things you’re not clear on, as well as answering questions from others when you can.
Second, reading pgsql-hackers is a good way to learn about how the database actually works on a nuts-and-bolts level. This is an important step up… understanding how the database functions takes you to a much higher level than just being able to write queries; you can start understanding how database design must consider how the database actually functions.
Finally, there is the level of actively participating in the development of Postgres. This doesn’t have to be in the form of code, either. There is plenty of room for help with things like documentation and advocacy. You can also contribute at a technical level even if you’re not writing code.
Bender: For someone just starting out with, aside from a knowledge of SQL, what are most critical skills for a Postgres DBA to have? and what is the best way to obtain them?
Nasby: I’m glad that you mentioned “knowledge of SQL”. SQL is nothing more than a language, and a very simple one at that. I think this is a point that many people fail to grasp. Working with databases is very different than working with procedural code; it requires a different mindset. I constantly see SQL that was clearly written by someone who was thinking in terms of “first I do this, then I do this, then I do this”; they were thinking procedurally. Databases aren’t procedural engines, they’re set manipulation engines. In order to actually be proficient with databases you have to think in terms of sets of *data*, not sets of *operations*.
Also, I want to point out that there are 3 different areas of database expertise: administration, development, and architecture. A database administrator (DBA) is generally concerned with keeping production databases up and running. They are similar to systems administrators. Database developers are concerned with writing code that interacts with the database; queries, stored procedures, as well as portions of client-side code. Database architects work with the structure of data in the database, as well as higher-level considerations. Things like table design, materialized views, designing replication environments. Of course, it’s very common for people to wear many of these three hats.
The key to success with a particular database is understanding how it differs from other databases. For Postgres, this means understanding what MVCC is and how Postgres implements it. When you understand that, you’ll understand how critically important vacuuming is. It also means you don’t have to worry very much about lock contention issues, something that can be a huge pain in other databases. It’s also important to understand what most of the different configuration parameters do (there are a few that can generally be ignored). Understanding that means understanding some of how the engine actually works. There’s some good references for this information available.
There are also things that aren’t Postgres-specific. for DBAs, it’s good to understand how the different backup and recovery mechanisms work. For developers and architects, understanding how to think in terms of sets is critical. Understanding of how the engine works is also useful for developers and DBAs; it’s pretty critical for architects. Replication is something that DBAs and architects should also know.
Bender: Senior level folks and decision makers often ask me “Why is Postgres better? Why should I switch from MySQL?” What would you tell these folks?
Nasby: With Postgres, data integrity is the number one driving factor behind everything. Right behind that is stability. Performance is 3rd. MySQL tends to do things “the easy way”. The trade-off here is that Postgres can be seen as being “hard” compared to MySQL, and there is a bit of truth to that. MySQL can be a bit easier at the very beginning. But once you get into a larger environment, those initial differences become meaningless, but data quality issues are huge.
People like to point to MySQL’s customer list and get a warm fuzzy that since so many other well-known companies use MySQL it must be good enough for us. What they fail to understand is that MySQL is a commercial company with a commercial product, and that list is marketing material. The list for Postgres isn’t as big and impressive because the community focuses effort on the database itself and not as much on marketing.
So, if you’re starting from scratch, I think there’s a very strong case to be made for Postgres. If you have existing code it becomes a bit trickier, because there is the cost of actually migrating. What I can say is what I’ve seen time and time again, and that’s people who finally bit the bullet and migrated and are much happier, or people who are still on MySQL and hating it, but perceive the pain to be just below the pain of migrating. I’ve yet to find someone who’s regretted migrating to Postgres.
Bender: MySQL seems to have a wide variety of storage engines available. Each seem to be suited to a particular type of task. Is there an equivalent to this in Postgres?
Nasby: I think multiple storage engines is MySQL’s Achilles heel. It sounds great in theory… give people the flexibility to use the right storage engine for the job. But the reality is that the storage technologies are too different to work well in one product. For example, there are serious caveats with transactions in InnoDB because of MyISAM. And you find pieces of MySQL technology that only work with one storage engine.
I think a far better approach is to look at what *database* engine is best suited for a given task, or if a database engine is even the right answer. There are options built on Postgres, like Greenplum’s MPP that scale well into many terabytes for data warehouse applications.
Having said that, there are efforts in the Postgres community to add options that will make it’s storage format useful in a broader range of cases. One example is being able to mark a partition as read-only. That allows for covering index scans. It also potentially allows for stripping out a lot of per-row overhead. There are also different indexing methods being worked on.
Bender: I realize that the Austin Postgres user group has not had it’s first meeting yet, but can you share a bit of your vision for what you would like to see? Programs? Speakers?
Nasby: My personal goal for the user group is to promote Postgres and to provide a resource for people in Austin / central Texas that are using Postgres. Ultimately, I think it’s those users who will determine what the user group looks like.
The Austin Postgres User Group will be meeting Tuesday, May 6th, at Sun Microsystems. More details here.
May 5th, 2008 at 8:46 pm
Good to see Postgres in the news. I think one important note about “X company uses MySQL!” lists is that it should be asked “In what capacity?”. Several of the places I’ve worked have used MySQL, but only for (reducing to quasi-absurd simplicity) read-mostly cache-type layers of the site architecture, i.e. places where queries tend to be simple and well-behaved for MySQL’s cache and where data integrity isn’t really an issue per se (because the authoritative data store is in an engine where data integrity is taken more seriously e.g. Oracle). Not to be unduly snarky but the only place I’ve worked that trusted MySQL with business-crucial data was an abattoir of horrors in terms of software engineering practices in every other respect.
May 5th, 2008 at 9:21 pm
Is there a good, simple tutorial on how to set up a redundant “cluster” of Postgres servers, such that if one blows up the other still has the data, and you can do inserts and updates on all the servers ?
May 12th, 2008 at 11:54 am
There’s not really a simple tutorial available, but there are companies that do services around postgres and could offer this. I know that EnterpriseDB offers some nice services around building these types of systems.