[Repost] Our Go is fine but our SQL is great
Posted by taodongl 4 months ago


From: Medium

A few months ago fat wrote about going to hell and back re-writing Bumpers in React. To round out our thought leading, I’m going to talk about the Bumpers backend. We haven’t done anything quite as dramatic as a re-write but we’ve been learning how to do things The Go Way and it’s affected our thinking to the point that it feels worth writing down. So here goes.

Like Jacob I also hate frameworks, I hate not using a framework, and I hate rolling my own. On the flip side I love writing SQL (which is apparently not a normal thing to love and a preview of the rest of this post).

Preface

The bumpers backend is a Go HTTP server. It runs on Heroku and it’s mostly an API server, but it has a few routes that serve up our sweet frontend. The Bumpers API exposes some of the data in our Postgres database to Bumpers clients. Really shocking for an API, but this is the world we live in now.

We’ve tried a couple different approaches to talking to that database — we’ve settled on using a couple Repository interfaces and hiding all our DB access behind them.

type EpisodeRepo interface {
  Create(*api.Episode) (*api.Episode, error)
  GetByID(string) (*api.Episode, error)
  GetByURLToken(string) (*api.Episode, error)
  GetTrending(int, api.Cursor) ([]api.Episode, error)
}

Using an interface like this isn’t new or interesting — it’s called a DAO or the Repository Pattern and you can find it on Martin Fowler’s blog or in the Gang of Four book. You can find a lot more writing on the general idea if you’re into that.

What is actually unusual about these Repos in the year 2017 is that we don’t have an ORM or any other database library generating SQL for us. We’re writing all our SQL by hand.

Why Would You Even Do That

When I say ORM I really mean a library that tries to own your interaction with the database. The most comprehensive ORMs are things like ActiveRecord and Hibernate, but under that loose of a definition you could extend the label all the way down to libraries like Sequel and Korma, even though they’re not trying to own your whole life or even dealing with “objects”. I’m actively not talking about basic things like drivers or connection pools or supporting query variables — please don’t write your own SQL escaping in 2017.

People tend to grab one of these libraries for some of the following reasons:

  • It’s part of a framework that’s being used. Godspeed.
  • The application needs to be database agnostic. This isn’t a real problem.
  • You like $COOL_PROGRAMMING_LANGUAGE more than you like SQL. It’s easier to write, look at, and understand code in this ORM than it is to look at and understand a SQL query.
  • The library is a helpful abstraction. It lets you compose and re-use queries in a way that writing SQL yourself wouldn’t. It makes it easy to deal with data in your language and not SQL Result Rows. It probably helps you develop a little bit faster.

We’re not using a framework that comes with an opinion about The Database, so we’re left with two real criteria to evaluate our libraries on.

I’m personally extremely into SQL and Jacob and Ian don’t hate it (yet?), so hiding SQL behind Go isn’t a reason by itself. There’s also evidence that applications that try to deny the existence of a Database behind the curtain end up having some pretty gnarly edge cases. For example, if you’re really using ActiveRecord, you end up having to know ActiveRecord so well that you know what SQL it generates and then decide if it’s ok to ship that SQL. I think that completely invalidates the “it’s easier to understand” line of thinking.

Ok That Is A Good Reason

So we’re down to deciding whether or not a database library provides a “helpful abstraction”. The value of a good abstraction is HUGE even if you still have to acknowledge what’s going on with your database underneath. I think it’s totally reasonable to have to understand all of the SQL an ActiveRecord query generates and also think that Scopes are a good and useful tool.

For Bumpers, we looked at what’s currently out there for Go. The state-of-the-art libraries that claim some sort of ORM-style heritage seem to provide struct mapping and some help with simple SELECT, INSERT and UPDATE queries and not much more. Some of them provide ActiveRecord-inspired DSLs for composing query strings (there were a lot of strings) on top of that. When we talked to folks using one or another of those libraries in production we tended to get back an ascii-shrug-man and not much more. Bummer.

We did find, and IMMEDIATELY started using, sqlx. sqlx does the struct-mapping that the other libraries do, and provides Get and Select functions that handle the boilerplate of doing a db.QueryRow or db.Query and scanning the results into a struct or slice of structs. With sqlx our query methods all look something like this:

type episodeRepo struct {
  getByID *sqlx.Stmt // this gets prepared when we new up a struct
}

func (e *episodeRepo) GetByID(id string) (*api.Episode, error) {
  var episode api.Episode
  if err := e.getByID.Get(&episode, id); err != nil {
    return nil, convertError(err)
  }  
  return &episode, nil
}

Together that’s about 90% of what we felt like we were getting out of any of the other libraries, and comes with (almost) no overhead of figuring out what’s going on under the covers.

So, we decided to just keep doing the raw-SQL statements thing.

The Other 10%

So we’re now left with the problem that raw SQL is kinda verbose and hard to re-use. We don’t really have a great answer, but we have started organizing our Repo files in a way that helps.

Each one of our repos lives in a file by itself. Those files have the repo interface up top, followed by a bunch of consts defining the SQL queries we’re gonna need, and then a struct that implements the interface.

The file that the EpisodeRepo from above lives in looks something like this:

type EpisodeRepo interface {
  Create(episode *api.Episode) (*api.Episode, error)
  GetByID(id string) (*api.Episode, error)
  GetByURLToken(token string) (*api.Episode, error)
  GetTrending(limit int) ([]api.Episode, error)
}

const (
  episodeInsert = `
  INSERT INTO episodes (image_filename, ...)
  VALUES (:image_filename, ...)
  RETURNING episode_id
  `
  episodeSelectBase = `
  SELECT
    episodes.episode_id,
    episodes.created_at,
    episodes.updated_at,
    episodes.image_filename
    -- and so many more
  FROM episodes
  `
  episodeSelectByID       = episodeSelectBase + `WHERE episode_id = $1`
  episodeSelectByURLToken = episodeSelectBase + `WHERE url_token = $1`
  episodeSelectTrending   = episodeSelectBase + `ORDER BY trending_score LIMIT $1`
  // and more ...
)

type psqlEpisodeRepo struct {
  create           *sqlx.NamedStmt
  selectByID       *sqlx.Stmt
  selectByURLToken *sqlx.Stmt
  selectTrending   *sqlx.Stmt
}

func (p *psqlEpisodeRepo) Create(input *api.Episode) (*api.Episode, error) {
  // implementation goes here
}

func (p *psqlEpisodeRepo) GetByID(id string) (*api.Episode, error) {
  // implementation goes here
}

func (p *psqlEpisodeRepo) GetByURLToken(token string) (*api.Episode, error) {
  // implementation goes here
}

func (p *psqlEpisodeRepo) GetTrending(limit int) ([]api.Episode, error) {
  // implementation goes here
}

We tend to write a bunch of very similar get methods that all have to do the same post-query transformations on the data we fetch from Postgres, so we usually end up writing a getOne and a getSlice func to help with that. Those funcs do all the fetching of related entities (by calling other repos) and setting computed fields.

They tend to look a little bit like this:

func (p *psqlEpisodeRepo) GetByID(id string) (*api.Episode, error) {
  return p.getOne(p.selectByID, id)
}

func (p *psqlEpisodeRepo) GetByURLToken(token string) (*api.Episode, error) { 
  return p.getOne(p.selectByURLToken, token)
}

func (p *psqlEpisodeRepo) getOne(stmt *sqlx.Stmt, args ...interface{}) (*api.Episode, error) {
  var episode api.Episode
  if err := stmt.Get(&episode, args...); err != nil {
    return nil, convertError(err)
  }

  // do whatever computation you've gotta do

  return &episode, nil
}

Which is, honestly, pretty great. It gives us one place to look for N+1 queries, missing fields, and all that fun stuff that shows up when we’re looking at bugs.
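The matching getSlice helper looks much the same. A sketch of ours, trimmed down: the small interface below mirrors the Select method on *sqlx.Stmt, so a fake can stand in for a prepared statement when testing the post-query computation.

```go
package main

import "fmt"

// Episode is a trimmed stand-in for api.Episode.
type Episode struct {
	ID     string
	Viewed bool // a computed field, set after the query
}

// selecter mirrors the Select method on *sqlx.Stmt.
type selecter interface {
	Select(dest interface{}, args ...interface{}) error
}

// getSlice runs the statement and applies post-query computation to
// every row, so each Get* method stays a one-liner.
func getSlice(stmt selecter, args ...interface{}) ([]Episode, error) {
	var episodes []Episode
	if err := stmt.Select(&episodes, args...); err != nil {
		return nil, err
	}
	for i := range episodes {
		episodes[i].Viewed = true // stand-in for real computed fields
	}
	return episodes, nil
}

// fakeStmt returns canned rows, standing in for a prepared statement.
type fakeStmt struct{ rows []Episode }

func (f *fakeStmt) Select(dest interface{}, args ...interface{}) error {
	*dest.(*[]Episode) = f.rows
	return nil
}

func main() {
	eps, _ := getSlice(&fakeStmt{rows: []Episode{{ID: "ep1"}}})
	fmt.Println(eps[0].ID, eps[0].Viewed) // ep1 true
}
```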

Ok Where Are We?

The Bad Parts

While these funcs are looking great, the SQL up top is a little bit of a sadder story. The more we’re adding, the noisier the file gets. We’ve been composing consts (and only ever consts, no runtime SQL changes here) to reduce the noise.

Episodes are pretty chill, but we have quite a few different ways of fetching Users and things in UserRepo can get pretty crazy once you start adding in bulk selects.

const (
  userSelectBase = `SELECT * FROM users`
  
  userSelectByID          = userSelectBase + ` WHERE user_id = $1`
  userSelectByURLToken    = userSelectBase + ` WHERE url_token = $1`
  userSelectByVanityURL   = userSelectBase + ` WHERE url_vanity = $1`
  userSelectByUsername    = userSelectBase + ` WHERE username = $1`
  userSelectAllByID       = userSelectBase + ` WHERE user_id IN (?)`
  userSelectAllByURLToken = userSelectBase + ` WHERE url_token IN (?)`
)

There’s a lot of noise there. Preparing all our statements in new-struct funcs means that finding the query for a statement generally involves doing a go-to-definition on a statement, finding it in the new-repo func, and then doing a go-to-definition on a constant just to get back to our composed consts. Naming conventions for the consts and statements help a lot, but it’s still a little gnarly. We’ve considered moving the queries into their own file whatever_queries.go but haven’t tried that yet.
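A wrinkle with those bulk selects: an `IN (?)` can’t be prepared up front because the number of values isn’t fixed. sqlx ships an `In` helper that expands the placeholder at query time (you then Rebind it for Postgres-style `$N` bindvars). A minimal hand-rolled sketch of the same idea, just to show what the expansion does:

```go
package main

import (
	"fmt"
	"strings"
)

// expandIn rewrites a single `IN (?)` into `IN ($1, $2, ...)` for n
// values, roughly what sqlx.In followed by Rebind produces for Postgres.
func expandIn(query string, n int) string {
	binds := make([]string, n)
	for i := range binds {
		binds[i] = fmt.Sprintf("$%d", i+1)
	}
	return strings.Replace(query, "IN (?)", "IN ("+strings.Join(binds, ", ")+")", 1)
}

func main() {
	q := expandIn(`SELECT * FROM users WHERE user_id IN (?)`, 3)
	fmt.Println(q) // SELECT * FROM users WHERE user_id IN ($1, $2, $3)
}
```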

Unsurprisingly, relationships between structs/tables are hard. We’re currently enforcing that an EpisodeRepo is the only repo that is allowed to return an Episode and a UserRepo is the only repo allowed to return a User. We end up with Repo implementations that have other Repos as fields, and where the referenced repo knows about a join table.

Whew that’s gnarly prose. Here’s an example: an Episode has multiple Authors and a UserRepo has to know about our episode_authors join table:

func (e *episodeRepo) getOne(stmt *sqlx.Stmt, args ...interface{}) (*api.Episode, error) {
  var episode api.Episode
  if err := stmt.Get(&episode, args...); err != nil {
    return nil, convertError(err)
  }

  authors, err := e.userRepo.GetEpisodeAuthors(episode.EpisodeID)
  if err != nil {
    return nil, err /* already converted */
  }
  episode.Authors = authors

  return &episode, nil
}

It works well enough that we avoid having any real fun N+1 query situations. It does mean that every time we want to grab an Episode we’re fetching every author for it. This is fine for now, and we can get away with it at our current data volume, but that may not always be the case.

Finally, since we’re writing our own SQL queries for all those Get methods, we write tests for all of these funcs that would come for free with an ORM. That definitely feels like it should count towards our lines of boilerplate-per-Repo.
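There’s a flip side to that, though: because everything hides behind the Repo interface, anything that consumes a Repo can be tested without Postgres at all by swapping in an in-memory fake. A sketch of ours, trimmed to one method (the handler and names below are invented for illustration):

```go
package main

import (
	"errors"
	"fmt"
)

type Episode struct{ ID, Title string }

// EpisodeGetter is the slice of the repo interface a consumer needs.
type EpisodeGetter interface {
	GetByID(id string) (*Episode, error)
}

// fakeEpisodeRepo satisfies EpisodeGetter from a map instead of Postgres.
type fakeEpisodeRepo struct{ episodes map[string]Episode }

func (f *fakeEpisodeRepo) GetByID(id string) (*Episode, error) {
	ep, ok := f.episodes[id]
	if !ok {
		return nil, errors.New("not found")
	}
	return &ep, nil
}

// title stands in for a handler that depends on the repo.
func title(repo EpisodeGetter, id string) string {
	ep, err := repo.GetByID(id)
	if err != nil {
		return "unknown"
	}
	return ep.Title
}

func main() {
	repo := &fakeEpisodeRepo{episodes: map[string]Episode{"ep1": {ID: "ep1", Title: "Pilot"}}}
	fmt.Println(title(repo, "ep1"))  // Pilot
	fmt.Println(title(repo, "nope")) // unknown
}
```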

The Good Parts

While there’s still an amount of boilerplate in this code, the end result is still readable. It hasn’t been a crazy thing for us to drop in and change our Repos pretty significantly. This has held true when our repos have two dozen methods, which is crazy. The most painful part has been the up-front creation of new Repos. Trading a few minutes of up-front tedium for readability and maintainability seems like a very Go tradeoff, and we’re generally happy to optimize for operator happiness over developer happiness. So, we’re fine with that.

Not having to guess what an ORM is doing is another operator happiness thing — since we’re writing queries by hand, we have to think up front about what indexes a query is going to use, whether or not a join is faster than two queries, or if a SELECT FOR UPDATE is locking rows that it doesn’t need to. Running EXPLAIN ANALYZE on a query and also having code review cover SQL is amazing, and all that explicit thinking about SQL really sets us up to think about things like “transactions” and “data integrity”. Ultimately, we’re writing better SQL than we would be otherwise because we have to think about it.
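Checking that a query actually hits the index you expect is one statement in psql (the index name and abridged plan below are hypothetical, for flavor):

```sql
EXPLAIN ANALYZE SELECT * FROM episodes WHERE url_token = 'abc123';
-- Index Scan using episodes_url_token_idx on episodes (... rows=1 ...)
```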

And last but not least, it means that we’ve got full access to the dope features in newer versions of Postgres. Sometimes this is maybe a little too much Shiny New Toy, but jsonb column types and INSERT… ON CONFLICT DO NOTHING have already been awesome. Given that we’re doing code-review for SQL, it’s not crazy to use those things and there’s no weird impedance mismatch between an ORM’s abstraction and those features.
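For instance (the listens table here is our invention, not actual Bumpers schema), recording a play exactly once becomes a single statement with no read-then-write race:

```sql
INSERT INTO listens (user_id, episode_id)
VALUES ($1, $2)
ON CONFLICT DO NOTHING;
```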

THE END

So that’s where we ended up. We’re happy enough with our current situation, but still keeping our ears open for the next cool thing to come along. I occasionally space out on the train and think about what something like a Go take on HugSQL would look like, so if you’re working on that DEFINITELY let me know. If you’ve got questions or think we’re absolutely insane or had a similar experience, leave a comment or a note and I’ll do my best to reply.
