1. Traditional databases enforce the schema at load time (schema on write).
2. Hive enforces the schema at read time (schema on read).
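The difference in where the schema is applied can be sketched in a few lines of Python. This is only an illustration, not how any particular database or Hive is implemented: the two-column comma-separated format and the function names (`parse`, `store_on_write`, `store_on_read`, `query_on_read`) are hypothetical.

```python
def parse(line):
    """Hypothetical schema: user_id (int), amount (float), comma-separated."""
    user_id, amount = line.split(",")
    return int(user_id), float(amount)


def store_on_write(table, line):
    """Load-time enforcement: parse first, store only the typed row."""
    table.append(parse(line))  # raises ValueError on bad input, nothing is stored


def store_on_read(raw, line):
    """Read-time enforcement: store the line untouched, whatever it contains."""
    raw.append(line)


def query_on_read(raw):
    """The schema is applied here, again on every read."""
    return [parse(line) for line in raw]


lines = ["1,9.50", "2,abc"]

# Load-time enforcement: the bad line is caught while loading.
typed_table, write_errors = [], 0
for line in lines:
    try:
        store_on_write(typed_table, line)
    except ValueError:
        write_errors += 1

# Read-time enforcement: loading always succeeds.
raw_store = []
for line in lines:
    store_on_read(raw_store, line)

# The same bad line only surfaces when somebody queries the data.
try:
    query_on_read(raw_store)
    read_failed = False
except ValueError:
    read_failed = True
```

In the first path the malformed line never reaches storage; in the second it is stored as-is and the query has to deal with it.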
Schema enforced at load time (traditional databases)

Pros:
Better type safety, and the data at rest has already been cleansed
Typically more efficient in both storage size and computation, since the data is already parsed
Cons:
You have to decide on your schema before you store the data (i.e., you have to do ETL)
Typically you throw away the original data, which can be bad if you have a bug in your ingest process
It's harder to have different views of the same data

Schema enforced at read time (Hive)

Pros:
Flexibility in defining how your data is interpreted at read time
This gives you the ability to evolve your "schema" as time goes on
This allows you to have different versions of your "schema"
This allows the original source data format to change without having to consolidate to one data format
You get to keep your original data
You can load your data before you know what to do with it (so you don't drop it on the ground)
Gives you flexibility in being able to store unstructured, unclean, and/or unorganized data
Cons:
Generally it is less efficient because you have to reparse and reinterpret the data every time (this can be expensive with formats like XML)
The data is not self-documenting (i.e., you can't look at a schema to figure out what the data is)
More error prone, and your analytics have to account for dirty data
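The read-time trade-offs can be made concrete with a small sketch: the raw records are kept exactly as they arrived, two versions of a "schema" are applied to the same data, and every query has to reparse the records and cope with the dirty ones. The JSON record layout, the field names, and the `USD` default are assumptions made up for this illustration.

```python
import json

# Raw records are stored exactly as they arrived, with no validation.
raw_store = [
    '{"user": "ann", "amount": "9.50"}',
    '{"user": "bob", "amount": "n/a"}',
    '{"user": "cy", "amount": "4.25", "currency": "EUR"}',
    "not json at all",
]


def read_v1(line):
    """Schema version 1 (hypothetical): user and amount only."""
    rec = json.loads(line)
    return {"user": rec["user"], "amount": float(rec["amount"])}


def read_v2(line):
    """Schema version 2 (hypothetical): adds currency, defaulting to USD."""
    row = read_v1(line)
    row["currency"] = json.loads(line).get("currency", "USD")
    return row


def scan(store, reader):
    """Reparse every raw record on each query, skipping ones that do not fit."""
    rows, skipped = [], 0
    for line in store:
        try:
            rows.append(reader(line))
        except (ValueError, KeyError, TypeError):
            skipped += 1  # dirty data is the query's problem, every time
    return rows, skipped


rows_v1, skipped_v1 = scan(raw_store, read_v1)
rows_v2, skipped_v2 = scan(raw_store, read_v2)
```

Both versions run over the same untouched `raw_store`, so a later third version could reinterpret the records again. The cost is that each call to `scan` repeats the parsing and the error handling.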