
2,000 High-Frequency Special Words in Software Development, with Selected Example Sentences (Part 1)

杨尚川
Published 2015/03/19 22:15
Word count: 75,330

superword is an English-word analysis tool implemented in Java. It mainly studies patterns of transformation between words that sound or look alike, prefix and suffix patterns, similarity patterns between words, and so on.
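As a minimal sketch of what "similarity between words" can mean in practice (this is an illustration, not superword's actual algorithm), one common spelling-similarity measure is Levenshtein edit distance:

```java
// A minimal, self-contained edit-distance sketch. This is NOT superword's
// implementation; it just illustrates one common way to quantify how
// "form-similar" two words are.
public class WordSimilarity {

    // Classic dynamic programming: the minimum number of single-character
    // insertions, deletions, or substitutions needed to turn a into b.
    public static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("affect", "effect")); // 1
    }
}
```

Words at distance 1, such as "affect"/"effect", are exactly the kind of near-form pairs such a tool would flag.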

1. Matching text for the word hadoop:


  1. Subash D'Souza is a professional software developer with strong expertise in crunching big data using Hadoop/HBase with Hive/Pig. Apache Flume Distributed Log Collection for Hadoop


  2. The growing pains you’re facing have been faced by other people and there are systems to handle it: Hadoop and HBase. Monitoring with Ganglia


  3. BigTable, Hadoop, SimpleDB, and memcached restrict object lookups by key for exactly these reasons. CouchDB The Definitive Guide


  4. The success of MR led to a wide range of third-party implementations (notably open-source Hadoop [15]), and a number of hybrid systems that combine parallel DBMSs with MR, offered by vendors like Aster, Cloudera, Greenplum, and Vertica. Dremel - Interactive Analysis of WebScale Datasets


  5. As such, Hadoop has similar limitations as MapReduce when developing multi-stage pipelines. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  6. Edward Capriolo is a member of the Apache Software Foundation and a committer for the Hadoop-Hive project. Cassandra High Performance Cookbook


  7. This sort of dependency is similar in principle to the compile-time dependency you might have on any API you are programming, like what you might see if you are writing MapReduce jobs for execution on a Hadoop cluster or writing enterprise integration code with Apache Camel. Gradle Beyond The Basics


  8. At Facebook, Hadoop has traditionally been used in conjunction with Hive for storage and analysis of large data sets. Apache Hadoop Goes Realtime at Facebook


  9. Chapter 12, Integrating Hadoop In this chapter, written by Jeremy Hanna, we put Cassandra in a larger context and see how to integrate it with the popular implementation of Google’s Map/Reduce algorithm, Hadoop. Cassandra The Definitive Guide


  10. Outside Google, projects such as Apache Hadoop have been gaining popularity rapidly. Tenzing - A SQL Implementation On The MapReduce Framework



2. Matching text for the word http:


  1. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com. Getting Started with CouchDB


  2. That’s a promise! So make sure to visit the book’s website at http://books.couchdb.org/relax to keep up-to-date. CouchDB The Definitive Guide


  3. Downloading the example code for this book You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. Cassandra High Performance Cookbook


  4. Ceph is licensed under the LGPL and is available at http://ceph.sourceforge.net/ Ceph A Scalable, High-Performance Distributed File System


  5. For information on translations, please e-mail info@apress.com, or visit http://www.apress.com. Beginning CouchDB


  6. For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com. Scaling CouchDB


  7. It allows you to send data over HTTP in JSON to index, search, and manage your Elasticsearch cluster. Elasticsearch in Action MEAP V07 cHAPTER 1


  8. A request to Elasticsearch consists of the same parts as any HTTP request. Elasticsearch The Definitive Guide


  9. Finding Out More If you’d like to find out more about Cassandra, and to get the latest updates, visit this book’s companion website at http://www.cassandraguide.com. Cassandra The Definitive Guide


  10. It discusses ActiveMQ URIs and all the transport connectors for both client-to-broker and broker-to-broker communications including TCP, NIO, STOMP, failover, SSL, HTTP/S, and much more. ActiveMQ in Action


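The excerpts above all involve talking to services over HTTP. As a small, self-contained illustration (not taken from any of the quoted books; the Elasticsearch-style URL is just an example), an HTTP GET request can be assembled with the `java.net.http` API available since Java 11:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// A minimal sketch of building (not sending) an HTTP request.
// The URL below is a hypothetical local Elasticsearch endpoint.
public class HttpExample {
    public static HttpRequest buildGet(String url) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Accept", "application/json") // many HTTP APIs speak JSON
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildGet("http://localhost:9200/_search");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending the request would then be a matter of passing it to an `HttpClient`.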

3. Matching text for the word api:


  1. An HTTP-based REST API makes communicating with the database easier, because so many modern environments are capable of talking HTTP. Getting Started with CouchDB


  2. The CouchDB API is designed to provide a convenient but thin wrapper around the database core. CouchDB The Definitive Guide


  3. In addition to this, the book you are holding will guide you through ElasticSearch Java API, showing how to use it, not only when it comes to CRUD operations but also when it comes to cluster and indices maintenance and manipulation. Mastering ElasticSearch


  4. The CLI is a helpful interactive facade on top of the Cassandra API. Cassandra High Performance Cookbook


  5. Another breakthrough came in April 2006 when it was announced that CouchDB would be solely accessible via an HTTP-based RESTful API. Beginning CouchDB


  6. Since CouchDB uses an HTTP API, standard HTTP load balancing software or hardware can be used. Scaling CouchDB


  7. Even though Elasticsearch is written in Java, there's more than just a Java API that lets you work with it. Elasticsearch in Action MEAP V07 cHAPTER 1


  8. More information about the Java clients can be found in the Java API section of the Guide. Elasticsearch The Definitive Guide


  9. We present an overview of what’s different about Cassandra’s model for querying and updating data, and then get to work using the API. Cassandra The Definitive Guide


  10. Heterogeneous application integration—The ActiveMQ broker is written using the Java language, so naturally a Java client API is provided. ActiveMQ in Action



4. Matching text for the word xml:


  1. So what do you do if your data contains a newline character, like some XML? Using a sequence file can solve this problem because it uses non-printable characters for delimiters. Apache Flume Distributed Log Collection for Hadoop


  2. Shows A show converts a single document into another format, usually HTML, although you can output the document in any format, including JSON or XML, to suit your. Getting Started with CouchDB


  3. An XML-format dump of the entire cluster state can be requested by a remote poller from any single node in the cluster on port 8649 Monitoring with Ganglia


  4. The design doc fields show and list contain functions used to transform raw JSON into HTML, XML, or other Content-Types. CouchDB The Definitive Guide


  5. Many commercial DBMSs support storage of nested data using XML (e.g., [19]) Dremel - Interactive Analysis of WebScale Datasets


  6. In the XML world he is known as the developer of DB Generator for the Apache Cocoon project, the open source projects DBPrism and DBPrism CMS, the Lucene-Oracle integration by using Oracle JVM Directory implementation and in the Restlet.org project the Oracle XDB Restlet Adapter, an alternative to write native REST web services inside the database-resident JVM. Mastering ElasticSearch


  7. You can use the preceding build.xml with the Free-Form project type. Cassandra High Performance Cookbook


  8. It is simple to map data from an existing database or XML to a JSON structure. Beginning CouchDB


  9. This might include reducing or reorganizing joins, throwing out resource-intensive features such as XML processing within a stored procedure, and so forth. Cassandra The Definitive Guide


  10. ActiveMQ also supports the use of Apache Camel within the broker’s XML configuration file. ActiveMQ in Action


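The excerpts mention XML configuration files such as ActiveMQ's broker configuration and Ant's build.xml. As a minimal illustration (the `<project>` document below is invented for the example), the JDK's built-in DOM parser can read such a file:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// A minimal sketch of parsing an XML string with the JDK's DOM API.
public class XmlExample {
    // Returns the tag name of the document's root element.
    public static String rootName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName("<project><name>solr-core</name></project>")); // project
    }
}
```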

5. Matching text for the word solr:


  1. SOLr: a rich Set of Interfaces to Lucene The SOLR project, http://lucene.apache.org/solr/, is a stand-alone, enterprise-grade search service built on top of Lucene. Pro Hadoop


  2. He is also a co-founder of the solr.pl site which publishes information and tutorials about Solr and Lucene library and is the co-author of the ElasticSearch Server book published by Packt Publishing. Mastering ElasticSearch


  3. Solr is a popular frontend that provides a web service for Lucene as well as caching warming and other advanced capabilities. Cassandra High Performance Cookbook


  4. The name is a unique label for the module itself, and is often the same as the project name (like solr-core, spring-web, or junit) Gradle Beyond The Basics


  5. If you've heard about Lucene before, you've probably also heard about Apache Solr, which is also an open-source, distributed search engine based on it. Elasticsearch in Action MEAP V07 cHAPTER 1


  6. In this chapter we'll be using Apache Lucene-based Apache Solr and ElasticIndex for indexing purposes. Hadoop MapReduce Cookbook


  7. If the reducer decides that all of the queue-ids for a message-id create a complete message path, then the message is serialized and queued for the SolrOutputFormat. Hadoop:The Definitive Guide 3rd Edition


  8. Common choices are the Apache Lucene-based solutions, such as Lucene itself, or Solr, a high-performance enterprise search server.‡ Similar to the indexing solutions, there are a few possible approaches: HBase:The Definitive Guide


  9. Comprehensive information on Apache Solr 3 with examples and tips so you can focus on the important parts. Apache Kafka


  10. Integration with Apache Lucene/Solr is available to store external indexes and perform fast global searches. Cassandra The Definitive Guide



6. Matching text for the word hbase:


  1. Subash D'Souza is a professional software developer with strong expertise in crunching big data using Hadoop/HBase with Hive/Pig. Apache Flume Distributed Log Collection for Hadoop


  2. The growing pains you’re facing have been faced by other people and there are systems to handle it: Hadoop and HBase. Monitoring with Ganglia


  3. The subsystems should transfer data from different data sources to Hadoop-compatible data storage systems such as HDFS and HBase. Hadoop Operations and Cluster Management Cookbook


  4. Hoya is a Java-tool designed to leverage YARN to spin up dynamic HBase clusters[21] on demand. Apache Hadoop YARN Yet Another Resource Negotiator


  5. Installing HBase HBase is a highly scalable NoSQL data store that supports columnar-style data storage. Hadoop MapReduce Cookbook


  6. A single master node runs the master HDFS, MapReduce, and HBase daemons. Hadoop in Practice


  7. Given our high confidence in the reliability and scalability of HDFS, we began to explore Hadoop and HBase for such applications. Apache Hadoop Goes Realtime at Facebook


  8. ZooKeeper is often a major component in other Hadoop-related projects, such as HBase and Katta. Hadoop in Action


  9. Graph databases such as Neo4J and the set of databases derived at least in part from the design of Google’s Bigtable database (such as MongoDB, HBase, Hypertable, and Redis) all are focused slightly less on Availability and more on ensuring Consistency and Partition Tolerance. Cassandra The Definitive Guide


  10. Another member of the broader Hadoop family is HBase, an open-source implementation of another Google technology. Hadoop Beginners Guide



7. Matching text for the word hdfs:


  1. After that we’ll deep-dive into the details on many of the more useful Flume components, including the very important File Channel for persistence of in-flight data records and the HDFS Sink for buffering and writing data into HDFS, the Hadoop Distributed File System. Apache Flume Distributed Log Collection for Hadoop


  2. This file is organized into different contexts: jvm, rpc, hdfs, mapred, and hbase. Monitoring with Ganglia


  3. As Hadoop’s fault tolerance improved, persistent HDFS clusters became the norm. Apache Hadoop YARN Yet Another Resource Negotiator


  4. Chapter 11, Hadoop and Cassandra: Hadoop is a distributed file system, HDFS that provides high throughput and redundant storage and MapReduce, a software framework for distributed processing of large data sets on compute clusters. Cassandra High Performance Cookbook


  5. Chapter 2, Advanced HDFS, introduces a set of advanced HDFS operations that would be useful when performing large-scale data processing with Hadoop MapReduce as well as with non-MapReduce use cases. Hadoop MapReduce Cookbook


  6. After you’re used to thinking in MapReduce, the next challenge is typically related to the logistics of working with Hadoop, such as how to move data in and out of HDFS, and effective and efficient ways to work with data in Hadoop. Hadoop in Practice


  7. As such, there has been less emphasis on making Hadoop performant for random access workloads by providing low latency access to HDFS. Apache Hadoop Goes Realtime at Facebook


  8. An appendix contains a listing of HDFS commands along with their descriptions and usage. Hadoop in Action


  9. Because Hadoop has some unfamiliar terminology, here are some useful definitions: HDFS. Cassandra The Definitive Guide


  10. Parts of Hadoop The top-level Hadoop project has many component subprojects, several of which we'll discuss in this book, but the two main ones are Hadoop Distributed File System (HDFS) and MapReduce. Hadoop Beginners Guide



8. Matching text for the word mysql:


  1. He has worked with Perl/ PHP/Python, primarily for coding and MySQL/Oracle as the backend, for several years prior to moving into Hadoop fulltime. Apache Flume Distributed Log Collection for Hadoop


  2. Formerly as a technical writer, he spent time building both the documentation system and writing content for MySQL and the MySQL groups within Sun and then Oracle. Getting Started with CouchDB


  3. You should have PHP support for MySQL installed on your gweb server before attempting to configure this support. Monitoring with Ganglia


  4. We've moved a large and critical application suite from MySQL to F1 F1 - The Fault-Tolerant Distributed RDBMS


  5. FlumeJava’s parallel collections abstract away the details of how data is represented, including whether the data is represented as an in-memory data structure, as one or more files, or as an external storage service such as a MySql database or a Bigtable [5] FlumeJava - Easy, Efficient Data-Parallel Pipelines


  6. About Packt Publishing Packt, pronounced 'packed', published its first book "Mastering phpMyAdmin for Effective MySQL Management" in April 2004 and subsequently continued to specialize in publishing highly focused books on specific technologies and solutions. Cassandra High Performance Cookbook


  7. The whole process involved running Map/Reduce[7] jobs against the MySQL data files, indexing them and then storing the reverse-index in Cassandra. Cassandra A Decentralized Structured Storage System


  8. For several decades now, relational database management systems have reigned supreme in application and database development, with the likes of Oracle, SQL Server, and MySQL being used in every type of software application imaginable. Beginning CouchDB


  9. Once you start to scale many other data stores (MySQL, Bigtable), some nodes need to be set up as masters in order to organize other nodes, which are set up as slaves. Cassandra The Definitive Guide


  10. Here’s an example of an ActiveMQ configuration for the JDBC message store using the MySQL database: ActiveMQ in Action



9. Matching text for the word apache:


  1. Then I was introduced to Apache CouchDB, and I rediscovered the passion I had when developing applications on the Electron. Getting Started with CouchDB


  2. Apache CouchDB is one of a new breed of database management systems. CouchDB The Definitive Guide


  3. Nevertheless, I hope that by reading this book you'll be able to easily get into all the details about ElasticSearch and underlying Apache Lucene, and I also hope that it will let you get the desired knowledge easier and faster. Mastering ElasticSearch


  4. Over 150 recipes to design and optimize large-scale Apache Cassandra deployments. Cassandra High Performance Cookbook


  5. Apache CouchDB is an exciting database management system that is steadily improving with each new version. Beginning CouchDB


  6. Instead, we will focus on installing and configuring the Apache HTTP Server as a load balancer. Scaling CouchDB


  7. It's open-source, distributed and it's built on top of Apache Lucene, an open-source search engine library, which allows you to implement search functionality in your own Java application. Elasticsearch in Action MEAP V07 cHAPTER 1


  8. It did not come out from any of the Apache releases yet. Namenode High Availability


  9. Cassandra was accepted into the Apache Incubator, and by the time it graduated in March 2010, it had become a true open source success story, with committers from Rackspace, Digg, Twitter, and other companies that wouldn’t have written their own database from scratch, but together built something important. Cassandra The Definitive Guide


  10. Chapter 11 presents advanced features provided by ActiveMQ such as wildcards and composite destinations, advisory messages, virtual topics, some info about ActiveMQ plug-ins, and an introduction to message routing with Apache Camel. ActiveMQ in Action



10. Matching text for the word gradle:


  1. Gradle has already amassed a large audience, and we’ve seen tremendous growth over the last two years, but we’re getting ready for a still larger influx of end-users. Gradle in Action


  2. The conference is visited by 1000 Java developers and he got the chance to educate some of them about the greatness of Gradle and Groovy. Gradle Effective Implementation Guide


  3. The build tool you are going to use in this book is Gradle (http://gradle.org) Pro Spring MVC


  4. His current topics of research center around project automation: build tools (Maven, Leiningen, Gradle), distributed version control (Git), Continuous Integration (Jenkins) and Quality Metrics (Sonar) Jenkins The Definitive Guide


  5. Chances are you might be using it already or have switched to a more modern system such as Maven or Gradle. Sonar Code Quality Testing Essentials


  6. The first thing that is worth mentioning is that the current versions of Spring Security use Gradle as the build tool. Pro Spring Security


  7. Gradle Beyond the Basics, the image of a Belgian shepherd dog, and related trade dress are trademarks of O’Reilly Media, Inc. Gradle Beyond The Basics


  8. It is straightforward to use another build tool, such as Gradle or Ant with Ivy, as long as you use the same set of dependencies defined here.) For building MapReduce jobs you only need to have the hadoop-core dependency, which contains all the Hadoop classes. Hadoop:The Definitive Guide 3rd Edition


  9. Additionally, you will need to have Maven or Gradle installed to pull in the dependencies required to work with the available sample code. Getting Started with Google Guava


  10. Building and Testing with Gradle, the image of a bush wren, and related trade dress are trademarks of O’Reilly Media, Inc. Building and Testing with Gradle



11. Matching text for the word url:


  1. Depending on your platform and installation type, your browser may have already been opened with the correct link; if not, you can access Futon, the built-in administration interface to CouchDB by going to the URL: http://127.0.0.1:5984/_utils/index.html Futon is a web-based interface to the main functionality in CouchDB and provides support for editing the configuration information, creating databases, documents, design documents (and therefore views, shows and lists) and starting and requesting replication. Getting Started with CouchDB


  2. To use it, invoke the URL along with key/value pairs that define events. Monitoring with Ganglia


  3. It is followed by the URL that specifies your CouchDB IP address and port. CouchDB The Definitive Guide


  4. ElasticSearch assumes that data is sent in the URL, or as the request body as JSON document (http://en.wikipedia.org/wiki/JSON) Mastering ElasticSearch


  5. Open your favorite web browser, and navigate to the URL http://127.0.0.1:5984/_utils. Beginning CouchDB


  6. Send an If-None-Match HTTP header containing the Etag’s value with subsequent requests to the same URL (Etags are only valid for a given URL) Scaling CouchDB


  7. This method is often referred to as a query string search, since we pass the search as a URL query string parameter: Elasticsearch The Definitive Guide


  8. Next, go to the GitHub page hosting the project you’re interested in and find the project’s Git URL. Cassandra The Definitive Guide


  9. If you don’t have the Sun J2SE installed and you’re using Linux, Solaris, or Windows, download and install it from the following URL: http://www.oracle.com/technetwork/java/javase/downloads/ index.html. ActiveMQ in Action


  10. It provides aggregate statistics, such as the number of unique visitors per day and the page views per URL per day, as well as site-tracking reports, such as the percentage of users that made a purchase, given that they earlier viewed a specific page. Bigtable - A Distributed Storage System for Structured Data


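Several excerpts above quote concrete URLs such as http://127.0.0.1:5984/_utils/index.html. As a small, self-contained illustration, `java.net.URI` can decompose such a URL into its parts:

```java
import java.net.URI;

// A minimal sketch of splitting a URL into scheme, host, port, and path.
// The address below is the CouchDB Futon URL quoted in the excerpts above.
public class UrlParts {
    public static void main(String[] args) {
        URI uri = URI.create("http://127.0.0.1:5984/_utils/index.html");
        System.out.println("scheme = " + uri.getScheme()); // http
        System.out.println("host   = " + uri.getHost());   // 127.0.0.1
        System.out.println("port   = " + uri.getPort());   // 5984
        System.out.println("path   = " + uri.getPath());   // /_utils/index.html
    }
}
```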

12. Matching text for the word schema:


  1. Just like in a database that requires a schema, there are some conventions and field types that you should consider including in your documents. Getting Started with CouchDB


  2. We’ll show you how CouchDB’s schema-free document model is a better fit for common applications, how the built-in query engine is a powerful way to use and process your data, and how CouchDB’s design lends itself to modularization and scalability. CouchDB The Definitive Guide


  3. Each document may have a different set of fields; there is no schema or imposed structure. Mastering ElasticSearch


  4. Chapter 6, Schema Design: The Cassandra data model is designed for storing large amounts of data across many nodes. Cassandra High Performance Cookbook


  5. Although the system supports the notion of multiple tables all deployments have only one table in their schema. Cassandra A Decentralized Structured Storage System


  6. Although the strict schema-based structure these databases adhere to is required in some web applications, such as transaction systems, it is not a good fit for many modern web projects, such as blogs, wikis, and discussion forums, which by their nature are a better fit to a document-oriented database. Beginning CouchDB


  7. CouchDB is a schema-less database, giving you much flexibility in designing the document boundaries for your data. Scaling CouchDB


  8. Every type has its own mapping or schema definition, which defines the data structure for documents of that type, much like the columns in a database table. Elasticsearch The Definitive Guide


  9. It is highly available and offers a schema-free data model. Cassandra The Definitive Guide


  10. The JDBC message store uses a schema consisting of three tables. ActiveMQ in Action



13. Matching text for the word metadata:


  1. The HEAD method returns metadata about the document or other object being accessed within the HTTP headers returned. Getting Started with CouchDB


  2. The actual result is JSON-encoded and contains a little more metadata: CouchDB The Definitive Guide


  3. Recipes cover how the CLI is used to make changes to the metadata such as key spaces, column families, and cache settings. Cassandra High Performance Cookbook


  4. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs) Ceph A Scalable, High-Performance Distributed File System


  5. The metadata about the ranges a node is responsible is cached locally at each node and in a fault-tolerant manner inside Zookeeper - this way a node that crashes and comes back up knows what ranges it was responsible for. Cassandra A Decentralized Structured Storage System


  6. In addition to data fields, each document includes metadata that is maintained by the CouchDB server itself, such as a revision number and more. Beginning CouchDB


  7. A document can be a blog post with all its metadata, a user with all its metadata, or any other type of data you plan to search for. Elasticsearch in Action MEAP V07 cHAPTER 1


  8. As storage clusters grow to thousands of devices or more, consistent management of data placement, failure detection, and failure recovery places an increasingly large burden on client, controller, or metadata directory nodes, limiting scalability. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.


  9. The following list shows only the commands related to metadata and configuration; there are other commands for getting and setting values that we explore later. Cassandra The Definitive Guide


  10. The headers provide metadata about the message used by both clients and JMS providers. ActiveMQ in Action



14. Matching text for the word mongodb:


  1. Other sinks are available on the Internet that can be used to write data to MongoDB, Cassandra, RabbitMQ, Redis, and just about any other data store you can think of. Apache Flume Distributed Log Collection for Hadoop


  2. Like MongoDB itself, this book is very straightforward and approachable. MongoDB The Definitive Guide 2nd Edition


  3. Hadoop was a logical choice since it a widely used system, but we hope that users of MongoDB’s MapReduce and other MapReduce implementations will be able to extrapolate the examples in this text to their particular system of choice. MapReduce Design Patterns


  4. The goal of Drill is to create an interactive analysis platform for Big Data using a standard SQL-supporting relational database management system (RDBMS), Hadoop, and other NoSQL implementations (including Cassandra and MongoDB) Professional Hadoop Solutions


  5. Exporting data from HDFS into MongoDB This recipe will use the MongoOutputFormat class to load data from an HDFS instance into a MongoDB collection. Hadoop RealWorld Solutions Cookbook


  6. You can do that with Elasticsearch, and it works especially well with document-oriented stores like MongoDB, because data is represented in Elasticsearch as documents, too. Elasticsearch in Action MEAP V07 cHAPTER 1


  7. MongoDB also provides auto-sharding capabilities to manage failover and node balancing. Cassandra The Definitive Guide


  8. The format of the row key is an account ID (which is a MongoDB ObjectId) and a content item ID (witha MD5 hash of the URL of the content item being tracked) separated by an underscore which we split on to provide that data in the result set. Programming Hive


  9. That MongoDB also based its query language on JSON lent a high degree of comfort and harmony to the usage of this new database. MongoDB in Action


  10. These systems can also be grouped with the NoSQL databases (MongoDB [27], to name one of many): both offer higher performance and scale better than traditional databases, but provide weaker semantics. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications



15. Matching text for the word jvm:


  1. The downside of using this channel is that an agent failure (hardware problem, power outage, JVM crash, Flume restart, and so on) results in loss of data. Apache Flume Distributed Log Collection for Hadoop


  2. Java Virtual Machine Metrics The sFlow Java Virtual Machine (JVM) metrics (Table 8-3) are based on the metrics exposed through the Java Management Extensions (JMX) interface, ensuring consistency with existing JMX-based monitoring systems. Monitoring with Ganglia


  3. Groovy is a language for the Java Virtual Machine (JVM), such as Java and Scala. Gradle Effective Implementation Guide


  4. In the XML world he is known as the developer of DB Generator for the Apache Cocoon project, the open source projects DBPrism and DBPrism CMS, the Lucene-Oracle integration by using Oracle JVM Directory implementation and in the Restlet.org project the Oracle XDB Restlet Adapter, an alternative to write native REST web services inside the database-resident JVM. Mastering ElasticSearch


  5. In addition to JVM internals, applications can maintain their own counters and provide operations that the user can trigger remotely. Cassandra High Performance Cookbook


  6. The File constructor will create an absolute path out of the supplied parameter, but the constructor argument will be considered as if it is relative to the directory in which the JVM started up.1 This directory may change if you are invoking Gradle directly, through the wrapper, through an IDE, or through integration with a Continuous Integration server. Gradle Beyond The Basics


  7. It would work, but would need a lot of RAM to hold copies of essentially the same data, and would create many more data structures that the JVM would have to spend time garbage collecting. Elasticsearch The Definitive Guide


  8. We pass the file locations to the JVM by pasting the following code at the bottom of this file. Cassandra The Definitive Guide


  9. The SSL cipher suites for the ActiveMQ SSL transport are provided by the JVM. ActiveMQ in Action


  10. Groovy is a dynamic language of the JVM, similar in many respects to Java, but with some important differences. Building and Testing with Gradle

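The excerpts above note that JVM metrics are typically exposed through JMX. A minimal sketch of reading two such metrics from inside the current JVM, using the standard `java.lang.management` MXBeans:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// A minimal sketch of reading JVM metrics through the standard MXBeans,
// the same interface JMX-based monitoring tools build on.
public class JvmMetrics {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long usedHeap = memory.getHeapMemoryUsage().getUsed();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.println("used heap bytes = " + usedHeap);
        System.out.println("live threads    = " + threads);
    }
}
```

Monitoring agents poll the same MXBeans remotely over a JMX connector instead of in-process.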


16. Matching text for the word plugin:


  1. While not covered in the Flume documentation, it is common to enable JMX in your Flume JVM (http://bit.ly/javajmx) and use the Nagios JMX plugin (http://bit.ly/nagiosjmx) to alert on performance abnormalities in your Flume agents. Apache Flume Distributed Log Collection for Hadoop


  2. Collectively, these enable Nagios to query metric information from gweb instead of relying on remote execution systems such as Nagios Service Check Acceptor (NSCA) and Nagios Remote Plugin Executor (NRPE) Monitoring with Ganglia


  3. We’ll explore the use of the Java plugin for standardized and nonconventional use cases and examine productivity tools for fast development turnaround. Gradle in Action


  4. We will see how we can decouple the definition and usage of a custom task and plugin into separate source files. Gradle Effective Implementation Guide


  5. Finally, we will take a deep look at ElasticSearch extensions by developing a custom river plugin for data indexing and a custom analysis plugin for data analysis during query and index time. Mastering ElasticSearch


  6. Using the Cassandra maven plugin With maven, it is simple to create a software project that has Cassandra support built in. Cassandra High Performance Cookbook


  7. The base build from which we will derive FileCollection examples apply plugin: 'java' Gradle Beyond The Basics


  8. BookKeeper Journal Manager (BKJM) is NameNode plugin implementation, involves BK client to read/write to/from BK cluster. Namenode High Availability


  9. The Consumer is executed using the maven-exec-plugin (http://mng.bz/bf7g) by passing it some system properties as arguments using the exec.args property. ActiveMQ in Action


  10. In Gradle, a plugin may provide configurable build actions by introducing one or more tasks, but it is fundamentally a means of extending the build DSL to include the domain encompassed by the plug-in. Building and Testing with Gradle



17. Matching text for the word sql:


  1. He also has experience optimizing SQL queries for database interactions. Apache Flume Distributed Log Collection for Hadoop


  2. By working with the grain and letting CouchDB do this for us, we save ourselves a tremendous amount of CPU cycles that would otherwise have been spent serializing object graphs from SQL, converting them into domain objects, and using those objects to do application-level validation. CouchDB The Definitive Guide


  3. Second, Dremel provides a high-level, SQL-like language to express ad hoc queries. Dremel - Interactive Analysis of WebScale Datasets


  4. Using analogies from the SQL world, index is something similar to a database. Mastering ElasticSearch


  5. Her interests and experiences lie across multiple tiers, from web frontend GUI to middleware, from middleware to backend SQL RDBMS and NoSQL data storage. Cassandra High Performance Cookbook


  6. He introduced the concept of JavaScript frameworks and Ajax to the company’s web division in 2008, developing a PL/SQL and JavaScript framework that is now used in every Core web product. Beginning CouchDB


  7. You can deploy a search engine on top of a relational database, to create indexes and speed up the SQL queries. Elasticsearch in Action MEAP V07 cHAPTER 1


  8. However, our core data model is not restricted to our particular RPC framework; we also trace activities such as SMTP sessions in Gmail, HTTP requests from the outside world, and outbound queries to SQL servers. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  9. It is much like the INSERT keyword in SQL except that, if the document already exists, then the new document would replace the old. Elasticsearch The Definitive Guide


  10. An important one is the Structured Query Language (SQL), which is feature-rich and uses a simple, declarative syntax. Cassandra The Definitive Guide
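Quote 9 above contrasts SQL's INSERT with the replace-on-conflict behavior of a document index call. A minimal sketch of that semantic difference, using a plain Java map as a stand-in for the document store (the class and method names are made up for illustration, not any real API):

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertSketch {
    static final Map<String, String> store = new HashMap<>();

    // Index a document: unlike SQL INSERT, an existing document with the
    // same id is silently replaced. put() returns the previous value,
    // i.e. the replaced document, or null if the id was new.
    public static String index(String id, String doc) {
        return store.put(id, doc);
    }

    public static void main(String[] args) {
        index("1", "v1");
        String replaced = index("1", "v2"); // no duplicate-key error
        System.out.println(replaced);        // the old document, "v1"
        System.out.println(store.get("1"));  // the new document, "v2"
    }
}
```

A relational database would instead reject the second insert with a primary-key violation unless an explicit upsert form (e.g. `INSERT ... ON CONFLICT`) is used.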



18. Matching texts for the word implementations:


  1. Some custom implementations are covered should your needs necessitate it. Apache Flume Distributed Log Collection for Hadoop


  2. Implementations: In addition to gmetric, there are several other metric-generating utilities that are available. Monitoring with Ganglia


  3. This header is often useful in web development when there are known errors in client implementations that a server might want to prepare the response for. CouchDB The Definitive Guide


  4. The success of MR led to a wide range of third-party implementations (notably open-source Hadoop [15]), and a number of hybrid systems that combine parallel DBMSs with MR, offered by vendors like Aster, Cloudera, Greenplum, and Vertica. Dremel - Interactive Analysis of WebScale Datasets


  5. We learned how to choose the right Lucene Directory implementation and which of the available implementations is the right choice in our environment. Mastering ElasticSearch


  6. Other JVM implementations and ports exist for other operating systems. Cassandra High Performance Cookbook


  7. This behavior is explicitly captured by the readdirplus [31] extension, which returns lstat results with directory entries (as some OS-specific implementations of getdir already do) Ceph A Scalable, High-Performance Distributed File System


  8. Hence we do not need to deal with or handle the concurrency issues that exist in B-Tree based database implementations. Cassandra A Decentralized Structured Storage System


  9. The strategy is outlined in the common abstract class, allowing different implementations of an algorithm (different strategies for accomplishing the same work) Cassandra The Definitive Guide


  10. The JMS API minimizes the amount of enterprise messaging knowledge a Java programmer is required to possess in order to develop complex messaging applications, while still maintaining a certain amount of portability across JMS provider implementations. ActiveMQ in Action



19. Matching texts for the word osgi:


  1. Preventing any kind of dependencies between the modules allows the new module systems, such as OSGi, to dynamically reload module B without reloading module A so long as the appropriate framework hooks are in place and all communications between the modules A and B happen via the core-api module. Scala in Depth


  2. Gradle provides support for Java, Groovy, Scala, Web, and OSGi projects, out of the box. Gradle Effective Implementation Guide


  3. However, SpringSource also hosts its own Maven repository and provides Spring libraries in the form of Enterprise Bundle Repositories (EBRs), which are OSGi compatible. Pro Spring 3


  4. Imagine being able to replace the PDF parser with a commercially licensed alternative, upgrade the plain text parser for a version with the latest bug fixes, and add a new parser for a custom file format without having to restart or even reload any other components of the system! This is what the OSGi framework makes possible. Tika in Action


  5. Chapter 4 says more about this topic, but for now, you just need to know that Spring Batch can run anywhere the Spring Framework can run: in its own Java process, in a web container, in an application, or even in an Open Services Gateway initiative (OSGi) container. Spring Batch in Action


  6. However, I’d noticed the enterprise world’s interest in OSGi, so I too was interested. Spring Dynamic Modules in Action


  7. The application server is easy to use (ZIP installer and NetBeans/Eclipse/IntelliJ integration), lightweight (downloads starting at 37 MB, small disk/memory footprint), and modular (OSGi-based, containers start on demand) Java EE 7 Essentials


  8. This provides a series of architectural advantages, such as an OSGi-based component model and being able to run JRockit Mission Control both as a standalone application and inside the Eclipse IDE. Oracle JRockit The Definitive Guide


  9. Chapter 17 presents the main principles behind the Open Services Gateway initiative (OSGi) and demonstrates how its approach to modularity and its service registry can be utilized in Spring Integration applications. Spring Integration in Action


  10. Because add-ons are OSGi components, we spend time detailing enough of OSGi to be dangerous, and then we dive right in and create three add-ons: a Norwegian language addon, a Roo wrapper add-on to expose a non-OSGi JAR to the Roo system, and a “Simple” add-on to provide jQuery support. Spring Roo in Action



20. Matching texts for the word dependencies:


  1. To install CouchDB for the first time, including any dependencies: Getting Started with CouchDB


  2. For other versions of Mac OS X, the dependencies might vary. Monitoring with Ganglia


  3. The appeal of this approach is that CouchDB can serve the whole application without dependencies on a complex environment such as might be maintained on a production web server. CouchDB The Definitive Guide


  4. Its core comes as a single file of Java library with no dependencies, and allows you to index documents and search them with its out of the box full text search capabilities. Mastering ElasticSearch


  5. As a result, you do not have to run a list of targets sequentially; the dist target will run its dependents such as compile and init and their dependencies in proper order. Cassandra High Performance Cookbook


  6. Even if your system’s package manager does not include CouchDB itself in its repository, it’s quite likely that it does feature some of Couch’s dependencies (Erlang, ICU, Spidermonkey, libcurl, and so on) Beginning CouchDB


  7. Dapper traces were used to assess the total query cost, and prompted an effort to redesign operations in order to minimize load on their system dependencies. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  8. The use of a separate building ensures both that the backup will survive building damage, and that the backups introduce no cyclic dependencies in the system; a GFS cell in the same building potentially might rely on the Chubby cell for electing its master. Chubby - The lock service for loosely-coupled distributed systems


  9. Because Ivy takes care of all the dependencies, it’s easy to build Cassandra once you have the source. Cassandra The Definitive Guide


  10. Loosely coupled architectures, on the other hand, exhibit fewer dependencies, making them better at handling unforeseen changes. ActiveMQ in Action



21. Matching texts for the word runtime:


  1. You will need to include the flume-ng-sdk JAR in the classpath of your Java application at runtime to use Flume's Log4J Appender. Apache Flume Distributed Log Collection for Hadoop


  2. Megastore emphasizes cost-transparent APIs with runtime costs that match application developers' intuitions. Megastore - Providing Scalable, Highly Available Storage for Interactive Services


  3. So rather than reinvent a new type of modular interface, why not just reuse a tried and true interface? Of course, the fact that gmond is built on top of the Apache Portability Runtime (APR) libraries made the Apache way of implementing a modular interface an obvious fit. Monitoring with Ganglia


  4. We use CouchDB’s JavaScript runtime’s built-in toJSON function in our equality test, which is better than testing for raw equality. CouchDB The Definitive Guide


  5. Please remember that the field data cache is not built during indexing, but can be rebuilt while querying and because of that we can change filtering during runtime by updating the fielddata section using the mappings API. Mastering ElasticSearch


  6. Of course, it would be much nicer if you could define the URL and title you want to add to the database at runtime. Beginning CouchDB


  7. Their implementations are connected to the Sawzall runtime and interact with system internals, managing low-level data formats and distributed computations. Interpreting the Data Parallel Analysis with Sawzall


  8. Dapper penetration can be assessed in two dimensions: the fraction of production processes that can generate Dapper traces (i.e., those that are linked with Dapper-instrumented runtime libraries) and the fraction of production machines running Dapper’s trace collection daemon. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  9. It’s often useful to contextualize events at runtime against other events that might be related in order to infer some conclusion to support business decision making. Cassandra The Definitive Guide


  10. The Apache Geronimo application server is a fully certified Java EE 5 runtime that uses many leading open source projects for various aspects of the application server. ActiveMQ in Action



22. Matching texts for the word jenkins:


  1. This chapter discusses the installation and configuration procedures needed to run Gradle on Jenkins, an open-source CI server. Gradle in Action


  2. Over 80 recipes to maintain, secure, communicate, test, build, and improve the software development process with Jenkins. Jenkins Continuous Integration Cookbook


  3. Then, we are going to see how the continuous integration servers Jenkins, JetBrains TeamCity, and Atlassian Bamboo support Gradle. Gradle Effective Implementation Guide


  4. Allows you to run SBT build actions from Jenkins and lets you configure SBT using Jenkins. Scala in Action


  5. Seven years ago, I wrote the first line of code that started this whole project that is now known as Jenkins, and was originally called Hudson. Jenkins The Definitive Guide


  6. There are two implementations you can choose from: Jenkins and Murmur. Hadoop in Practice


  7. The encoding of the region names for the on-disk directories is also different: they use a Jenkins hash to encode the region name. HBase:The Definitive Guide


  8. His current topics of research center around project automation, including: build tools (Gradle, Leiningen, Maven, Ant), distributed version control (Git, Mercurial), testing frameworks (Geb, Spock, JUnit, TestNG, Mockito), continuous integration (Jenkins, Hudson, Bamboo) and code quality metrics (Sonar, CodeNarc, PMD) Building and Testing with Gradle


  9. Jenkins: Tool used for OpenStack development to run jobs automatically. OpenStack Operations Guide


  10. The first book to include detailed screenshots and recipes for using Jenkins continuous integration server (formerly known as Hudson) Selenium 2 Testing Tools Beginner's Guide



23. Matching texts for the word couchdb:


  1. Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc., Getting Started with CouchDB, the cover image of a hawk’s bill sea turtle, and related trade dress are trademarks of O’Reilly Media, Inc. Getting Started with CouchDB


  2. CouchDB: The Definitive Guide, the image of a Pomeranian dog, and related trade dress are trademarks of O’Reilly Media, Inc. CouchDB The Definitive Guide


  3. For example, CouchDB uses a combination of counter and content hash. NoSQL Distilled


  4. Apache CouchDB is an exciting database management system that is steadily improving with each new version. Beginning CouchDB


  5. Scaling CouchDB, the image of a chough, and related trade dress are trademarks of O’Reilly Media, Inc. Scaling CouchDB


  6. In fact, some document stores (like CouchDB) have made this a key point of their value proposition: documents can be held in a multimaster database that automatically replicates concurrently accessed, contended state across instances without undue interference from the user. Graph Databases


  7. CouchDB is a document-oriented database—a distributed key-value store that encodes values in JSON. SQL Antipatterns


  8. This is not entirely surprising, as its creator, Damien Katz, worked on Lotus Notes at IBM before deciding to embark on this project, with the idea that it would be a database “of the Web.” The documents stored by CouchDB do not need to share the same schema, and query capabilities are made available via views, which are constructed as JavaScript functions. Cassandra The Definitive Guide


  9. For example, CouchDB (which is generally considered as MongoDB’s biggest “competitor”) doesn’t support dynamic queries. The Definitive Guide to MongoDB


  10. As of this writing, the only well-known document database apart from MongoDB is Apache’s CouchDB. MongoDB in Action



24. Matching texts for the word cpu:


  1. By working with the grain and letting CouchDB do this for us, we save ourselves a tremendous amount of CPU cycles that would otherwise have been spent serializing object graphs from SQL, converting them into domain objects, and using those objects to do application-level validation. CouchDB The Definitive Guide


  2. You are only limited by resources such as memory, CPU time, and hard disk space. Cassandra High Performance Cookbook


  3. More recent distributed file systems have adopted architectures based on object-based storage, in which conventional hard disks are replaced with intelligent object storage devices (OSDs) which combine a CPU, network. Ceph A Scalable, High-Performance Distributed File System


  4. The salient feature of Scuttlebutt is that it has very efficient CPU utilization and very efficient utilization of the gossip channel. Cassandra A Decentralized Structured Storage System


  5. This could be through added memory, a faster CPU, or larger hard drives. Scaling CouchDB


  6. Ideal maximum throughput: in practice, network bandwidth or CPU limits the total capacity in bytes. BookKeeper


  7. OSDs constructed from commodity components combine a CPU, network interface, and local cache with an underlying disk or RAID, and replace the convention block-based storage interface with one based on named, variable-length objects. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.


  8. But with big enough from values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory and bandwidth. Elasticsearch The Definitive Guide


  9. The least disruptive thing (for the programming staff, anyway) is to vertically scale by adding CPU, adding memory, and getting faster hard drives, but if you continue to be successful and add more customers, at some point (perhaps into the tens of millions of rows), you’ll likely have to start thinking about how you can add more machines. Cassandra The Definitive Guide


  10. It’s also advisable to reduce the CPU load per connection. ActiveMQ in Action



25. Matching texts for the word bytes:


  1. The body is an array of bytes that contains the actual payload. Apache Flume Distributed Log Collection for Hadoop


  2. For CPU stats, for example, the value represents a percentage, and network stats interpret the value as a raw number of bytes. Monitoring with Ganglia


  3. The Content-Length header simply tells us how many bytes the response body has. CouchDB The Definitive Guide


  4. The number of segments an index will contain is proportional to the logarithm of the next size in bytes. Mastering ElasticSearch


  5. Storing large objects: The row data stored in Cassandra is typically smaller in size, from a few bytes to a few thousand bytes. Cassandra High Performance Cookbook


  6. File and directory metadata in Ceph is very small, consisting almost entirely of directory entries (file names) and inodes (80 bytes) Ceph A Scalable, High-Performance Distributed File System


  7. Ideal maximum throughput: in practice, network bandwidth or CPU limits the total capacity in bytes. BookKeeper


  8. This sorts by directly comparing the bytes, validating that the input can be parsed as US-ASCII. Cassandra The Definitive Guide


  9. Efficiency—Since this connector uses the OpenWire protocol to convert messages to a stream of bytes (and back), it’s very efficient in terms of network usage and performance. ActiveMQ in Action


  10. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. Bigtable - A Distributed Storage System for Structured Data
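Quote 3 above notes that Content-Length reports how many bytes the response body has. A small sketch of why that is a byte count rather than a character count (the class and method names are hypothetical, chosen only to illustrate the point):

```java
import java.nio.charset.StandardCharsets;

public class BodyBytes {
    // Number of bytes a body occupies on the wire when encoded as UTF-8,
    // i.e. what a Content-Length header would report for it.
    public static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(contentLength("hello"));      // 5 characters, 5 bytes
        System.out.println(contentLength("h\u00e9llo")); // 5 characters, 6 bytes: é takes 2 bytes in UTF-8
    }
}
```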



26. Matching texts for the word lucene:


  1. Nevertheless, I hope that by reading this book you'll be able to easily get into all the details about ElasticSearch and underlying Apache Lucene, and I also hope that it will let you get the desired knowledge easier and faster. Mastering ElasticSearch


  2. His background includes a wide range of programming languages (including Java and C#), search engines (including Lucene), databases, and NoSQL distributed data stores. Cassandra High Performance Cookbook


  3. It's open-source, distributed and it's built on top of Apache Lucene, an open-source search engine library, which allows you to implement search functionality in your own Java application. Elasticsearch in Action MEAP V07 cHAPTER 1


  4. In this chapter we'll be using Apache Lucene-based Apache Solr and ElasticIndex for indexing purposes. Hadoop MapReduce Cookbook


  5. Your data might be XML files sitting behind a number of FTP servers, text log files sitting on a central web server, or Lucene indexes1 in HDFS. Hadoop in Practice


  6. Elasticsearch uses Lucene internally for all of its indexing and search. Elasticsearch The Definitive Guide


  7. Does not work well with other Lucene constructs, like FieldCache (by doc info) elasticsearch. The Road to a Distributed, (Near) Real Time, Search Engine


  8. Hadoop started out as a subproject of Nutch , which in turn was a subproject of Apache Lucene. Hadoop in Action


  9. Integration with Apache Lucene/Solr is available to store external indexes and perform fast global searches. Cassandra The Definitive Guide


  10. Doug started work on the implementations of these Google systems, and Hadoop was soon born, firstly as a subproject of Lucene and soon was its own top-level project within the Apache open source foundation. Hadoop Beginners Guide



27. Matching texts for the word redis:


  1. Other sinks are available on the Internet that can be used to write data to MongoDB, Cassandra, RabbitMQ, Redis, and just about any other data store you can think of. Apache Flume Distributed Log Collection for Hadoop


  2. Redis is good at queue operations, and occasionally you can use memcached for this purpose, too. High Performance MySQL 3rd Edition


  3. Twitter is built atop several open source protocols, primarily HTTP, Thrift, Memcached, MySQL, and Redis. Netty in Action MEAP V08


  4. In a realworld application you’d include the logic for increasing the user points, say on a Redis database. RabbitMQ in Action


  5. This example is a basic means for writing to a number of Redis instances in parallel from MapReduce. MapReduce Design Patterns


  6. For example, Riak allows you to add metadata to aggregates for indexing and interaggregate links, Redis allows you to break down the aggregate into lists or sets. NoSQL Distilled


  7. Key-value stores You can use applications such as Memcached or Redis for external caching. bk-admin-guide-cloud-latest


  8. There is also a recipe that shows how to use Redis to join data from an external data store. Hadoop RealWorld Solutions Cookbook


  9. Graph databases such as Neo4J and the set of databases derived at least in part from the design of Google’s Bigtable database (such as MongoDB, HBase, Hypertable, and Redis) all are focused slightly less on Availability and more on ensuring Consistency and Partition Tolerance. Cassandra The Definitive Guide


  10. Redis Cookbook, the image of the mouse opossum, and related trade dress are trademarks of O’Reilly Media, Inc. Redis Cookbook



28. Matching texts for the word html:


  1. I won't cover this here, but if you need it, there is an example in the Flume User Guide (http://flume.apache.org/FlumeUserGuide.html) Apache Flume Distributed Log Collection for Hadoop


  2. Depending on your platform and installation type, your browser may have already been opened with the correct link; if not, you can access Futon, the built-in administration interface to CouchDB, by going to the URL http://127.0.0.1:5984/_utils/index.html. Futon is a web-based interface to the main functionality in CouchDB and provides support for editing the configuration information, creating databases, documents, design documents (and therefore views, shows, and lists), and starting and requesting replication. Getting Started with CouchDB


  3. The source and target members of our replication request are actually links (like in HTML) and so far we’ve seen links relative to the server we’re working on (hence local) CouchDB The Definitive Guide


  4. One of the examples of character mapper is HTML tags removal process. Mastering ElasticSearch


  5. He dabbled with web development in the late 1990s and was an expert in HTML, CSS, and JavaScript by the time he finished school. Beginning CouchDB


  6. Change into the log directory and generate the HTML and graph reports using the tsung_stats.pl script packaged with Tsung: Scaling CouchDB


  7. In this book, we'll stick with the popular formats: JSON for HTTP communication and YAML for configuration. Elasticsearch in Action MEAP V07 cHAPTER 1


  8. This contains a snippet of text from the about field with the matching words wrapped in <em></em> HTML tags: Elasticsearch The Definitive Guide


  9. If you’re interested, you can read the complete Avro specification at http://avro.apache.org/docs/current/spec.html, though it is definitely not required to work with Cassandra. Cassandra The Definitive Guide


  10. If you don’t have the Sun J2SE installed and you’re using Linux, Solaris, or Windows, download and install it from the following URL: http://www.oracle.com/technetwork/java/javase/downloads/index.html. ActiveMQ in Action
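Quote 4 above mentions HTML-tag removal as an example of a character mapper. A naive regex-based sketch of that idea (real character filters, such as Lucene's HTMLStripCharFilter, also handle entities and malformed markup; this class is only illustrative):

```java
public class TagStripper {
    // Strip anything that looks like an HTML tag. Deliberately naive:
    // just enough to show the character-mapping idea from the quote.
    public static String strip(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("<em>very</em> fast search")); // very fast search
    }
}
```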



29. Matching texts for the word metrics:


  1. This mode is good for things like metrics where gaps can easily be tolerated, as new data is just a second away. Apache Flume Distributed Log Collection for Hadoop


  2. As you can see from the help page, the first version of ganglia allowed you to query and sort by 21 different system metrics right out of the box. Monitoring with Ganglia


  3. Using the JDepend plugin To get quality metrics for our code base, we can use JDepend. Gradle Effective Implementation Guide


  4. We expose various system level metrics to Ganglia and this has helped us understand the behavior of the system when subject to our production workload. Cassandra A Decentralized Structured Storage System


  5. Every key in the response contains the same metrics: current, count, mean, max, min, stddev, and description. Beginning CouchDB


  6. This makes it easy for the owners of shared services to rank their users in terms of various metrics (e.g., inbound network load, outbound network load, or total time spent servicing requests) Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  7. Hector directly supports JMX by exposing metrics such as bad connections, available connections, idle connections, and more. Cassandra The Definitive Guide


  8. As with any other infrastructure software, it’s important for developers and administrators to be able to monitor broker metrics during runtime and notice any suspicious behavior that could possibly impact messaging clients. ActiveMQ in Action


  9. We log various facts and metrics for each of these queries. PowerDrill - Processing a Trillion Cells per Mouse Click


  10. His current topics of research center around project automation, including: build tools (Gradle, Leiningen, Maven, Ant), distributed version control (Git, Mercurial), testing frameworks (Geb, Spock, JUnit, TestNG, Mockito), continuous integration (Jenkins, Hudson, Bamboo) and code quality metrics (Sonar, CodeNarc, PMD) Building and Testing with Gradle
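Quote 5 above lists the per-key statistics CouchDB reports (current, count, mean, max, min, stddev). A dependency-free sketch of how such running aggregates can be accumulated; the class is hypothetical, not CouchDB's actual implementation:

```java
public class MetricStats {
    private long count;
    private double sum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // Record one observation and update the running aggregates.
    public void record(double value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    public long count() { return count; }
    public double min() { return min; }
    public double max() { return max; }
    public double mean() { return count == 0 ? 0.0 : sum / count; }

    public static void main(String[] args) {
        MetricStats requests = new MetricStats();
        requests.record(2);
        requests.record(4);
        requests.record(6);
        System.out.println(requests.mean()); // 4.0
        System.out.println(requests.max());  // 6.0
    }
}
```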



30. Matching texts for the word cassandra:


  1. Other sinks are available on the Internet that can be used to write data to MongoDB, Cassandra, RabbitMQ, Redis, and just about any other data store you can think of. Apache Flume Distributed Log Collection for Hadoop


  2. Because of high performance, a small codebase, and compact resulting data, there is a wide adoption of Avro not only in the Hadoop community, but also by many other NoSQL implementations (including Cassandra) Professional Hadoop Solutions


  3. Over 150 recipes to design and optimize large-scale Apache Cassandra deployments. Cassandra High Performance Cookbook


  4. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers) Cassandra A Decentralized Structured Storage System


  5. The company looked at various options, including RDBMS, in-memory counters, and Cassandra, before settling on HBase. HBase in Action


  6. He closely works with Big Data technologies, such as Hadoop and Cassandra daily. Hadoop MapReduce Cookbook


  7. Custom IndexReader and IndexWriter to work on top of Cassandra. elasticsearch. The Road to a Distributed, (Near) Real Time, Search Engine


  8. Cassandra: The Definitive Guide, the image of a Paradise flycatcher, and related trade dress are trademarks of O’Reilly Media, Inc. Cassandra The Definitive Guide


  9. Get the best out of Cassandra using this efficient recipe bank. Hadoop Beginners Guide


  10. Programming Hive introduces Hive, an essential tool in the Hadoop ecosystem that provides an SQL (Structured Query Language) dialect for querying data stored in the Hadoop Distributed Filesystem (HDFS), other filesystems that integrate with Hadoop, such as MapR-FS and Amazon’s S3 and databases like HBase (the Hadoop database) and Cassandra. Programming Hive



31. Matching texts for the word mapper:


  1. Usually, each mapper is assigned a single block of a file as input (unless you have used certain compression codecs) Apache Flume Distributed Log Collection for Hadoop


  2. One of the examples of character mapper is HTML tags removal process. Mastering ElasticSearch


  3. The Map phase then invokes a user-defined function, the Mapper, on each element, independently and in parallel. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  4. The mapper takes a column and breaks it into tokens (individual words) using StringTokenizer, a class that splits strings on common tokens such as spaces and commas. Cassandra High Performance Cookbook


  5. The code has three parts—mapper, reducer, and the main program. Hadoop MapReduce Cookbook


  6. The mapper splits the line into distinct words, and outputs each word (the key) along with the word's originating filename (the value) Hadoop in Practice


  7. Each phase is defined by a data processing function, and these functions are called mapper and reducer, respectively. Hadoop in Action


  8. Readers familiar with MapReduce programs will notice how familiar this mapper looks. Cassandra The Definitive Guide


  9. If small enough, the secondary table is pulled into the memory of each mapper / reducer process for inmemory lookups, which typically is the fastest method for joining. Tenzing - A SQL Implementation On The MapReduce Framework


  10. Note that each mapper simply outputs a series of individual key/value outputs; these are combined into a key and list of values in the shuffle method. Hadoop Beginners Guide
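Quote 4 above describes a mapper that tokenizes text with StringTokenizer and emits each word. A plain-Java sketch of that logic, kept free of the real Hadoop Mapper API so it runs without any Hadoop jars on the classpath (the class and helper names are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class WordCountMapperSketch {
    // Split one input line into tokens and emit each token paired with
    // a count of 1, word-count style. In a real Hadoop job each pair
    // would go through context.write(word, one) instead of a list.
    public static List<String> map(String line) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            emitted.add(tokenizer.nextToken().toLowerCase() + "\t1");
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (String pair : map("Hadoop and HBase")) {
            System.out.println(pair);
        }
    }
}
```

The shuffle phase would then group these pairs by key so a reducer can sum the counts per word.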



32. Matching texts for the word filesystem:


  1. The problem with HDFS and streaming data/logs HDFS isn't a real filesystem, at least not in the traditional sense, and many of the things we take for granted with normal filesystems don't apply here, for example being able to mount it. Apache Flume Distributed Log Collection for Hadoop


  2. Change to the actual mount point of a SAN filesystem. Monitoring with Ganglia


  3. For single documents, the request then gets passed to the database module where the data for the document is looked up on the filesystem and returned all the way up again. CouchDB The Definitive Guide


  4. Our access to the cluster was too brief to debug and characterize the middling performance from the 2.1.0 filesystem. Apache Hadoop YARN Yet Another Resource Negotiator


  5. In case of ElasticSearch it is usually better to use the new IO-based system store instead of the simple filesystem store. Mastering ElasticSearch


  6. Most of the software will be installed to various locations throughout the Linux filesystem, so where exactly you download and compile the applications is not important. Beginning CouchDB


  7. For example, a call to fileTree() might scan the filesystem at the time it is called, producing an immutable list that the build can then manipulate. Gradle Beyond The Basics


  8. At startup time, the HDFS NameNode reads filesystem metadata from a file called the fsimage file. Apache Hadoop Goes Realtime at Facebook


  9. This might mean optimizing the channels the database uses to write to the underlying filesystem. Cassandra The Definitive Guide


  10. The colon at the beginning of the project name indicates the root of the project tree, in a similar way that a forward slash indicates the root directory in a Unix filesystem, or a backslash indicates the root directory on Windows. Building and Testing with Gradle



33. Matching texts for the word json:


  1. The format is also self-describing using JSON, making for a good long-term data storage format, as your data format may evolve over time. Apache Flume Distributed Log Collection for Hadoop


  2. In many cases, your application can use the JSON object structure directly without having to serialize your objects or data into the fixed format required by the database engine. Getting Started with CouchDB


  3. Case Study Greg Borenstein, a friend and coworker, built a small library for converting Songbird playlists to JSON objects and decided to store these in CouchDB as part of a backup application. CouchDB The Definitive Guide


  4. From the client point of view, document is a JSON object (see more about JSON format at http://en.wikipedia.org/wiki/JSON) Mastering ElasticSearch


  5. Chapter 9, Coding and Internals: This chapter covers programming recipes that go beyond the typical application programmer interface, including building Cassandra from source, creating custom types for use with Cassandra, and modifying tools like the JSON export tools. Cassandra High Performance Cookbook


  6. In the second part of this book, you will learn about Futon, CouchDB’s excellent web-based admin interface, and then be introduced to the JSON format, which is used to describe the data in CouchDB. Beginning CouchDB


  7. Replication can be run from Futon, CouchDB’s web administration console, or by sending a POST request to _replicate containing a JSON object with replication parameters. Scaling CouchDB


  8. It allows you to send data over HTTP in JSON to index, search, and manage your Elasticsearch cluster. Elasticsearch in Action MEAP V07 cHAPTER 1


  9. Any optional query string parameters, eg ?pretty will pretty-print the JSON response to make it easier to read. Elasticsearch The Definitive Guide


  10. It also has scripts for converting SSTables (the datafiles) to JSON and back. Cassandra The Definitive Guide
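Several quotes above describe documents as plain JSON objects (CouchDB documents, Elasticsearch request bodies). A hand-rolled sketch of what such a document looks like; real code would use a JSON library such as Jackson or Gson, and the field names here are only illustrative:

```java
public class JsonDocSketch {
    // Naive serialization of a two-field document; assumes the values
    // contain no characters that need JSON escaping.
    public static String doc(String id, String title) {
        return "{\"_id\":\"" + id + "\",\"title\":\"" + title + "\"}";
    }

    public static void main(String[] args) {
        // The shape a CouchDB document or an Elasticsearch source might take:
        System.out.println(doc("playlist-1", "Road Songs"));
    }
}
```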



34. Matching texts for the word annotations:


  1. Code annotations accompany many of the code listings and highlight important concepts. Gradle in Action


  2. Gradle scans the test classpath for all class files and checks if they have specific JUnit or TestNG annotations. Gradle Effective Implementation Guide


  3. To eliminate programmer annotations that help identify dependencies, we exploit the structure of MapReduce computations. Incoop MapReduce for Incremental Computations


  4. A plug-in that creates WSDL files from an annotated Java web service implementation should not include its own code for scanning for annotations and generating content in a specialized XML vocabulary, but should instead declare a dependency on an existing library to do that work, and provide a mechanism for that library to be fetched from an online repository if it is not already present on the build system. Gradle Beyond The Basics


  5. In commands and source code, we make extensive use of bold text; and annotations draw your attention to the important aspects of listings. HBase in Action


  6. Our instrumentation is restricted to a low enough level in the software stack that even largescale distributed systems like Google web search could be traced without additional annotations. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  7. Code annotations accompany many of the listings, highlighting important concepts. Hadoop in Action


  8. Kundera (Java ORM) Kundera is an object-relational mapping (ORM) implementation for Cassandra written using Java annotations. Cassandra The Definitive Guide


  9. Components of Hadoop that wish to produce metrics implement the MetricsSource interface or use a set of simple Java annotations; those that wish to receive and process metric data implement the MetricsSink interface. Hadoop Operations


  10. This facilitates compiled tests being able to reference JUnit or TestNG annotations or base classes. Building and Testing with Gradle
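
The Gradle quotes describe scanning class files for JUnit or TestNG annotations. Below is a stripped-down sketch of that mechanism, using a hypothetical `@Check` annotation discovered via reflection, much as a test runner discovers `@Test` methods:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Sketch of annotation scanning: @Check is illustrative, standing in for
// the JUnit/TestNG annotations mentioned in the quotes.
public class AnnotationScan {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Check {}

    public static class Sample {
        @Check public void verifySomething() {}
        public void helper() {}
    }

    // Return the names of methods carrying @Check, as a runner would.
    public static List<String> annotatedMethods(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Method m : c.getDeclaredMethods())
            if (m.isAnnotationPresent(Check.class)) names.add(m.getName());
        return names;
    }

    public static void main(String[] args) {
        System.out.println(annotatedMethods(Sample.class)); // [verifySomething]
    }
}
```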



35、单词 servlet 的匹配文本:


  1. An example of such an exception type is the ServletException. Core Java Volume I Fundamentals 9th Edition


  2. Some of the additional modules that are available from this repository include modules for monitoring an Apache Web Server, MySQL database, and Xen virtual machine, as well as Tomcat and Jetty servlet monitoring through JMX. Monitoring with Ganglia


  3. As the name Servlet may already indicate, it serves a client request and constructs the response. Gradle in Action


  4. Finally, we will add a web project with a Java servlet class that uses classes from the services project. Gradle Effective Implementation Guide


  5. In a servlet or JavaServer Page (JSP) that you write in a SPNEGO-protected web application, you can get the user information passed in the HTTP Kerberos SPNEGO simply by calling getUserPrincipal() and getRemoteUser() on the HttpRequest object in the same way that you would get the principal information if the user used username/password authentication, or even digital certificate authentication. Professional Hadoop Solutions


  6. What’s interesting with the implementation of this servlet is that it redirects the actual file download to one of the DataNodes that contains the first block of the file. Hadoop in Practice


  7. Finally, coprocessors can be chained, very similar to what the Java Servlet API does with request filters. HBase:The Definitive Guide


  8. Similarly, as servlet authors need not implement the low-level details of socket I/O, event handling loops, and complex thread coordination, MapReduce developers program to a well-defined, simple interface and the “container” does the heavy lifting. Hadoop Operations


  9. The first type of application server implements the Java Servlet specification (http://mng.bz/cmMj) and is known as a web container. ActiveMQ in Action


  10. Most socket-based web services (for example, HTTP daemons, servlet engines, and application servers) are multithreaded. Concurrent Programming in Java Design Principles and Pattern
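
The Servlet API proper needs a container such as Tomcat or Jetty, so as a runnable approximation this sketch uses the JDK's built-in `com.sun.net.httpserver.HttpServer` to show the same serve-a-request/construct-a-response cycle the quotes describe (the path and message are illustrative):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// A servlet-like handler on the JDK's built-in HTTP server: the handler
// serves a client request and constructs the response, while the server
// does the socket I/O heavy lifting, as the Hadoop Operations quote notes.
public class MiniServlet {
    public static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0); // ephemeral port
        server.createContext("/hello", exchange -> {
            byte[] body = "Hello from a servlet-like handler".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server;
    }

    public static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (var in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start();
        int port = server.getAddress().getPort();
        System.out.println(get("http://localhost:" + port + "/hello"));
        server.stop(0);
    }
}
```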



36、单词 jdbc 的匹配文本:


  1. Using the JDBC channel is a bad idea as it would introduce a bottleneck and single point of failure instead of what should be designed as a highly distributed system. Apache Flume Distributed Log Collection for Hadoop


  2. I don’t want to bore you with repetitive details of a class that uses JDBC, so I won’t discuss the code in detail. Gradle in Action


  3. Instead of moving data between clusters, Sqoop was designed to move data from and into relational databases using a JDBC driver to connect. Hadoop RealWorld Solutions Cookbook


  4. Alternatively, you can place the database in a directory outside of your development directory and edit the JDBC URL to point to the correct path, but this step is left as an exercise for the reader. Gradle Beyond The Basics


  5. Discussion MapReduce contains DBInputFormat and DBOutputFormat classes, which can be used to read and write data from databases via JDBC. Hadoop in Practice


  6. Conceptually this is analogous to using JDBC to execute SQL queries. Hadoop in Action


  7. Instead of just abstracting the database interactions from the developer’s point of view, the way JDBC does, an entirely different mechanism is used. Cassandra The Definitive Guide


  8. You can of course use any RDBMS for which a JDBC driver is available, but if you do so, you'll need to modify the aspects of this chapter that require direct interaction with the database server. Hadoop Beginners Guide


  9. For even greater performance, Sqoop supports database-specific plug-ins that use native features of the RDBMS rather than incurring the overhead of JDBC. Hadoop Operations


  10. For example, ActiveMQ offers its own style of ultra-fast message persistence via KahaDB, but also supports standard JDBC-accessible databases. ActiveMQ in Action
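
The quotes describe JDBC as the standard way to connect Java code to a relational database. The sketch below shows the canonical try-with-resources usage pattern; `printNames` is deliberately not invoked because no database driver is bundled with this example, and `buildUrl` is a hypothetical helper illustrating the `jdbc:<subprotocol>://host:port/db` URL shape the quotes refer to:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Canonical JDBC read path, sketched: connect, prepare, execute, iterate,
// all inside try-with-resources so handles are closed even on error.
public class JdbcSketch {
    public static String buildUrl(String subprotocol, String host, int port, String db) {
        return "jdbc:" + subprotocol + "://" + host + ":" + port + "/" + db;
    }

    // Not called in main: running this requires a driver (e.g. the MySQL
    // JDBC driver) on the classpath and a live database.
    public static void printNames(String url, String user, String pass) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, pass);
             PreparedStatement ps = conn.prepareStatement("SELECT name FROM users WHERE id > ?")) {
            ps.setInt(1, 0);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) System.out.println(rs.getString("name"));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("mysql", "localhost", 3306, "inventory"));
        // prints: jdbc:mysql://localhost:3306/inventory
    }
}
```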



37、单词 parser 的匹配文本:


  1. If there is an error message in a log file that indicates that Jetty could not start its web server, there is a nonvalidating XML parser in the classpath ahead of the validating XML parser that Hadoop Core supplies. Pro Hadoop


  2. A typical example of a transitive dependency would be the XML parser library Xerces that requires the XML APIs library to function correctly. Gradle in Action


  3. In the dialog window with task types, we select JUnit Parser from the Tests section. Gradle Effective Implementation Guide


  4. Lucene query language Some of the query types provided by ElasticSearch support Apache Lucene query parser syntax. Mastering ElasticSearch


  5. Since the XML is pretty simple, we usually avoid using a full-blown XML parser and just parse it with some string operations in our Java code. MapReduce Design Patterns


  6. This is a perfect job for a Copy task with a filter attached to it, but that filter has a lot of work to do! It has to be a full-featured Markdown parser and HTML renderer. Gradle Beyond The Basics


  7. The XML element in Text form has been supplied, so you’ll need to use an XML parser to extract content from the XML. Hadoop in Practice


  8. It just calls the parser’s parse() method, which parses the fields of interest from a line of input, checks whether. Hadoop:The Definitive Guide 3rd Edition


  9. Besides having added components like a crawler and a parser, a web search engine differs from a basic document search engine in terms of scale. Hadoop in Action


  10. Hive processes the above SQL query in a series of phases: (1) The parser transforms the query into an Abstract Syntax Tree. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
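
The Hadoop quote above mentions a parser whose parse() method pulls the fields of interest out of a line of input and checks them. A self-contained sketch of that pattern (the year/temperature record layout is hypothetical, chosen only for illustration):

```java
import java.util.Optional;

// Sketch of the line-parser pattern from the Hadoop quotes: parse()
// extracts fields from one line of input and validates them, returning
// empty for malformed records instead of throwing.
public class LineParser {
    public record Reading(String year, int temperature) {}

    public static Optional<Reading> parse(String line) {
        String[] fields = line.split(",");
        if (fields.length != 2) return Optional.empty(); // malformed line
        try {
            return Optional.of(new Reading(fields[0].trim(), Integer.parseInt(fields[1].trim())));
        } catch (NumberFormatException e) {
            return Optional.empty(); // temperature not numeric
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1950, 22"));
        System.out.println(parse("garbage")); // Optional.empty
    }
}
```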



38、单词 activemq 的匹配文本:


  1. Right now there are many different pieces of software designed specifically for task queues (ActiveMQ, RabbitMQ, Gearman, Amazon SQS, and others), but there are also ad hoc methods of creating task queues in situations where queues aren’t expected. Redis in Action


  2. We selected ActiveMQ as a potential system for the message queue. Building LinkedIn’s Real-time Activity Data1


  3. Setting up an ActiveMQ server and using JMS with Spring are discussed in the following sections. Pro Spring 3


  4. ActiveMQ will be supported for many more years to come! Next Generation Open Source Messaging with Apollo


  5. Options like ActiveMQ, ZeroMQ, and Apache Qpid all providing different open source approaches to message queuing. RabbitMQ in Action


  6. We’ll need to install the Sun Java Runtime, get the ActiveMQ package, and configure it. Managing Infrastructure with Puppet


  7. Puppet queue uses the ActiveMQ middleware service to handle message passing and queuing. Pro Puppet


  8. Kafka can be compared with Scribe or Flume as it is useful for processing activity stream data; but from the architecture perspective, it is closer to traditional messaging systems such as ActiveMQ or RabbitMQ. Apache Kafka


  9. ActiveMQ and RabbitMQ don’t seem to have an easy way to batch messages and we assume that it used a batch size of 1 Kafka a Distributed Messaging System for Log Processing


  10. ActiveMQ was developed to provide an open source alternative: its central theme is to put the power in the hands of developers. ActiveMQ in Action



39、单词 jms 的匹配文本:


  1. Performance test of different server types including web (HTTP and HTTPS), SOAP, database, LDAP, JMS, mail, and native commands or shell scripts. Performance Testing With JMeter 2.9


  2. This specification requires a JMS provider that implements both point-to-point messaging as well as publish-subscribe messaging. Java EE 7 Specification


  3. In addition, several JMS interfaces implement Autocloseable and thus are automatically closed after use. Java EE 7 Essentials


  4. Since it is not a required component of the Web Profile, Web Profile products are not required to include an implementation of JMS, nor do they have to support other JMS-related requirements. Java EE 7 Web Profile Specification


  5. At first, it seems like a thoughtful and elegant design to ensure overall durability of the database, and appears unproblematic because it is familiar from many distributed computing paradigms, such as Java Message Service (JMS) Cassandra The Definitive Guide


  6. By far, the most commonly used appenders write log events to disk, but appenders for outputting log events to the console, sending data to syslog, or even to JMS exist. Hadoop Operations


  7. You can do this using an underlying messaging protocol, such as JMS, or independently of the transport protocol using the design of the Web Service. Java Performance Tuning 2nd Edition


  8. Although ActiveMQ implements the JMS specification, which has not moved in many years, ActiveMQ provides many features beyond this spec and it has not stopped innovating. ActiveMQ in Action


  9. However, for asynchronous interactions, say, a Web service or JMS client, the rate at which messages are injected into the system, injection rate, may be the more important criteria. Java Performance


  10. For different approaches, see for example the Aleph, JMS, and Ninja packages, accessible via links from the online supplement. Concurrent Programming in Java Design Principles and Pattern
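
Running real JMS code requires a provider such as ActiveMQ on the classpath, so the following is a provider-free analogy: a `BlockingQueue` modelling the point-to-point semantics the quotes describe, where each message is consumed by exactly one receiver and sender and receiver are decoupled in time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// BlockingQueue as a stand-in for a JMS queue: put() is analogous to
// MessageProducer.send(), take() to MessageConsumer.receive().
public class PointToPointSketch {
    public static List<String> roundTrip(List<String> messages) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(Math.max(1, messages.size()));

        // Producer thread, analogous to MessageProducer.send().
        Thread producer = new Thread(() -> messages.forEach(m -> {
            try { queue.put(m); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }));
        producer.start();

        // Consumer side: take() blocks until a message arrives, and each
        // message is delivered to exactly one receiver.
        List<String> received = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) received.add(queue.take());
        producer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(roundTrip(List.of("order-1", "order-2"))); // [order-1, order-2]
    }
}
```

Unlike this in-process analogy, a real JMS broker also gives persistence, transactions, and delivery across JVMs and hosts.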



40、单词 tika 的匹配文本:


  1. Apache Tika, a project in itself, extends the capabilities of Apache Solr to run on top of different types of files. Scaling Big Data with Hadoop and Solr Chapter-02 Understanding Solr


  2. Chris began to advocate for Tika as a standalone project in 2006 Tika in Action


  3. Tika makes a variety of metadata available about documents; this example just used two. Solr 3 Enterprise Search Server


  4. To do that, we either need to parse the data in some external application or setup Solr to use Apache Tika. Solr 3.1 Cookbook


  5. The fetch function makes use of the Apache Tika library (http://tika.apache.org/) to fetch and extract the contents of the document: Storm Real-Time Processing Cookbook


  6. With Solr this is easy because it integrates with Apache Tika project that supports most popular document formats. Solr in action


  7. It also does not provide any document filters, although Lucene has a sister project at Apache, Tika, which handles document filtering very well. Lucene in Action 2nd Edition


  8. To do that, we either need to parse the data in some external application or set up Solr to use Apache Tika. Apache Solr 4 Cookbook



41、单词 configuring 的匹配文本:


  1. So, let us start with the first step, downloading and configuring Flume. Apache Flume Distributed Log Collection for Hadoop


  2. Configuring and Building CouchDB Unless you have specific requirements, configure will probably work everything out for you and you can simply run: Getting Started with CouchDB


  3. Configuring CouchApp with .couchapprc If you don’t want to have to put the full URL (potentially including authentication parameters) of your database onto the command line each time you push, you can use the .couchapprc file to store deployment settings. CouchDB The Definitive Guide


  4. Now let's look at the possibilities of configuring the behavior of query rewriting. Mastering ElasticSearch


  5. However, there are a few subtle things to keep in mind when configuring them. Cassandra High Performance Cookbook


  6. Configuring CouchDB When you first build or install CouchDB on your system, there is zero configuration required to start creating CouchDB databases. Beginning CouchDB


  7. Instead, we will focus on installing and configuring the Apache HTTP Server as a load balancer. Scaling CouchDB


  8. Additionally, configuring analyzers at node level requires changing the config file on every node and restarting every node which becomes a maintenance nightmare. Elasticsearch The Definitive Guide


  9. Chapter 6, Configuring Cassandra This chapter shows you how to specify partitioners, replica placement strategies, and snitches. Cassandra The Definitive Guide


  10. In part 2, you’ll learn about configuring various aspects of ActiveMQ for connectivity, message persistence, and security. ActiveMQ in Action



42、单词 namespace 的匹配文本:


  1. To add properties, you’re required to use the ext namespace. Gradle in Action


  2. Part of this long iteration phase was due to a slow realization that Spanner should do more than tackle the problem of a globallyreplicated namespace, and should also focus on database features that Bigtable was missing. Spanner - Google’s Globally-Distributed Database


  3. The Ceph file system has three main components: the client, each instance of which exposes a near-POSIX file system interface to a host or process; a cluster of OSDs, which collectively stores all data and metadata; and a metadata server cluster, which manages the namespace (file names and directories) while coordinating security, consistency and coherence (see Figure 1) Ceph A Scalable, High-Performance Distributed File System


  4. The basics: planet-scale structured storage; next generation of the Bigtable stack; provides a single, location-agnostic namespace; manual and access-based data placement. Storage Architecture and Challenges


  5. In a different scenario, you might decide to “namespace” the task names by prefixing them with lb or liquibase to keep them from colliding with tasks from other plug-ins, but for our purposes here we can keep the task names short and simple. Gradle Beyond The Basics


  6. Actually, in Elasticsearch, our data is stored and indexed in shards, while an index is just a logical namespace which groups together one or more shards. Elasticsearch The Definitive Guide


  7. The keyspace is essentially just a logical namespace to hold column families and certain configuration properties. Cassandra The Definitive Guide


  8. The following listing demonstrates how to configure ActiveMQ using its custom Spring schema via an XML namespace. ActiveMQ in Action


  9. Chubby provides a namespace that consists of directories and small files. Bigtable - A Distributed Storage System for Structured Data


  10. Gradle brings in everything from the Ant namespace into the Gradle namespace such that using a core Ant task is as easy as calling ant. Building and Testing with Gradle
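
The Cassandra quote calls a keyspace "essentially just a logical namespace", and the Elasticsearch quote says the same of an index. The sketch below models that idea directly: a namespace is a name plus a map of the things it groups, so the same member name can live in two namespaces without colliding (all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// A logical namespace in miniature: two keyspaces can each contain a
// member called "users" without any collision, because lookups are
// always qualified by the namespace name.
public class NamespaceSketch {
    private final Map<String, Map<String, String>> namespaces = new HashMap<>();

    public void put(String namespace, String key, String value) {
        namespaces.computeIfAbsent(namespace, ns -> new HashMap<>()).put(key, value);
    }

    public String get(String namespace, String key) {
        return namespaces.getOrDefault(namespace, Map.of()).get(key);
    }

    public static void main(String[] args) {
        NamespaceSketch ns = new NamespaceSketch();
        ns.put("app_keyspace", "users", "a column family");
        ns.put("test_keyspace", "users", "a different column family");
        // Same member name, no collision: the namespace disambiguates.
        System.out.println(ns.get("app_keyspace", "users"));
        System.out.println(ns.get("test_keyspace", "users"));
    }
}
```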



43、单词 www 的匹配文本:


  1. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub. Apache Flume Distributed Log Collection for Hadoop


  2. Find us on Facebook: http://facebook.com/oreilly Follow us on Twitter: http://twitter.com/oreillymedia Watch us on YouTube: http://www.youtube.com/oreillymedia. Getting Started with CouchDB


  3. Dear reader, refer to http://www.elasticsearchserverbook.com from time to time, where you'll be able to find the newest errata dedicated to the book and additional articles extending it. Mastering ElasticSearch


  4. Downloading the example code for this book You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. Cassandra High Performance Cookbook


  5. For information on translations, please e-mail info@apress.com, or visit http://www.apress.com. Beginning CouchDB


  6. For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com. Scaling CouchDB


  7. Typically, you'll download the package from http://www.elasticsearch.org/download/, which is the best fit for your environment. Elasticsearch in Action MEAP V07 Chapter 1


  8. Finding Out More If you’d like to find out more about Cassandra, and to get the latest updates, visit this book’s companion website at http://www.cassandraguide.com. Cassandra The Definitive Guide


  9. If you don’t have the Sun J2SE installed and you’re using Linux, Solaris, or Windows, download and install it from the following URL: http://www.oracle.com/technetwork/java/javase/downloads/index.html. ActiveMQ in Action


  10. Irrelevant details were elided to keep the example short.) The call to Apply performs an atomic mutation to the Webtable: it adds one anchor to www.cnn.com and deletes a different anchor. Bigtable - A Distributed Storage System for Structured Data



44、单词 jrockit 的匹配文本:


  1. I remember quite clearly the first time I met the JRockit team. Oracle JRockit The Definitive Guide


  2. Work with the JRockit Mission Control 3.1/4.0 tools suite to debug or profile your Java applications. Java 7 Concurrency Cookbook



45、单词 linux 的匹配文本:


  1. You'd be hard pressed to find such a company using the bleeding-edge Linux kernel rather than something like Red Hat Enterprise Linux (RHEL), CentOS, Ubuntu LTS, or any of the other distributions that aim for stability and compatibility. Apache Flume Distributed Log Collection for Hadoop


  2. Install using the native packages for your chosen Linux platform. Getting Started with CouchDB


  3. Debian GNU/Linux You can install the CouchDB package by running: CouchDB The Definitive Guide


  4. In this second case, called EC2, Amazon provides server instances and of course they can be used for installing and running the ElasticSearch cluster (among many other things as those are normal Linux machines) Mastering ElasticSearch


  5. He has experience as a developer as well as a Linux and network administrator and enjoys the rich world of Open Source software. Cassandra High Performance Cookbook


  6. A user space approach, aside from providing greater flexibility and easier implementation, also avoids cumbersome interaction with the Linux VFS and page cache, both of which were designed for a different interface and workload. Ceph A Scalable, High-Performance Distributed File System


  7. Next, I provide you with a step-by-step guide to installing CouchDB on various distributions of the Linux operating system as well as on a Mac, before guiding you through the creation of your first database. Beginning CouchDB


  8. Managing scalability in object storage systems for HPC Linux clusters. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.


  9. On Linux, GZip extraction utilities should be preinstalled; on Windows, you’ll need to get a program such as WinZip, which is commercial, or something like 7-Zip, which is freeware. Cassandra The Definitive Guide


  10. If you don’t have the Sun J2SE installed and you’re using Linux, Solaris, or Windows, download and install it from the following URL: http://www.oracle.com/technetwork/java/javase/downloads/index.html. ActiveMQ in Action



46、单词 jpa 的匹配文本:


  1. This technique is helpful for defining Scala classes that work with the Java Persistence API (JPA) Scala in Depth


  2. Hibernate implements the Java Persistence API (JPA), and by using the JPA annotation Entity you’re specifying Hibernate to persist the object to the database. Scala in Action


  3. The support for Hibernate, MyBatis, JDO, and JPA makes already simple APIs even simpler, thus easing the burden on developers. Pro Spring 3


  4. This work can be done by hand. The Java Persistence Architecture (JPA) allows you to annotate a Java object and use JPA to handle the serialization and deserialization automatically. Cassandra High Performance Cookbook


  5. Often these objects are backed by a relational database of some type, and you manage them through object relational mapping tools such as Hibernate, JPA, or JDO. Solr 3 Enterprise Search Server


  6. This causes problems when using JPA annotations in Scala code, for example, as discussed in [JPAScala] Programming Scala


  7. JPA/JPO You can use, for example, DataNucleus to put a JPA/JPO access layer on top of HBase. HBase:The Definitive Guide


  8. The Java Persistence API (JPA) is defined as JSR 338, and the complete specification can be downloaded. Java EE 7 Essentials


  9. Note: The JPA specification is a separate specification referenced in the EJB 3.0 specification.) Session beans usually implement the core business logic of an enterprise application and represent the interactive session. Java Performance


  10. Those of you that are new to JPA can rest assured that you were not forgotten. Pro JPA2



47、单词 rabbitmq 的匹配文本:


  1. Other sinks are available on the Internet that can be used to write data to MongoDB, Cassandra, RabbitMQ, Redis, and just about any other data store you can think of. Apache Flume Distributed Log Collection for Hadoop


  2. RabbitMQ and Gearman can be very helpful for some purposes, too. High Performance MySQL 3rd Edition


  3. This is usually implemented with RabbitMQ today, but could be any AMQP message queue (such as Apache Qpid), or Zero MQ. bk-compute-adminguide-grizzly


  4. The easiest way to create a failure-tolerant message queue is probably a combination of RabbitMQ and Hovercraft. CouchDB The Definitive Guide


  5. This is currently implemented with RabbitMQ today, but theoretically could be any AMQP message queue supported by the Python amqplib and carrot libraries. Deploying OpenStack


  6. Usually implemented with RabbitMQ, but could be any AMQP message queue, such as Apache Qpid or Zero MQ. bk-admin-guide-cloud-latest


  7. An OpenStack highavailability configuration uses existing native Pacemaker RAs (such as those managing MySQL databases or virtual IP addresses), existing third-party RAs (such as for RabbitMQ), and native OpenStack RAs (such as those managing the OpenStack Identity and Image Services) high-availability-guide


  8. ActiveMQ and RabbitMQ don’t seem to have an easy way to batch messages and we assume that it used a batch size of 1 Kafka a Distributed Messaging System for Log Processing


  9. There are lots of different queues you could use: Amazon’s SQS, RabbitMQ, or even a MongoDB capped collection (although make sure it’s on a separate server than the cluster it’s protecting) Scaling MongoDB


  10. A dedicated VM was created for each service (nova-scheduler, rabbitmq, database etc) OpenStack Operations Guide



48、单词 concurrency 的匹配文本:


  1. By reducing latency for a given system (and that is true not only for storage systems), you affect concurrency and throughput capabilities. CouchDB The Definitive Guide


  2. Like MapReduce, Dryad handles the details of communication, partitioning, placement, concurrency and fault tolerance. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  3. This architecture is designed to control resource utilization in a high concurrency environment. Cassandra High Performance Cookbook


  4. In addition to the actual data persistence component, the system needs to have the following characteristics; scalable and robust solutions for load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request marshalling, request routing, system monitoring and alarming, and configuration management. Cassandra A Decentralized Structured Storage System


  5. Add to this the fact that CouchDB is developed in Erlang OTP, a fault-tolerant programming language that offers excellent concurrency features, and you know that your CouchDB database will scale well without a loss of reliability and availability. Beginning CouchDB


  6. Transaction, distribution, and concurrency boundaries are also used in domain-driven design when defining Aggregates. Scaling CouchDB


  7. Pessimistic concurrency control widely used by relational databases, assumes that conflicting changes are likely to happen and so blocks access to a resource in order to prevent conflicts. Elasticsearch The Definitive Guide


  8. Cassandra, however, is different: its concurrency model is based on SEDA, so a single operation may start with one thread, which then hands off the work to another thread, which may hand it off to other threads. Cassandra The Definitive Guide


  9. A note on multithreading in JMS applications The JMS spec specifically defines concurrency for various objects in the JMS API and requires that only a few objects support concurrent access. ActiveMQ in Action


  10. As a result, concurrency control over rows can be implemented very efficiently. Bigtable - A Distributed Storage System for Structured Data
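
The Elasticsearch quote contrasts pessimistic locking with optimistic concurrency control. The sketch below shows the optimistic style using compare-and-set: threads retry a read-modify-write instead of blocking on a lock, and no increment is lost even under contention:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Optimistic concurrency in miniature: instead of taking a lock, each
// thread reads the current value, computes the next one, and only writes
// if nothing changed in between; on conflict it simply retries.
public class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger();

    public void increment() {
        while (true) {
            int current = value.get();                       // read
            int next = current + 1;                          // modify
            if (value.compareAndSet(current, next)) return;  // write if unchanged
            // another thread won the race; retry with a fresh read
        }
    }

    public int get() { return value.get(); }

    public static int run(int threads, int perThread) throws InterruptedException {
        OptimisticCounter counter = new OptimisticCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++)
            pool.submit(() -> { for (int i = 0; i < perThread; i++) counter.increment(); });
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 10_000)); // 40000: no lost updates
    }
}
```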



49、单词 frameworks 的匹配文本:


  1. Designs that incorporate single points of failure are generally to be avoided in large application frameworks where it can be safely assumed, given the sheer amount of hardware involved, that some percentage of nodes are always going to be on the fritz. Monitoring with Ganglia


  2. We simulate a single resource manager and two scheduler frameworks, one handling batch jobs and one handling service jobs. Omega - flexible, scalable schedulers for large compute clusters


  3. A common pattern in most languages, frameworks, and databases is using types to distinguish between subsets of your data. CouchDB The Definitive Guide


  4. Explore the versatility of Spring Python by integrating it with frameworks, libraries, and tools. Mastering ElasticSearch


  5. Our books and publications share the experiences of your fellow IT professionals in adapting and customizing today's systems, applications, and frameworks. Cassandra High Performance Cookbook


  6. He introduced the concept of JavaScript frameworks and Ajax to the company’s web division in 2008, developing a PL/SQL and JavaScript framework that is now used in every Core web product. Beginning CouchDB


  7. We plan to instrument nonRPC communication frameworks as they evolve and find a user base. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  8. And if you’re an application developer, you’ll no doubt be familiar with the many object-relational mapping (ORM) frameworks that have sprung up in recent years to help ease the difficulty in mapping application objects to a relational model. Cassandra The Definitive Guide


  9. These powerful but complex frameworks are designed for software engineers implementing complex parallel algorithms. Tenzing - A SQL Implementation On The MapReduce Framework


  10. The first is the simple testing of Java classes with existing test frameworks like JUnit and TestNG. Building and Testing with Gradle



50、单词 subclass 的匹配文本:


  1. If there are multiple requests to create the same vertex in the same superstep, then by default the system just picks one arbitrarily, but users with special needs may specify a better conflict resolution policy by defining an appropriate handler method in their Vertex subclass. Pregel  -  A System for Large-Scale Graph Processing


  2. A TreeMap is a subclass of Map that sorts on key. MapReduce Design Patterns


  3. No methods are required for implementation; however, in order for the class to function at Hive runtime as a UDF, your subclass must override evaluate() Hadoop RealWorld Solutions Cookbook


  4. That Coordinate is translated into a Point, a subclass of Geometry, using the same GeometryFactory class you used earlier. HBase in Action


  5. To use the TwoDArrayWritable type as the value type of a reducer's input, you need to specify the type of the stored values by creating a subclass of the TwoDArrayWritable type similar to the ArrayWritable type. Hadoop MapReduce Cookbook


  6. The big question is, does Avro play nicely with MapReduce? Avro comes with some mapper and reducer classes that you can subclass to work with Avro. Hadoop in Practice


  7. This is the approach that GenericWritable takes, and you have to subclass it to specify which types to support. Hadoop:The Definitive Guide 3rd Edition


  8. Who wants to worry about how files are divided into splits? In creating your own InputFormat class you should subclass the FileInputFormat class, which takes care of file splitting. Hadoop in Action


  9. For our own mapper implementations, we will subclass this base class and override the specified method as follows: Hadoop Beginners Guide


  10. Hadoop MapReduce (specifically jobs that use input formats that subclass FileInputFormat) wind up with a map task for each block of the file(s) processed as part of the job. This means the file block size can significantly impact the efficiency of a MapReduce job for better or worse. Hadoop Operations
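
The quotes repeat one extension pattern: a framework base class does the heavy lifting, and you subclass it, overriding a single hook method. Below is a framework-free sketch of that shape (the class and method names are illustrative, not Hadoop's):

```java
// Template-method pattern: the base class owns the shared pipeline and
// calls a hook that subclasses override, just as a Hadoop job subclasses
// a Mapper or FileInputFormat and overrides one method.
public class SubclassSketch {
    // Framework side: final process() fixes the pipeline.
    public static abstract class RecordProcessor {
        public final String process(String line) {
            String cleaned = line.trim();   // shared work done by the base class
            return transform(cleaned);      // hook supplied by the subclass
        }
        protected abstract String transform(String record);
    }

    // User side: only the one interesting method is overridden.
    public static class UpperCaseProcessor extends RecordProcessor {
        @Override protected String transform(String record) {
            return record.toUpperCase();
        }
    }

    public static void main(String[] args) {
        RecordProcessor p = new UpperCaseProcessor();
        System.out.println(p.process("  hello  ")); // HELLO
    }
}
```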



51、单词 boolean 的匹配文本:


  1. The edge value has type void (edges carry no information), and the messages are boolean. Pregel  -  A System for Large-Scale Graph Processing


  2. The rereduce is a boolean value that is used to indicate when the reduce function is being called by itself (i.e., in rereduce mode) Getting Started with CouchDB


  3. In particular, boolean values can be set using yes, true, and on for a positive value and their opposites, no, false, and off for a negative value. Monitoring with Ganglia


  4. A query can also contain Boolean operators that connect terms to each other forming clauses. Mastering ElasticSearch


  5. Boolean: This is a logical data type having one of two values: true or false. Beginning CouchDB


  6. Within a special construct, called a when statement, one defines a quantifier, a variable, and a boolean condition using the variable. Interpreting the Data Parallel Analysis with Sawzall


  7. We are planning to add simple features supported by commercial search engines like boolean operators. The anatomy of a large-scale hypertextual Web search engine


  8. A key is the name of a field or property, and a value can be a string, a number, a boolean, another object, an array of values, or some other specialized type such as a string representing a date or an object. Elasticsearch The Definitive Guide


  9. These are number, Unicode string, boolean, array, object, and null. Cassandra The Definitive Guide


  10. Methods are provided for working with many primitive Java types for header values including Boolean, byte, short, int, long, float, double, and also the String object type. ActiveMQ in Action



52、单词 permissions 的匹配文本:


  1. Instead, create a CouchDB user (couchdb) and set the ownership and permissions of the CouchDB database and support files to the configured user. Getting Started with CouchDB


  2. Does the Apache error log contain any errors? Look for errors about file permissions, missing PHP modules, and the like. Monitoring with Ganglia


  3. The scheduler can see the entire state of the cell and has complete freedom to lay claim to any available cluster resources provided it has the appropriate permissions and priority – even ones that another scheduler has already acquired. Omega - flexible, scalable schedulers for large compute clusters


  4. The Gradle wrapper also takes care of unpacking the distribution and setting the appropriate permissions to execute the batch file. Gradle in Action


  5. Ensure the cassandra user has ownership of this directory and the directory has appropriate file access permissions: Cassandra High Performance Cookbook


  6. I will now show how to change to the source subdirectory inside this new directory and change the permissions of a few files to make sure they are ready for the build process: Beginning CouchDB


  7. To allow the files in different directories to be served from different Chubby masters, we do not expose operations that can move files from one directory to another, we do not maintain directory modified times, and we avoid path-dependent permission semantics (that is, access to a file is controlled by the permissions on the file itself rather than on directories on the path leading to the file) Chubby - The lock service for loosely-coupled distributed systems


  8. Finally, create a couple of directories for Cassandra to store its data and logs, and give them the proper permissions, as shown here: Cassandra The Definitive Guide


  9. Change the permissions on the file to be accessible only by the owner. Hadoop Beginners Guide


  10. For example, does user X have the necessary permissions to execute program Y on system Z? Such privileges are often referred to as access control lists (ACLs) and determine who or what can access a given resource to perform a given operation. ActiveMQ in Action
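Several quotes above involve creating data directories with owner-only access (e.g., the Cassandra setup and the "accessible only by the owner" example). A small sketch of doing this from Java with NIO follows; it assumes a POSIX file system (it would throw `UnsupportedOperationException` on Windows), and the directory name is just an example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Sketch: create a data directory readable/writable/traversable only by
// its owner (rwx------). Assumes a POSIX file system.
public class OwnerOnlyDir {
    public static Set<PosixFilePermission> create(Path dir) throws IOException {
        Files.createDirectories(dir);
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));
        return Files.getPosixFilePermissions(dir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data-demo").resolve("commitlog");
        System.out.println(create(dir));
    }
}
```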



53、Matching text for the word roo:


  1. A special thanks goes out to Raymond De Roo, Brian Wohlgemuth, David Calafrancesco, Tera Doty, Jay Rubin, Bill Catlan, Anthony Howe, Mark O’Neal, George Montgomery, George Barber, and the myriad other people who patiently listened to me gripe about things, let me bounce ideas off them to see whether an outsider could understand what I was trying to say, or just managed to bring a smile to my face when I needed it most. High Performance MySQL 3rd Edition


  2. In each new version, more features are being added, such as Groovy scripting language support, Spring Roo support, and SpringSource tcServer (an application server with paid support offered by SpringSource that was built on top of the Tomcat server) support. Pro Spring 3


  3. Ruby on Rails has been a particularly rich source of such ideas—Spring Roo and Grails are essentially Rails clones, bringing Rails-like development to Java and Groovy, respectively. Spring in Practice


  4. Spring Roo and GWT If you are new to using GWT or just want a rapid way to create GWT applications, Spring Roo provides scaffolding support for GWT and simplifies integrating Spring and GWT. Spring Security 3.1


  5. Later, we will build some advanced data-driven tests using different data sources in JUnit and a data-driven test in Ruby using Roo, and Python. Selenium Testing Tools Cookbook


  6. Our pain is your gain, and that includes all of our hard work with code that was written the night before, identifying bugs for the Roo team to fix, and working with the fantastic community of readers we have in Manning’s MEAP program, aligned as well with completing the manuscript around the time of the Roo 1.2.1 release. Spring Roo in Action


  7. The emphasis on practical development is also evident in the chapter on Spring Roo, the rapid application development (RAD) solution from the Spring team. Spring Data Modern Data Access for Enterprise Java



54、Matching text for the word apis:


  1. Megastore emphasizes cost-transparent APIs with runtime costs that match application developers' intuitions. Megastore - Providing Scalable, Highly Available Storage for Interactive Services


  2. First, one of the major components of gmond is the APR library, a cross-platform interface intended to provide a set of APIs to common platform functionality in a common and predictable manner. Monitoring with Ganglia


  3. In this section, we’ll take a look at the function APIs, and talk about how functions in a design document are related within applications. CouchDB The Definitive Guide


  4. In addition to that, it explains the batch APIs available in ElasticSearch and shows how to use filters to optimize your queries. Mastering ElasticSearch


  5. The goal of FlumeJava is to allow a programmer to express his or her data-parallel computation in a clear, modular way, while simultaneously executing it with performance approaching that of the best possible hand-optimized programs written directly against MapReduce APIs. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  6. Many web services and feed providers are making their APIs available with the option to return a response in JSON format. Beginning CouchDB


  7. The resulting system also includes code to collect traces, tools to visualize them, and libraries and APIs (Application Programming Interfaces) to analyze large collections of traces. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  8. In this chapter we will discuss the APIs that we use to create, retrieve, update. Elasticsearch The Definitive Guide


  9. Although they are often not stored directly in their “native” format, developers can work with the documents using APIs as if they were. Cassandra The Definitive Guide


  10. Although ActiveMQ is written in Java, APIs for many languages other than Java are provided, including C/C++, .NET, Perl, PHP, Python, Ruby, and many more. ActiveMQ in Action



55、Matching text for the word asynchronous:


  1. This new asynchronous design leaves the input thread blissfully unaware of any downstream problem. Apache Flume Distributed Log Collection for Hadoop


  2. Use this in cases where you don’t trust user input or where you need to trigger an asynchronous job. CouchDB The Definitive Guide


  3. The merge process is asynchronous and in general, should not interfere with indexing and searching, from a Lucene point of view. Mastering ElasticSearch


  4. With consistency thus relaxed, clients can buffer small writes and submit only large, asynchronous writes to OSDs; the only latency seen by applications will be due to clients which fill their caches waiting for data to flush to disk. Ceph A Scalable, High-Performance Distributed File System


  5. The system can be configured to perform either synchronous or asynchronous writes. Cassandra A Decentralized Structured Storage System


  6. You could then leverage these languages to output the data into JSON, which can be returned to JavaScript as the response to an asynchronous HTTP request. Beginning CouchDB


  7. When computation is deferred or made asynchronous, most Google developers use a common control flow library to construct callbacks and schedule them in a thread pool or other executor. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  8. Ironically, such programmers are usually wrong, especially when they use locks in a distributed system; few consider the effects of independent machine failures on locks in a system with asynchronous communications. Chubby - The lock service for loosely-coupled distributed systems


  9. Framed Transport was added to Thrift in order to support asynchronous servers. Cassandra The Definitive Guide


  10. Apache ActiveMQ is one such solution, providing the ability for applications to communicate in an asynchronous, loosely coupled manner. ActiveMQ in Action
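Several quotes above contrast synchronous and asynchronous writes: the caller hands work to another thread and continues immediately. A minimal sketch using `CompletableFuture` follows; the class and method names are invented, and the "write" is simulated rather than touching a real store.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of synchronous vs asynchronous writes: writeAsync returns
// immediately with a future; the actual work runs on an I/O pool.
public class AsyncWrite {
    private final ExecutorService ioPool = Executors.newFixedThreadPool(2);

    public String writeSync(String record) {
        return "stored:" + record;                      // blocks the caller
    }

    public CompletableFuture<String> writeAsync(String record) {
        return CompletableFuture.supplyAsync(() -> writeSync(record), ioPool);
    }

    public void shutdown() { ioPool.shutdown(); }

    public static void main(String[] args) throws Exception {
        AsyncWrite w = new AsyncWrite();
        CompletableFuture<String> f = w.writeAsync("row1"); // returns at once
        System.out.println(f.get());                        // stored:row1
        w.shutdown();
    }
}
```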



56、Matching text for the word mvc:


  1. This decouples the sources of data, the services, from the visual consumers of the data, facilitating use of the Model-View-Controller (MVC) pattern. Thinking in Java 4th Edition


  2. Figure 3.4 illustrates the MVC architecture pattern in the context of a Java web application. Gradle in Action


  3. Think of Application as a controller of the MVC model that handles all the requests. Scala in Action


  4. In fact, event-based architectures help to enforce the separation of concerns and modularity that MVC advocates. Testable JavaScript


  5. It’s far from complete, but it should feel familiar to anyone who has worked with any of the popular frameworks, such as Spring MVC. Functional Programming Patterns in Scala and Clojure


  6. The use of the Active Record pattern as a model in MVC is a good example of a leaky abstraction. SQL Antipatterns


  7. Rails was so much easier to use than Spring MVC, Hibernate, and the other “streamlined” Java web frameworks. Beginning Scala


  8. It accomplishes this flexibility by using a Model View Controller (MVC) design, which allows advanced features such as form validation against an XML schema document, and in-page data refreshes. Cassandra The Definitive Guide


  9. Swing is based on the Model View Controller (MVC) architectural framework pattern. Java 7 Recipes A Problem-Solution Approach


  10. Finally, Listing 6-25 shows the last part of our MVC architecture, the JSP page to render the results. Pro JPA2
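To make the pattern from these quotes concrete, here is a deliberately tiny Model-View-Controller sketch: the model holds state, the view only renders it, and the controller mediates updates. All class names are invented for illustration and do not come from any framework mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal MVC sketch: Model holds state, View renders, Controller updates.
public class MiniMvc {
    static class Model {                       // state only
        private final List<String> items = new ArrayList<>();
        void add(String item) { items.add(item); }
        List<String> items()  { return List.copyOf(items); }
    }

    static class View {                        // renders, never mutates
        String render(Model model) { return "items=" + model.items(); }
    }

    static class Controller {                  // handles input, updates model
        private final Model model;
        Controller(Model model) { this.model = model; }
        void onAdd(String item) { model.add(item); }
    }

    public static void main(String[] args) {
        Model model = new Model();
        Controller controller = new Controller(model);
        controller.onAdd("first");
        System.out.println(new View().render(model)); // items=[first]
    }
}
```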



57、Matching text for the word google:


  1. Creating mountains of data has become very easy, thanks to Facebook, Twitter, Amazon, digital cameras and camera phones, YouTube, Google, and just about anything else you can think of connected to the Internet. Apache Flume Distributed Log Collection for Hadoop


  2. HBase is an Apache-licensed open source system modeled after Google’s Bigtable. Monitoring with Ganglia


  3. We used some simple JavaScript and Google Groups to allow painless commenting. CouchDB The Definitive Guide


  4. For example, this is what Google shows us when we enter elasticsaerch instead of elasticsearch: Mastering ElasticSearch


  5. Similarly, the Google File System [7] is optimized for very large files and a workload consisting largely of reads and file appends. Ceph A Scalable, High-Performance Distributed File System


  6. The Google File System (GFS)[9] is another distributed file system built for hosting the state of Google’s internal applications. Cassandra A Decentralized Structured Storage System


  7. Google use an implementation of map/reduce in its web index. Beginning CouchDB


  8. Before Chubby was deployed, most distributed systems at Google used ad hoc methods for primary election (when work could be duplicated without harm), or required operator intervention (when correctness was essential) Chubby - The lock service for loosely-coupled distributed systems


  9. Chapter 12, Integrating Hadoop In this chapter, written by Jeremy Hanna, we put Cassandra in a larger context and see how to integrate it with the popular implementation of Google’s Map/Reduce algorithm, Hadoop. Cassandra The Definitive Guide


  10. Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. Bigtable - A Distributed Storage System for Structured Data



58、Matching text for the word ruby:


  1. By implementing a similar module with Perl, Ruby, or PHP, gmond would instantly gain the ability to be extended by implementing gmond metric modules in those languages as well. Monitoring with Ganglia


  2. In later chapters, we’ll focus on using CouchDB from server-side languages such as Ruby and Python. CouchDB The Definitive Guide


  3. Even though it wasn’t part of the core Maven project, a similar approach was proposed by the project Maven Polyglot that allows you to write your build definition logic, which is the project object model (POM) file, in Groovy, Ruby, Scala, or Clojure. Gradle in Action


  4. For example, generating Ruby binding may require ruby and ruby-devel packages to be installed. Cassandra High Performance Cookbook


  5. You will also see how to work with CouchDB from your Python and Ruby applications. Beginning CouchDB


  6. The results, shown in Table 1, demonstrate Sawzall is significantly faster than Python, Ruby, or Perl, at least for these microbenchmarks. Interpreting the Data Parallel Analysis with Sawzall


  7. If you’re familiar with the methodMissing and propertyMissing facilities of Groovy or Ruby’s method_miss ing, you’ve already got the idea. Gradle Beyond The Basics


  8. But because of the interface used to access Cassandra, you can use Cassandra from a wide variety of languages, including C#, Scala, Python, and Ruby. Cassandra The Definitive Guide


  9. We will also show you a few programs in Ruby, but these are usually only to demonstrate language independence, and you don't need to be a Ruby expert. Hadoop Beginners Guide


  10. Although ActiveMQ is written in Java, APIs for many languages other than Java are provided, including C/C++, .NET, Perl, PHP, Python, Ruby, and many more. ActiveMQ in Action



59、Matching text for the word sqoop:


  1. A typical Hadoop-based Big Data platform includes the Hadoop Distributed File System (HDFS), the parallel computing framework (MapReduce), common utilities, a column-oriented data storage table (HBase), high-level data management systems (Pig and Hive), a Big Data analytics library (Mahout), a distributed coordination system (ZooKeeper), a workflow management module (Oozie), data transfer modules such as Sqoop, data aggregation modules such as Flume, and data serialization modules such as Avro. Hadoop Operations and Cluster Management Cookbook


  2. It’s been four years since, via a post to the Apache JIRA, the first version of Sqoop was released to the world as an addition to Hadoop. Apache Sqoop Cookbook


  3. Sqoop is a connectivity tool for moving data between relational databases and data warehouses and Hadoop. Professional Hadoop Solutions


  4. Sqoop is an Apache project that is part of the broader Hadoop ecosphere. Hadoop RealWorld Solutions Cookbook


  5. Finally, we’ll cover how to use Sqoop for database ingress and egress activities, and we’ll look at how to ingress and egress data in HBase. Hadoop in Practice


  6. Sqoop A tool for efficient bulk transfer of data between structured data stores (such as relational databases) and HDFS. Hadoop:The Definitive Guide 3rd Edition


  7. HBase Hadoop database for random read/write access Hive SQL-like queries and tables on large data sets Pig Dataflow language and compiler Oozie Workflow for interdependent Hadoop jobs Sqoop Integrates databases and data warehouses with Hadoop. HBase:The Definitive Guide


  8. Wouldn't it be much better to use an existing tool for the purpose? There is indeed such a tool that we will use throughout the rest of this chapter, Sqoop. Hadoop Beginners Guide


  9. Apache Sqoop Not only does Hadoop not want to replace your database, it wants to be friends with it. Hadoop Operations



60、Matching text for the word innodb:


  1. The innodb options help in dealing with the small, primary key size restriction of MySQL. Web Crawling and Data Mining with Apache Nutch


  2. We cover SHOW INNODB STATUS in particular detail, because it provides deep insight into the operations of the InnoDB transactional storage engine. High Performance MySQL 3rd Edition


  3. They span the spectrum from simple in-memory key/value stores to complicated SQL-speaking MySQL/InnoDB derivatives. MongoDB The Definitive Guide 2nd Edition


  4. InnoDB - The MySQL engine once supported by Riak, but now deprecated. A Little Riak Book


  5. Transaction logging One compromise between speed and durability can be seen in MySQL’s InnoDB. MongoDB in Action


  6. Chapter 11, Transactions Lists the main issues required to support transactions, and uses InnoDB to illustrate the typical architecture used to provide that support. Understanding MySQL Internals



61、Matching text for the word caching:


  1. Cache-control The cache control HTTP response header provides a suggestion for client caching mechanisms on how to treat the returned information. Getting Started with CouchDB


  2. In a high-availability deployment with proper caching, this means that a given show function will be called only once for any particular document, and the CouchDB server may not even be contacted for subsequent requests. CouchDB The Definitive Guide


  3. Using Cassandra for distributed caching Cassandra has several variables that can be configured for each keyspace and column family that drastically change the profile of how it operates. Cassandra High Performance Cookbook


  4. Ceph could allow consistency to be further relaxed by caching metadata longer, much like earlier versions of NFS, which typically cache for 30 seconds. Ceph A Scalable, High-Performance Distributed File System


  5. In order to make the searches fast Cassandra provides certain hooks for intelligent caching of data. Cassandra A Decentralized Structured Storage System


  6. Cache documents and query results using memcached or another caching system. Scaling CouchDB


  7. More significantly, objects in PAST are immutable, facilitating cryptographic protection and simplifying consistency and caching, but limiting the systems usefulness as a general storage service. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.


  8. Older filters will never be reused so caching is disabled by default. Elasticsearch The Definitive Guide


  9. But there are a number of things, including memory settings, data storage, hardware choices, caching, and buffer sizes, that you can tune to squeeze out even more performance. Cassandra The Definitive Guide


  10. Chapter 5 discusses message persistence in ActiveMQ; how messages are stored for queues and topics, the four styles of message stores available, and message caching. ActiveMQ in Action
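The caches described above (row caches, memcached-style result caches) typically bound memory by evicting the least recently used entries. A minimal LRU cache sketch using `LinkedHashMap`'s access-order mode follows; real caching systems are of course far more elaborate, and the capacity here is arbitrary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: LinkedHashMap in access-order mode evicts the
// least recently used entry once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);   // true = iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict least recently used
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");           // touch "a" so "b" becomes eldest
        cache.put("c", "3");      // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```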



62、Matching text for the word scheduler:


  1. Extending gmond with gmetric The gmetric utility, which is distributed with the monitoring core, as well as being available in multiple native variants for various programming languages, allows us to submit metrics values without having to rely on either the scheduler present in gmond, or the constraints of the native shared library or Python modules on which it depends. Monitoring with Ganglia


  2. Abstract Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. Omega - flexible, scalable schedulers for large compute clusters


  3. You could use this feature to synchronize database servers within a cluster or between data centers using a job scheduler such as cron, or you could use it to synchronize data with your laptop for offline work as you travel. CouchDB The Definitive Guide


  4. Each job consists of a set of tasks, and is mapped by the scheduler to a set of available machines within a cluster. MapReduce - Simplied Data Processing on Large Clusters


  5. The concurrent merge scheduler This is a merge scheduler that will use multiple threads in order to perform segments merging. Mastering ElasticSearch


  6. Somewhat akin to FlumeJava’s executor, the Cascading evaluator breaks the dataflow graph into pieces, and, if possible, runs those in parallel, using the underlying Hadoop job scheduler. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  7. Normally, users will have access to an enterprise-wide task scheduler or configuration management system. Cassandra High Performance Cookbook


  8. This provides our low-level disk scheduler with longer I/O queues and a corresponding increase in scheduling efficiency. Ceph A Scalable, High-Performance Distributed File System


  9. We also restrict the Dapper daemon to the lowest possible priority in the kernel scheduler in case CPU contention arises within a heavily-loaded host machine. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  10. Cron is a well-known job scheduler on Unix systems, and it uses an expression string to denote when a job should be scheduled. ActiveMQ in Action
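The cron-style scheduling in the last quote maps naturally onto Java's `ScheduledExecutorService`. The sketch below runs a job at a fixed rate and waits for a given number of runs; the helper name and latch-based wait are illustrative choices, not part of any quoted system.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a simple periodic job scheduler built on
// ScheduledExecutorService; waits until the job has run `times` times.
public class PeriodicJob {
    public static void runFixedRate(Runnable job, long periodMillis, int times)
            throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(times);
        scheduler.scheduleAtFixedRate(() -> { job.run(); done.countDown(); },
                                      0, periodMillis, TimeUnit.MILLISECONDS);
        done.await();             // block until the requested runs completed
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger ticks = new AtomicInteger();
        runFixedRate(ticks::incrementAndGet, 10, 3);
        System.out.println(ticks.get() >= 3); // true
    }
}
```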



63、Matching text for the word initialization:


  1. Multiple parameters can be passed to the module’s initialization function by including one or more param sections. Monitoring with Ganglia


  2. Depending on how recently the partition was compacted, this should result in efficient, low-CPU initialization of a new node. CouchDB The Definitive Guide


  3. GRADLE’S BUILD LIFECYCLE PHASES Whenever you execute a Gradle build, three distinct lifecycle phases are run: initialization, configuration, and execution. Gradle in Action


  4. A recent problem we experienced was a bug in machine initialization code that caused processor caches to be disabled: computations on affected machines slowed down by over a factor of one hundred. MapReduce - Simplied Data Processing on Large Clusters


  5. Then we have the constructor, which is only used for super class initialization; so we can skip discussing it. Mastering ElasticSearch


  6. If, for some reason, the server is not running and you get an error message when you try the previous line, it may be an issue with the initialization script. Beginning CouchDB


  7. The next declaration has a more complicated initialization expression that uses some built-in functions to extract the cardinal number of the minute of the week from the time value. Interpreting the Data Parallel Analysis with Sawzall


  8. During the initialization phase, Gradle starts up and locates the build files it must process. Gradle Beyond The Basics


  9. Plugins are loaded during initialization, and no plugins are included by default. Elasticsearch in Action MEAP V07 cHAPTER 1


  10. Every time Gradle executes a build, it runs through three lifecycle phases: initialization, configuration, and execution. Building and Testing with Gradle



64、Matching text for the word config:


  1. You aren’t averse to the idea of maintaining a config file on all of your hosts. Monitoring with Ganglia


  2. The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON. CouchDB The Definitive Guide


  3. Many of these Gradle plugins require a configuration file located in a directory config/ that defines the rules for the analysis. Gradle in Action


  4. Suppose we want to work with a directory named config, in our build script. Gradle Effective Implementation Guide


  5. If CouchDB is installed on your local computer, you can open the Futon configuration utility by pointing your web browser to the address http://127.0.0.1:5984/_utils/config.html. Beginning CouchDB


  6. You’ll connect to the Flume Master UI, and select the config menu from the top, as highlighted in figure 2.7 Hadoop in Practice


  7. Additionally, configuring analyzers at node level requires changing the config file on every node and restarting every node which becomes a maintenance nightmare. Elasticsearch The Definitive Guide


  8. But this YAML file is intended only for users upgrading their config files from XML to YAML. Cassandra The Definitive Guide


  9. An embedded broker using an external configuration file can be achieved using the brokerConfig transport option and by specifying the URI for the activemq.xml file. ActiveMQ in Action


  10. The nested configurations via config and fileset directly correspond to the Checkstyle Ant equivalents as if this was being configured with Ant’s XML syntax. Building and Testing with Gradle
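The simplest Java analogue of the key/value config files discussed above is `java.util.Properties`. The sketch below loads settings from config text; the keys shown are made up for illustration and not taken from any quoted tool's actual config file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch: loading key/value settings with java.util.Properties.
// The keys are invented for the example.
public class ConfigLoader {
    public static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties p = load("listen_port=5984\ncluster.name=demo\n");
        System.out.println(p.getProperty("listen_port"));   // 5984
        System.out.println(p.getProperty("cluster.name"));  // demo
    }
}
```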



65、Matching text for the word classpath:


  1. Note that if the directory is specified, then it is always included first in the classpath. Apache Flume Distributed Log Collection for Hadoop


  2. You tell the compiler about it by referencing the library’s JAR file in the classpath. Gradle in Action


  3. Next, we must add the output directory with the compiled interface class file, to the compileClasspath property of the main source set. Gradle Effective Implementation Guide


  4. Also, ensure that the test.classpath is included in the test target. Cassandra High Performance Cookbook


  5. The Java compiler has a command-line switch for specifying the classpath, and that switch must be provided with an operatingspecific string. Gradle Beyond The Basics


  6. For example, the following statements in pig-env.sh will tell Pig the version of Hadoop used by the cluster is Pig’s classpath: Hadoop in Action


  7. This probably means that you started Cassandra directly from within the bin directory, and it therefore sets up its Java classpath incorrectly and can’t find the CliMain file to start the client. Cassandra The Definitive Guide


  8. Time for action – setting up the classpath To compile any Hadoop-related code, we will need to refer to the standard Hadoop-bundled classes. Hadoop Beginners Guide


  9. In order to run the clients using the HTTP transport protocol, one dependency must be added to the classpath. ActiveMQ in Action


  10. This directory is automatically compiled and added to the build classpath. Building and Testing with Gradle
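As the quotes note, the classpath is an operating-system-specific string of entries. A running JVM exposes its own classpath via the standard `java.class.path` system property, which the small sketch below splits into entries; the class name is invented.

```java
import java.io.File;

// Sketch: inspect the running JVM's classpath by splitting the standard
// java.class.path system property on the platform path separator
// (':' on Unix, ';' on Windows).
public class ShowClasspath {
    public static String[] entries() {
        return System.getProperty("java.class.path").split(File.pathSeparator);
    }

    public static void main(String[] args) {
        for (String entry : entries()) {
            System.out.println(entry);
        }
    }
}
```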



66、Matching text for the word superclass:


  1. The existing class is called the superclass, base class, or parent class. Core Java Volume I Fundamentals 9th Edition


  2. Many of the methods used here come from the superclass AbstractCopyTask, as shown in figure 4.9 Gradle in Action


  3. Any class that implements an interface must provide an implementation for each method defined by the interface or inherit the implementation from a superclass. Java 8 in Action Lambdas, Streams and Functional-style Programming


  4. This implementation first leverages the superclass (FileInputFormat) to get splits with locations calculated to ensure data locality. Professional Hadoop Solutions


  5. The basic idea in Java is that we can implement a generic class by using an appropriate superclass, such as Object. Data Structures and Algorithm Analysis in Java 3rd Edition


  6. Similarly, all classes inheriting from a superclass inherit the validation behavior as well. Java EE 7 Essentials


  7. The fingerprint is obtained by ordering the descriptions of the class, superclass, interfaces, field types, and method signatures in a canonical way, and then applying the so-called Secure Hash Algorithm (SHA) to that data. Core Java Volume II Advanced Features 9th Edition


  8. All constructors must invoke a superclass constructor, explicitly or implicitly, and a subclass would have no accessible superclass constructor to invoke. Effective Java 2nd Edition


  9. So the synchronized modifier is not automatically inherited when subclasses override superclass methods, and methods in interfaces cannot be declared as synchronized. Concurrent Programming in Java Design Principles and Pattern


  10. All arrays, regardless of the type of data they hold, have Object as their superclass. Java 7 Recipes A Problem-Solution Approach
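The rules quoted above (a subclass constructor must invoke a superclass constructor; inherited methods can be overridden) are easy to see in a tiny example. The class names below are invented for illustration.

```java
// Sketch of superclass/subclass mechanics: an explicit super(...) call
// and an overridden method that delegates to the superclass version.
public class Supers {
    static class Task {
        final String name;
        Task(String name) { this.name = name; }
        String describe() { return "task " + name; }
    }

    static class MapTask extends Task {
        MapTask(String name) {
            super(name);                    // explicit superclass constructor call
        }
        @Override
        String describe() {                 // overrides the superclass method
            return "map " + super.describe();
        }
    }

    public static void main(String[] args) {
        Task t = new MapTask("t1");         // a MapTask is-a Task
        System.out.println(t.describe());   // map task t1
    }
}
```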



67、Matching text for the word plugins:


  1. The community is quite strong and there are many plugins for other applications available. Apache Flume Distributed Log Collection for Hadoop


  2. Operators denote criticality The operators specified in the Nagios definitions for the Ganglia plugins always indicate the “critical” state. Monitoring with Ganglia


  3. He has been a long term Gradle contributor and is the author of several popular Gradle plugins. Gradle in Action


  4. We can write tasks, apply plugins, and write build logic using the Gradle build language. Gradle Effective Implementation Guide


  5. Chapter 9, Developing ElasticSearch plugins, covers ElasticSearch plugins development by showing and deeply describing how to write your own river and language plugin. Mastering ElasticSearch


  6. Plugins are loaded during initialization, and no plugins are included by default. Elasticsearch in Action MEAP V07 cHAPTER 1


  7. Facebook Insights provides developers and website owners with access to real-time analytics related to Facebook activity across websites with social plugins, Facebook Pages, and Facebook Ads. Apache Hadoop Goes Realtime at Facebook


  8. Flume interceptors are plugins that can manipulate and modify events before they pass from the source to the channel. Hadoop Beginners Guide


  9. The element provides the plugins element for declaring plug-ins. ActiveMQ in Action


  10. Gradle’s built-in tasks and plugins are enough for many users to script their builds without any custom code, but not always. Building and Testing with Gradle



68、Matching text for the word mahout:


  1. Mahout in Action aims to be accessible, to unearth the interesting nuggets of insight for the enthusiast, and to save the practitioner time in getting work done. Mahout in Action


  2. Apache Lucene along with Mahout and Solr were developed to address these needs. Scaling Big Data with Hadoop and Solr Chapter-02 Understanding Solr


  3. The Taste project (a recommendation engine) has become a part of Mahout and is included in the 0.1 release. Pro Hadoop


  4. A typical Hadoop-based Big Data platform includes the Hadoop Distributed File System (HDFS), the parallel computing framework (MapReduce), common utilities, a column-oriented data storage table (HBase), high-level data management systems (Pig and Hive), a Big Data analytics library (Mahout), a distributed coordination system (ZooKeeper), a workflow management module (Oozie), data transfer modules such as Sqoop, data aggregation modules such as Flume, and data serialization modules such as Avro. Hadoop Operations and Cluster Management Cookbook


  5. Mahout — This is a machine-learning and data-mining library that provides MapReduce implementations for popular algorithms used for clustering, regression testing, and statistical modeling. Professional Hadoop Solutions


  6. This book covers unloading/loading to and from HDFS, graph analytics with Giraph, batch data analysis using Hive, Pig, and MapReduce, machine-learning approaches with Mahout, debugging and troubleshooting MapReduce jobs, and columnar storage and retrieval of structured data using Apache Accumulo. Hadoop RealWorld Solutions Cookbook


  7. Chapter 9, Mass Text Data Processing, explains how to use Hadoop and Mahout to process large text datasets, and how to perform data preprocessing and loading operations using Hadoop. Hadoop MapReduce Cookbook


  8. And more thanks go to Josh Patterson, who reviewed my Mahout chapter. Hadoop in Practice


  9. Apache Mahout (http://mahout.apache.org/) is a project to build machine-learning libraries (such as classification and clustering algorithms) that run on Hadoop. Hadoop:The Definitive Guide 3rd Edition


  10. The Cloudera distribution is available at http://www.cloudera.com/hadoop and contains a large number of Apache products, from Hadoop itself, Hive, Pig, and HBase through tools such as Sqoop and Flume, to other lesser-known products such as Mahout and Whir. Hadoop Beginners Guide



69、Matching text for the word enum:


  1. The ordinal method yields the position of an enumerated constant in the enum declaration, counting. Core Java Volume I Fundamentals 9th Edition


  2. Each action command is mapped to an enum called CommandLineInput. Gradle in Action


  3. The other values of AnalyzerScope enum are GLOBAL and INDEX. Mastering ElasticSearch


  4. Enum has a name, an optional namespace, document, and alias, and contains a list of symbols (valid JSON strings) Professional Hadoop Solutions


  5. In the CounterExample job, an Enum for tracking the count of each type of bad record was defined: Hadoop RealWorld Solutions Cookbook


  6. This method returns a ReturnCode, which is an enum defined as a part of the Filter interface. HBase in Action


  7. The set of counters in an enum will form a group of counters. Hadoop MapReduce Cookbook


  8. The counter enum you’ll use to set the number of hops when you’ve reached the target node. Hadoop in Practice


  9. ObjectWritable is a general-purpose wrapper for the following: Java primitives, String, enum, Writable, null, or arrays of any of these types. Hadoop:The Definitive Guide 3rd Edition


  10. The second form uses a Java enum to specify counter names, which forces you to have them defined at compile time, but it also allows for type checking. Hadoop in Action
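A Hadoop-style counter enum, as several quotes describe, names the counters in a type-checked way, and `ordinal()` reflects declaration order (the Core Java quote). The sketch below uses an `EnumMap` as a stand-in for Hadoop's counter machinery; `RecordCounter` and the counter names are invented.

```java
import java.util.EnumMap;

// Sketch of counting bad records with an enum, in the spirit of Hadoop
// counters: the enum gives compile-time-checked counter names, and
// ordinal() reflects declaration order.
public class Counters {
    enum RecordCounter { GOOD, MALFORMED, EMPTY }

    private final EnumMap<RecordCounter, Long> counts =
            new EnumMap<>(RecordCounter.class);

    public void increment(RecordCounter c) {
        counts.merge(c, 1L, Long::sum);
    }

    public long get(RecordCounter c) {
        return counts.getOrDefault(c, 0L);
    }

    public static void main(String[] args) {
        Counters counters = new Counters();
        counters.increment(RecordCounter.MALFORMED);
        counters.increment(RecordCounter.MALFORMED);
        System.out.println(RecordCounter.GOOD.ordinal());          // 0
        System.out.println(counters.get(RecordCounter.MALFORMED)); // 2
    }
}
```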



70、Matching text for the word proofreaders:


  1. Special thanks go to Ivan Kirkpatrick and Clint Combs, my technical proofreaders, for their in-depth feedback on the text and the code. Scala in Action


  2. We hope your other projects aren’t as exciting as ours! Thanks also to our technical editor Mark Henry Ryan and our technical proofreaders Jerry Kuch and Kristine Kuch. HBase in Action


  3. These will be cleaned up during production of the book by copyeditors and proofreaders. Elasticsearch in Action MEAP V07 Chapter 1


  4. To our technical proofreaders, Neale Upstone and Doug Warren: your careful attention to the configuration and code has been greatly appreciated. Spring Integration in Action


  5. We wish to thank our production team of Mary Piergies; maestro Troy Mott and his band of merry editors: Ben Berg, Tara McGoldrick, and Bob Herbstman; our talented proofreaders: Katie Tennant and Alyson Brener; and others behind the scenes whom we are not able to name. Spring Roo in Action



71. Matching text for the word copyeditors:


  1. These will be cleaned up during production of the book by copyeditors and proofreaders. Elasticsearch in Action MEAP V07 Chapter 1



72. Matching text for the word iterator:


  1. It can inspect and modify the values of out-edges using methods supplied by the out-edge iterator. Pregel - A System for Large-Scale Graph Processing


  2. The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON. CouchDB The Definitive Guide


  3. The intermediate values are supplied to the user’s reduce function via an iterator. MapReduce - Simplified Data Processing on Large Clusters


  4. Each Reducer invocation is passed a key and an iterator over all the values associated with that key, and emits zero. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  5. It is worth noting that by making the values available as an iterator, Hadoop gives the programmer a chance to process the data without storing them in memory. Hadoop MapReduce Cookbook


  6. The MapReduce framework reuses iterator value objects supplied to combine. Hadoop in Practice


  7. In addition, the reduce() method no longer accepts the list of values as an Iterator but as an Iterable, which is easier to iterate through using Java’s foreach syntax. Hadoop in Action


  8. The Reducer interface uses an Iterator object instead of an Iterable object; this was changed as the latter works with the Java for each syntax and makes for cleaner code. Hadoop Beginners Guide


  9. Percolator provides a low-level Get/Set/iterator API rather than a high-level SQL interface, so we created indexes and did all the ‘query planning’ by hand. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications


  10. Sometime you just need an iterator, and other times you’d like to express a moderately complex conditional without resorting to major build file surgery. Building and Testing with Gradle
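Quotes 7 and 8 contrast the old Hadoop Reducer API, which supplies values through an Iterator, with the newer one, which supplies an Iterable that works with Java's for-each syntax. A small sketch of both iteration styles over the same data:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorDemo {
    // Explicit Iterator, as the old Reducer interface supplied.
    static int sumWithIterator(Iterator<Integer> it) {
        int sum = 0;
        while (it.hasNext()) sum += it.next();
        return sum;
    }

    // Iterable + for-each, as the newer API allows: same result, cleaner code.
    static int sumWithForEach(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumWithIterator(values.iterator())); // 6
        System.out.println(sumWithForEach(values));             // 6
    }
}
```

Either way the values are streamed, which is what lets Hadoop hand a reducer more data than fits in memory.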



73. Matching text for the word username:


  1. This will prompt you for a username and password that will be given administration rights. Getting Started with CouchDB


  2. When a user successfully authenticates, a hash is generated from the username and a secret key and is stored in a cookie and made available to the rest of gweb. Monitoring with Ganglia


  3. You can even do things like pass the username into the list using a query parameter (but it’s not recommended, as you’ll ruin cache efficiency) CouchDB The Definitive Guide


  4. It requires you to fill out your email address, name, and password, as well as a username and domain. Gradle in Action


  5. It is not a good idea to store the username and password as plain text in the build file; this is because anyone can read our password, if stored in plain text. Gradle Effective Implementation Guide


  6. Each of your testing clients should use the same username for running tests. Scaling CouchDB


  7. We assign values to the url, username, password, changeLog, and command properties through a very standard assignment syntax. Gradle Beyond The Basics


  8. For example, if you have a User column family and you want to find users in a particular city, you might create a second column family called UserCity that stores user data with the city as keys (instead of the username) and that has columns named for the users who live in that city. Cassandra The Definitive Guide


  9. Obviously, for authentication purposes, each user must have a username and a password. ActiveMQ in Action


  10. Because most of the properties have defaults (some of which, like username and password, are obviously specific to the build), each invocation of the task has very little to configure. Building and Testing with Gradle
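Quote 2 describes deriving a cookie value by hashing the username together with a secret key. A hypothetical Java sketch of that pattern (SHA-256, the separator, and the sample credentials are illustrative choices, not gweb's actual scheme):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AuthTokenDemo {
    // Derive a deterministic token from username + server-side secret.
    static String token(String username, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((username + ":" + secret).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is guaranteed on all JVMs
        }
    }

    public static void main(String[] args) {
        // 64 hex characters: 32 bytes of SHA-256 output.
        System.out.println(token("alice", "s3cret").length()); // 64
    }
}
```

The point of the secret key is that a client can present the token but cannot forge one for a different username without knowing the secret.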



74. Matching text for the word jdk:


  1. As a prerequisite, make sure you’ve already installed the JDK with a version of 1.5 or higher. Gradle in Action


  2. Suppose we have both OpenJDK and Oracle Java installed in /usr/openjdk and / usr/jdk respectively. Hadoop Operations and Cluster Management Cookbook


  3. What you need for this book In order to work with Gradle and the code samples in the book, we need at least a Java Development Kit (JDK 1.5 or higher), Gradle, and a good text editor. Gradle Effective Implementation Guide


  4. In order to check what Java processes are running and what their virtual machines identifiers are, one can just run a jps command which is provided with most JDK distributions. Mastering ElasticSearch


  5. JAVA_HOME must be set to point to a JDK, not to a JRE. Hadoop MapReduce Cookbook


  6. If a current JDK isn’t in the system path you’ll want to come here to configure your JAVA_HOME. Hadoop in Practice


  7. You can download the latest JDK for other operating systems from Sun at http://java.sun.com/javase/downloads/index.jsp. Hadoop in Action


  8. Ivy requires Ant, and building from source requires the complete JDK, version 1.6.0_20 or better, not just the JRE. Cassandra The Definitive Guide


  9. Time for action – checking the prerequisites Hadoop is written in Java, so you will need a recent Java Development Kit (JDK) installed on the Ubuntu host. Hadoop Beginners Guide


  10. Here’s an example of this using the keytool that comes with the JDK: ActiveMQ in Action



75. Matching text for the word timestamp:


  1. The headers are key/value pairs that can be used to make routing decisions or carry other structured information (such as the timestamp of the event or hostname of the server where the event originated) Apache Flume Distributed Log Collection for Hadoop


  2. Allowed options are now (uses current system time), UNIX timestamp, or any other well-formed date, as supported by PHP’s strtotime function. Monitoring with Ganglia


  3. Instead of a string, we are going to use an array, where individual members are part of a timestamp in decreasing significance. CouchDB The Definitive Guide


  4. This method allows you to set your own value for this timestamp. Mastering ElasticSearch


  5. The CLI uses microseconds since epoch time as the value of timestamp when setting columns. Cassandra High Performance Cookbook


  6. When users were interested in individual machines, they were also interested in a specific service, so we eventually combined the two into a composite index which allows for efficient lookup by service name, host machine, and timestamp, in that order. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  7. Instead, break a big reindex down into smaller jobs by filtering on a date or timestamp field: Elasticsearch The Definitive Guide


  8. Putting this all together, we have the basic Cassandra data structures: the column, which is a name/value pair (and a client-supplied timestamp of when it was last updated), and a column family, which is a container for rows that have similar, but not identical, column sets. Cassandra The Definitive Guide


  9. A few properties for the timestamp plug-in can be configured, as shown in table 11.5 ActiveMQ in Action


  10. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. Bigtable - A Distributed Storage System for Structured Data
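Several quotes treat a timestamp as a long counting milliseconds (or, in Cassandra's CLI, microseconds) since the Unix epoch. In Java, the mapping between that long and a readable instant looks like this:

```java
import java.time.Instant;

public class TimestampDemo {
    // Convert a millisecond epoch timestamp into its UTC instant string.
    static String epochString(long millis) {
        return Instant.ofEpochMilli(millis).toString();
    }

    public static void main(String[] args) {
        // 0 ms is the Unix epoch itself.
        System.out.println(epochString(0L)); // 1970-01-01T00:00:00Z
        // Current wall-clock time as milliseconds since the epoch.
        System.out.println(epochString(System.currentTimeMillis()));
    }
}
```

Storing the raw long keeps comparisons and last-write-wins resolution cheap; conversion to a calendar date only happens at display time.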



76. Matching text for the word tcp:


  1. Syslog has an older UDP transport as well as a newer TCP protocol that can handle data larger than a single UDP packet can transmit (about 64k), as well as deal with network-related congestion events that might require the data to be retransmitted. Apache Flume Distributed Log Collection for Hadoop


  2. It includes interactive query functionality and may be polled by external monitoring systems via a simple text protocol on TCP port 8652 Monitoring with Ganglia


  3. This is curl telling us that it is going to establish a TCP connection to the CouchDB server we specified in our request URI. CouchDB The Definitive Guide


  4. This recipe shows how to interpret what the following TCP connection graphs are showing. Cassandra High Performance Cookbook


  5. In all tests, clients, OSDs, and MDSs are user processes running on a dual-processor Linux cluster with SCSI disks and communicating using TCP. Ceph A Scalable, High-Performance Distributed File System


  6. All system control messages rely on UDP based messaging while the application related messages for replication and request routing relies on TCP. Cassandra A Decentralized Structured Storage System


  7. The port attribute indicates the TCP/IP port number to use. Scaling CouchDB


  8. A failure on the TCP socket results in a limited number of reconnect attempts before a failure is reported to the monitor cluster. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.


  9. It is accessible over a TCP/IP interface and is available from a variety of languages, including Java and web services. Cassandra The Definitive Guide


  10. It discusses ActiveMQ URIs and all the transport connectors for both client-to-broker and broker-to-broker communications including TCP, NIO, STOMP, failover, SSL, HTTP/S, and much more. ActiveMQ in Action
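Quote 3 describes curl establishing a TCP connection to a CouchDB server. The same handshake can be sketched in Java over the loopback interface, with the OS choosing an ephemeral port:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpDemo {
    // Bind a server to an ephemeral loopback port, connect to it, send one line,
    // and read it back on the accepted side of the connection.
    static String roundTrip() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 Socket accepted = server.accept()) {
                client.getOutputStream().write("ping\n".getBytes("UTF-8"));
                client.getOutputStream().flush();
                return new BufferedReader(
                        new InputStreamReader(accepted.getInputStream())).readLine();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // ping
    }
}
```

Unlike UDP, the connected stream guarantees ordering and retransmission, which is why syslog's TCP transport can carry messages larger than one datagram.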



77. Matching text for the word tuple:


  1. In the Pregel implementation of this algorithm the vertex value is a tuple of two values: a flag indicating which set the vertex is in (L or R), and the name of its matched vertex once known. Pregel - A System for Large-Scale Graph Processing


  2. In addition, the values flowing through a Cascading pipeline are special untyped “tuple” values, and Cascading operations focus on transforms over tuples; in contrast, a FlumeJava pipeline computes over arbitrary Java objects using arbitrary Java computations. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  3. A column is stored as a tuple of name, value, and timestamp. Cassandra High Performance Cookbook


  4. The IsUseragentBot class extends the abstract class FilterFunc, which allows us to override the exec(Tuple t) method. Hadoop RealWorld Solutions Cookbook


  5. A compile-time directive in Sawzall, proto, somewhat analogous to C’s #include directive, imports the DDL for a protocol buffer from a file and defines the Sawzall tuple type that describes the layout. Interpreting the Data Parallel Analysis with Sawzall


  6. This self-describing coordinate tuple was originally introduced to the Java ecosystem by the Maven repository format. Gradle Beyond The Basics


  7. It works in a way similar to this technique and captures all of the content between a start and end tag, supplying it as a single byte array field in a Pig tuple. Hadoop in Practice


  8. A tuple is just like a row of data in a database table, with multiple fields in a particular order. Hadoop: The Definitive Guide 3rd Edition


  9. A field in a tuple or a value in a map can be null or any atomic or complex type. Hadoop in Action


  10. The MapReduce jobs consist of DAGs of relational operators (such as filter, select (project), join, aggregation) that operate as iterators: each operator forwards a data tuple to the next operator after processing it. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
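Java has no built-in tuple type, so it is usually modeled as an ordered, immutable group of fields, like a row flowing through a Pig or Cascading pipeline. A minimal sketch (the Pair class here is an illustration, not any library's API):

```java
public class TupleDemo {
    // A two-field tuple: ordered, immutable, typed per position.
    static final class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
    }

    public static void main(String[] args) {
        // Quote 3's column tuple shape: a name plus a client-supplied timestamp.
        Pair<String, Long> column = new Pair<>("name", 1426773300000L);
        System.out.println(column.first + " " + column.second);
    }
}
```

The position carries the meaning, which is exactly how Pig addresses tuple fields by index ($0, $1, ...) rather than by column name.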



78. Matching text for the word screenshot:


  1. The screenshot in Figure 10-10 shows the word cloud generated from a MapReduce view of CouchDB’s archived tweets. CouchDB The Definitive Guide


  2. Figure 3.2 shows a screenshot of the user interface rendered in a browser. Gradle in Action


  3. We have only one task, which is listed in the task tree, as shown in the following screenshot: Gradle Effective Implementation Guide


  4. After that Paramedic shows us two primary shards, as given in the following screenshot: Mastering ElasticSearch


  5. The following screenshot shows the current status of the HDFS installation including the number of nodes, total storage, storage taken by each node. Hadoop MapReduce Cookbook


  6. User annotations are not shown in this screenshot, but they may be selectively included in the global time line on a spanby-span basis. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  7. Figure 13.12 Screenshot of the counter in JobTracker’s job summary page. Hadoop in Practice


  8. Figure 8-2 shows a screenshot of the keyspace and column family configuration information that the web. Cassandra The Definitive Guide


  9. As shown in the preceding screenshot, click on the Create bucket button and enter a name for the new bucket. Hadoop Beginners Guide


  10. Though the screenshot in figure 8.1 shows the project structure for the jms-webapp-local application, the directory structure for the other instances of the application is only slightly different. ActiveMQ in Action



79. Matching text for the word scalability:


  1. CouchDB doesn’t attempt to solve your scalability problems with any single solution, but instead provides you with a simple and flexible system that can be molded and adapted to your needs. Getting Started with CouchDB


  2. We’ll show you how CouchDB’s schema-free document model is a better fit for common applications, how the built-in query engine is a powerful way to use and process your data, and how CouchDB’s design lends itself to modularization and scalability. CouchDB The Definitive Guide


  3. The experience has made him a learned individual in technologies like optimization, high-availability, and scalability. Cassandra High Performance Cookbook


  4. Abstract We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph A Scalable, High-Performance Distributed File System


  5. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. Cassandra A Decentralized Structured Storage System


  6. You should test the scalability of your system before it is deployed. Scaling CouchDB


  7. Our implementation offers excellent performance, reliability, and scalability while providing clients with the illusion of a single logical object store. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.


  8. We will also discuss how best to model your data to take advantage of the horizontal scalability of Elasticsearch, and how to configure. Elasticsearch The Definitive Guide


  9. We encounter scalability problems when our relational applications become successful and usage goes up. Cassandra The Definitive Guide


  10. Part 4 discusses advanced features in ActiveMQ such as high availability, scalability, many advanced broker and client features, performance tuning, and administration of ActiveMQ. ActiveMQ in Action



80. Matching text for the word constructors:


  1. The memory management is performed explicitly by constructors, assignment operators, and destructors. Core Java Volume I Fundamentals 9th Edition


  2. Constructors All code that creates and launches a MapReduce job into a Hadoop cluster creates a JobConf object. Pro Hadoop


  3. In contrast, restructuring is usually required in XQuery and object-oriented query languages, e.g., using nested for-loops and constructors. Dremel - Interactive Analysis of WebScale Datasets


  4. Making an HBaseAdmin instance explicitly requires a Configuration instance, a detail hidden from you by the default HTable and HTablePool constructors. HBase in Action


  5. A node contains the data and links to the previous and next nodes, along with appropriate constructors. Data Structures and Algorithm Analysis in Java 3rd Edition


  6. The first part of the implementation is straightforward: there are two Text instance variables, first and second, and associated constructors, getters, and setters. Hadoop: The Definitive Guide 3rd Edition


  7. It expects one or a list of Put objects that, in turn, are created with one of these constructors: HBase:The Definitive Guide


  8. If you want buffering and the data input methods for a file, you need to use the following rather monstrous sequence of constructors: Core Java Volume II Advanced Features 9th Edition


  9. It consists of nothing but C# properties for each of those three items, plus constructors. Cassandra The Definitive Guide


  10. Also, constructors cannot be qualified as synchronized (although block synchronization can be used within constructors) Concurrent Programming in Java Design Principles and Pattern
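Quote 6 describes a class holding two fields with associated constructors, getters, and setters. A simplified sketch of that shape using plain Strings, with the no-arg constructor chaining to the full one via this(...):

```java
public class ConstructorDemo {
    static class StringPair {
        private String first;
        private String second;

        StringPair() { this("", ""); }               // no-arg form delegates...
        StringPair(String first, String second) {    // ...to the full form, which does the work
            this.first = first;
            this.second = second;
        }

        String getFirst()  { return first; }
        String getSecond() { return second; }
    }

    public static void main(String[] args) {
        StringPair p = new StringPair("hadoop", "hbase");
        System.out.println(p.getFirst() + "/" + p.getSecond()); // hadoop/hbase
    }
}
```

The no-arg constructor matters in Hadoop specifically because the framework instantiates Writables reflectively before populating them from the wire.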



81. Matching text for the word dataset:


  1. The vertex-centric approach is reminiscent of MapReduce in that users focus on a local action, processing each item independently, and the system composes these actions to lift computation to a large dataset. Pregel - A System for Large-Scale Graph Processing


  2. She runs an MR job that cranks through the input data and produces a dataset containing the new signals, stored in billions of records in the distributed file system. Dremel - Interactive Analysis of WebScale Datasets


  3. Routing Maybe routing will be handy in our case? The obvious gain from using routing is the possibility to create effective queries that return only books from our base dataset or data that belongs to our business collaborate (because routing allows us to hit only a part of our index) Mastering ElasticSearch


  4. The uncompressed dataset is tens of terabytes, which is small compared to many NoSQL instances, but was large enough to cause difficulties with sharded MySQL. Spanner - Google’s Globally-Distributed Database


  5. Elasticsearch is designed to work well with a small dataset and to scale to big. Elasticsearch The Definitive Guide


  6. Recently written data should be available quickly, but the entire dataset will also be periodically scanned in order to perform timebased rollups. Apache Hadoop Goes Realtime at Facebook


  7. Doing queries across the entire dataset was tedious and could take hours to run. Cassandra The Definitive Guide


  8. Limited support (no joins) for UPDATE and DELETE is implemented by applying the update or delete criteria on the data to create a new dataset. Tenzing - A SQL Implementation On The MapReduce Framework


  9. These enable a highly interactive Web UI where it is common that a single mouse click leads to processing a trillion values in the underlying dataset. PowerDrill - Processing a Trillion Cells per Mouse Click


  10. Unlike Percolator, ElasTraS offers limited transactional semantics (Sinfonia-like mini-transactions) when dynamically partitioning the dataset and has no support for structuring computation. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications



82. Matching text for the word daemon:


  1. All of these run inside a daemon called an agent. Apache Flume Distributed Log Collection for Hadoop


  2. To collect the dendrite updates, you deployed a single instance of a daemon process, called an axon, Monitoring with Ganglia


  3. This comes as a daemon for Twisted, a popular and high-performance event-driven network programming framework for Python. CouchDB The Definitive Guide


  4. The previous code uses the EsExecutors class, which is an ElasticSearch way of running daemon threads inside the ElasticSearch nodes. Mastering ElasticSearch


  5. TrueTime is implemented by a set of time master machines per datacenter and a timeslave daemon per machine. Spanner - Google’s Globally-Distributed Database


  6. After startup, Cassandra detaches from the console and runs as a daemon. Cassandra High Performance Cookbook


  7. Sending data and running queries by sending JSON objects over HTTP makes it easy for someone to extend anything, from a syslog daemon like rsyslog to a connecting framework like ManifoldCF, to interact with Elasticsearch. Elasticsearch in Action MEAP V07 Chapter 1


  8. Dapper penetration can be assessed in two dimensions: the fraction of production processes that can generate Dapper traces (i.e., those that are linked with Dapper-instrumented runtime libraries) and the fraction of production machines running Dapper’s trace collection daemon. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  9. Add -d if you want to run it in the background as a daemon. Elasticsearch The Definitive Guide


  10. JConsole showing the peak thread count for a Cassandra daemon. Cassandra The Definitive Guide
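Inside a Java process, the daemon threads mentioned in quote 4 are ordinary threads marked so that the JVM does not wait for them at shutdown. A minimal sketch (the heartbeat loop is illustrative):

```java
public class DaemonDemo {
    // Start a background heartbeat thread marked as a daemon: the JVM may exit
    // while it is still running. setDaemon(true) must be called before start().
    static Thread startHeartbeat() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try { Thread.sleep(1_000); } catch (InterruptedException e) { return; }
            }
        }, "heartbeat");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        Thread t = startHeartbeat();
        System.out.println(t.isDaemon()); // true
    }
}
```

Note the term is overloaded: an OS daemon (gmond, the Cassandra process) is a detached background process, while a Java daemon thread lives inside one JVM.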



83. Matching text for the word topology:


  1. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. Pregel - A System for Large-Scale Graph Processing


  2. This topology is adequate for most environments, but in some cases it is desirable to specify a few specific listeners rather than allowing every node to receive (and thereby waste CPU cycles to process) metrics from every other node. Monitoring with Ganglia


  3. It has the advantage that it’s simple, which helps in understanding topology and diagnosing failures. CouchDB The Definitive Guide


  4. Figure 11 shows the execution times for each query as a function of the server topology. Dremel - Interactive Analysis of WebScale Datasets


  5. Our network topology is simple enough that “distances” can be accurately estimated from IP addresses. GFS - The Google File System


  6. This node is responsible for managing the cluster state and process of assigning shards to nodes in reaction of changes in cluster topology. Mastering ElasticSearch


  7. Instead, it uses an internal process called Gossip to communicate changes about the topology of the Ring between nodes. Cassandra High Performance Cookbook


  8. Having covered each of the Hadoop daemons, we depict the topology of one typical Hadoop cluster in figure 2.3 Hadoop in Action


  9. The new node will not immediately accept requests so that it has time to learn the topology of the ring and accept data that it may also be responsible for. Cassandra The Definitive Guide


  10. Figure 4.5 An example of a complex network of brokers topology. ActiveMQ in Action



84. Matching text for the word partitioning:


  1. You could use multi-master, master/slave, partitioning, sharding, write-through caches, and all sorts of other complex techniques. CouchDB The Definitive Guide


  2. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. MapReduce - Simplified Data Processing on Large Clusters


  3. Like MapReduce, Dryad handles the details of communication, partitioning, placement, concurrency and fault tolerance. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  4. Ceph utilizes a novel metadata cluster architecture based on Dynamic Subtree Partitioning [30] that adaptively and intelligently distributes responsibility for managing the file system directory hierarchy among tens or even hundreds of MDSs. Ceph A Scalable, High-Performance Distributed File System


  5. Describing the details of each of the solutions is beyond the scope of this paper, so we will focus on the core distributed systems techniques used in Cassandra: partitioning, replication, membership, failure handling and scaling. Cassandra A Decentralized Structured Storage System


  6. An alternative to automatic partitioning is to manually partition your documents into different databases by type of document. Scaling CouchDB


  7. Partitioning your documents into different containers or shards, which can be stored on a single node or on multiple nodes. Elasticsearch The Definitive Guide


  8. Partitioning is intended to enable large Chubby cells with little communication between the partitions. Chubby - The lock service for loosely-coupled distributed systems


  9. And even if you can write such a huge check, you still need to carefully choose partitioning keys to the point where you can never entirely ignore the limitation. Cassandra The Definitive Guide


  10. Finally we’ll examine traffic partitioning, which will balance scaling and performance, but will add more complexity to your ActiveMQ application. ActiveMQ in Action
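In Hadoop MapReduce, the default partitioning rule is the one HashPartitioner applies: mask off the hash code's sign bit, then take it modulo the number of partitions, so every key lands deterministically in a valid partition:

```java
public class PartitionDemo {
    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit (hashCode may be negative), then take the modulus.
    static int partition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Equal keys always map to the same partition, which is what guarantees
        // that all values for one key reach the same reducer.
        System.out.println(partition("user-42", 4) == partition("user-42", 4)); // true
    }
}
```

Systems like Cassandra apply the same idea at the storage layer, hashing a row key onto a position in the ring instead of onto a reducer index.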



85. Matching text for the word urls:


  1. The structure for the URLs has been standardized, and you should be able to both look at a URL that you are using to understand what it does, and to construct one to access the information that you want. Getting Started with CouchDB


  2. As of Ganglia 3.3, hostnames are converted to lowercase, RRD files are created with lowercase filenames, and lowercase hostnames should be used in URLs to access the metrics. Monitoring with Ganglia


  3. We’ll also briefly describe capabilities available on CouchDB’s roadmap, like replication filters, update handlers for parsing non-JSON input, and a rewrite handler for making application URLs more palatable. CouchDB The Definitive Guide


  4. A document can have multiple Names, which are different URLs by which the document can be referenced. Dremel - Interactive Analysis of WebScale Datasets


  5. Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "What we would like to do is, use the BM25 similarity model for the name field and the contents field." Mastering ElasticSearch


  6. This means you can manage the database by simply visiting URLs in your web browser, using command-line tools such as curl or, more importantly, via any programming language that supports HTTP requests. Beginning CouchDB


  7. The following ProxyPassReverse directives instructs Apache to adjust the URLs in the HTTP response headers to match that of the proxy server, instead of the reverse proxied server. Scaling CouchDB


  8. Indicates new terms, URLs, email addresses, filenames, and file extensions. Cassandra The Definitive Guide


  9. README.txt—A file containing some URLs to documentation to get new users started with ActiveMQ. ActiveMQ in Action


  10. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving) Bigtable - A Distributed Storage System for Structured Data



86. Matching text for the word jmx:


  1. Several Flume components report metrics to the JMX platform MBean server. Apache Flume Distributed Log Collection for Hadoop


  2. Java Virtual Machine Metrics The sFlow Java Virtual Machine (JVM) metrics (Table 8-3) are based on the metrics exposed through the Java Management Extensions (JMX) interface, ensuring consistency with existing JMX-based monitoring systems. Monitoring with Ganglia


  3. This allows for the use of JMX-aware applications to collect information about the state of the servers. Pro Hadoop


  4. In this chapter, we will first introduce the management framework, Java Management Extension (JMX) for system monitoring. Hadoop Operations and Cluster Management Cookbook


  5. The nodetool program communicates with the JMX port to confirm that the server is alive. Cassandra High Performance Cookbook


  6. The collection mechanism for metrics is decoupled from the component that receives the updates, and there are various pluggable outputs, including local files, Ganglia, and JMX. Hadoop: The Definitive Guide 3rd Edition


  7. HBase already had a number of metrics exported through JMX. Apache Hadoop Goes Realtime at Facebook


  8. Cassandra has a rich Java Management Extensions (JMX) interface baked in, which we put to use to monitor all of these and more. Cassandra The Definitive Guide


  9. Notably, the new metrics subsystem (referred to as metrics2) supports sending metrics to multiple plug-ins, filtering of metrics in various ways, and more complete support for JMX. Hadoop Operations


  10. It shows how to configure ActiveMQ for JMX monitoring and demonstrates this using JConsole. ActiveMQ in Action
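The platform MBean server that quotes 1 and 2 refer to is available in any Java process; for example, the thread counts JConsole displays for a Cassandra daemon come from the standard ThreadMXBean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class JmxDemo {
    // Read the live thread count from the platform MBean server -- the same
    // JMX source JConsole and nodetool-style tools query remotely.
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println(threads.getThreadCount() > 0);                             // at least main
        System.out.println(threads.getPeakThreadCount() >= threads.getThreadCount()); // peak bounds live
    }
}
```

Applications expose their own metrics the same way, by registering custom MBeans alongside these built-in platform ones.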



87. Matching text for the word unix:


  1. The Ganglia projects uses the autotools build system to detect the tools available on most Linux and UNIX-like environments and build the binaries. Monitoring with Ganglia


  2. The first few appendixes showed you how to install CouchDB packages for Unix-like, Mac OS X, and Windows operating systems. CouchDB The Definitive Guide


  3. We will discuss more about it in the Avoid swapping on Unix-like systems section in Chapter 6, Fighting with Fire. Mastering ElasticSearch


  4. The defaults for a Unix/Linux system limit the number of open files. Cassandra High Performance Cookbook


  5. It does not assume any knowledge in relation to CouchDB, although some experience with UNIX commands and basic JavaScript skills are beneficial. Beginning CouchDB


  6. The input files argument accepts standard Unix shell file-name-matching metacharacters to identify the files to be processed. Interpreting the Data Parallel Analysis with Sawzall


  7. You can check if it's there by using the env command on UNIX-like systems and the set command on Windows. Elasticsearch in Action MEAP V07 cHAPTER 1


  8. Chubby exports a file system interface similar to, but simpler than that of UNIX [22] Chubby - The lock service for loosely-coupled distributed systems


  9. The clock is stored as a long (representing the milliseconds since the Unix epoch), so we could wrap this in a new java.util.Date object if we wanted to. Cassandra The Definitive Guide


  10. Click on the link to the 5.4.1 release and you’ll find both tarball and zip formats available (the tarball is for Linux and Unix; the zip is for Windows) ActiveMQ in Action



88. Matching text for the word packt:


  1. Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. Apache Flume Distributed Log Collection for Hadoop


  2. Little did I know, when I was first asked by Packt Publishing whether I would be interested in writing a book about HBase administration on September 2011, how much work and stress (but also a lot of fun) it was going to be. HBase Administration Cookbook


  3. Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Hadoop Operations and Cluster Management Cookbook


  4. However, Packt Publishing cannot guarantee the accuracy of this information. Gradle Effective Implementation Guide


  5. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Mastering ElasticSearch


  6. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Cassandra High Performance Cookbook


  7. I would like to commend the authors of this book for a job well done, and would like to thank Packt Publishing for the opportunity to assist in the editing of this publication. Hadoop RealWorld Solutions Cookbook


  8. Thanks to the editorial staff at Packt, for providing me the opportunity to write this book and for providing feedback and guidance throughout the process. Hadoop MapReduce Cookbook


  9. Here, you can access, read and search across Packt's entire library of books. Getting Started with Google Guava


  10. Thank you, Packt Publishing for selecting me as one of the technical reviewers for this wonderful book. Hadoop Beginners Guide



89. Matching texts for the word ids:


  1. We haven’t even considered complexities like postal addresses (or the fact that there might be more than one), and my contact record doesn’t include additional details like my Skype IDs, instant messaging accounts, or that I have a few more addresses than those listed above. Getting Started with CouchDB


  2. The bigger picture we’re working on is like an app server, but different in one crucial regard: rather than let the developer do whatever he wants (loop a list of DocIDs and make queries, make queries based on the results of other queries, etc.), we’re defining “safe” transformations, such as view, show, list, and update. CouchDB The Definitive Guide


  3. A Document has a required integer DocId and optional Links, containing a list of Forward and Backward entries holding DocIds of other web pages. Dremel - Interactive Analysis of WebScale Datasets


  4. The columns are sorted chronologically, using unique IDs (UUIDs) for column names to avoid clashes. Cassandra High Performance Cookbook


  5. This unique ID can be assigned by the user or application, or it can use a universally unique identifier (UUID)—a random number generated by CouchDB that greatly reduces the chance of duplicate IDs being used. Beginning CouchDB


  6. CouchDB will perform best with document IDs that are mostly monotonic (in simpler terms, mostly sequential) Scaling CouchDB


  7. A trace context is a small and easily copyable container of span attributes such as trace and span ids. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  8. Each shard returns the doc IDs and sort values of all of the docs in its priority queue to the coordinating node, Node 1, which merges these values into its own priority queue to produce a globally sorted list of results. Elasticsearch The Definitive Guide


  9. It is still a common design requirement to store IDs related to other entities in your tables, but operations such as cascading deletes are not available. Cassandra The Definitive Guide


  10. Its speed is the result of the combination of a fast transactional journal comprised of data log files, the highly optimized indexing of message IDs, and inmemory message caching. ActiveMQ in Action
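Several of the quotes above mention UUIDs as collision-resistant identifiers for documents and column names. As a minimal sketch (the class name `UuidDemo` is invented for illustration), Java's standard `java.util.UUID` generates exactly the kind of random, version-4 identifiers the CouchDB and Cassandra texts describe:

```java
import java.util.UUID;

public class UuidDemo {
    // Generate a random (version 4) UUID, the kind CouchDB assigns to
    // documents and Cassandra uses for unique column names.
    static UUID randomId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = randomId();
        // version() reports 4 for randomly generated UUIDs.
        System.out.println(id + " version=" + id.version());
    }
}
```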



90. Matching texts for the word aop:


  1. If you are familiar with the AOP Spring Framework, it is similar to a MethodInterceptor. Apache Flume Distributed Log Collection for Hadoop


  2. Because of this, mixins have part of the flavor of aspect-oriented programming (AOP), and aspects are often suggested to solve the mixin problem. Thinking in Java 4th Edition


  3. However, the good news for Spring and AOP developers is that starting from version 2.0, Spring provides much tighter integration with AspectJ. Pro Spring 3


  4. In this chapter, we will cover dependency injection and AOP. Pro Spring MVC


  5. The main points you will see throughout the different projects is the use of DI, XML namespace-based configuration, and AOP, which as you will see in the next section, are the pillars upon which Spring is built on. Pro Spring Security


  6. Advising the Build Graph In the early 2000s, the paradigm of aspect-oriented programming (AOP) became a marginally popular way to manage the complexity of enterprise Java software. Gradle Beyond The Basics


  7. You can think of observer coprocessors as analogous to triggers from a relational database or to advice from aspect-oriented programming (AOP) HBase in Action


  8. The DTD bean and property mechanism can’t hide complex bean creation, which is a shortcoming in configuring advanced features like aspect-oriented programming (AOP) and security. Spring Batch in Action


  9. The goal of AOP is to enable systemic changes of the same kind across many modules, while avoiding copying and pasting the same code over and over into each location. Programming Scala


  10. Using retry advice on a data access layer via Spring’s AOP (aspect-oriented programming) support is one way to implement this functionality. Spring Data Modern Data Access for Enterprise Java
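The quotes compare various mechanisms to AOP "advice" and to Spring's MethodInterceptor. As a rough, framework-free sketch of that around-advice idea, a JDK dynamic proxy can intercept every call to an interface before delegating to the real object; the `Greeter` interface and class names here are invented for illustration, not Spring API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class AopSketch {
    interface Greeter {
        String greet(String name);
    }

    static class SimpleGreeter implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    // Minimal "around advice": record each call, then delegate to the
    // target -- similar in spirit to a Spring MethodInterceptor.
    static Greeter withLogging(Greeter target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("before " + method.getName());
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[]{Greeter.class}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter g = withLogging(new SimpleGreeter(), log);
        System.out.println(g.greet("world") + " / " + log);
    }
}
```

Real AOP frameworks (Spring AOP, AspectJ) add pointcut matching and weaving on top of this basic interception pattern.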



91. Matching texts for the word serialization:


  1. Tiering Flume agents is covered using Avro serialization, as well as using the Flume command line as a standalone Avro client for testing and importing data manually. Apache Flume Distributed Log Collection for Hadoop


  2. MapReduce, however, is essentially functional, so expressing a graph algorithm as a chained MapReduce requires passing the entire state of the graph from one stage to the next—in general requiring much more communication and associated serialization overhead. Pregel  -  A System for Large-Scale Graph Processing


  3. Every individual host in the grid is an active participant, and together they cooperate, organically distributing the workload while avoiding serialization and single points of failure. Monitoring with Ganglia


  4. The extra serialization involved in using super columns and extra space used makes them less efficient. Cassandra High Performance Cookbook


  5. This allows Ceph to leverage the intelligence present in OSDs to distribute the complexity surrounding data access, update serialization, replication and reliability, failure detection, and recovery. Ceph A Scalable, High-Performance Distributed File System


  6. Elasticsearch uses JSON (or JavaScript Object Notation ) as the serialization format for documents. Elasticsearch The Definitive Guide


  7. For example, it uses two different JSON serialization libraries, the Google collections project, and several Apache Commons libraries. Cassandra The Definitive Guide


  8. However, this proved to be inefficient because of the serialization and deserialization costs associated with translating to and from Sawzall’s native type system. Tenzing - A SQL Implementation On The MapReduce Framework


  9. The main purpose of this interface is to provide mechanisms for the serialization and deserialization of data as it is passed across the network or read and written from the disk. Hadoop Beginners Guide


  10. From a client perspective, the NIO transport connector is practically the same as the standard TCP connector, in terms of its use of TCP as the underlying network protocol and OpenWire as the message serialization protocol. ActiveMQ in Action
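The recurring theme in these quotes is the cost of converting objects to bytes and back whenever data crosses the network or hits disk. As a minimal sketch using only Java's built-in object serialization (the `Event` class is invented for illustration; Avro, Thrift, and OpenWire are more compact alternatives):

```java
import java.io.*;

public class SerializationDemo {
    // A plain serializable value object.
    static class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final String body;
        Event(String body) { this.body = body; }
    }

    // Serialize an object to bytes and deserialize it back -- the
    // round-trip cost the quoted texts describe.
    static Event roundTrip(Event e) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(e);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Event) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Event("hello")).body); // hello
    }
}
```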



92. Matching texts for the word namenode:


  1. Pacemaker ships with an IPaddr resource agent to manage VIP, so we only need to create our own namenode resource agent: HBase Administration Cookbook


  2. They will experience substantial idle cycles waiting for replies from the namenode. HDFS scalability the limits to growth


  3. The file system protocol is hdfs, the host to contact for services is NamenodeHost, and the port to connect to is 8020, which is the default port for HDFS. Pro Hadoop


  4. Format the HDFS filesystem from NameNode with the following command: hadoop namenode -format. Hadoop Operations and Cluster Management Cookbook


  5. Replace NAMENODE_ HOSTNAME and PORT with the relevant values corresponding to the NameNode of your HDFS cluster. Hadoop MapReduce Cookbook


  6. Data compatibility concerns persistent data and metadata formats, such as the format in which the HDFS namenode stores its persistent data. Hadoop:The Definitive Guide 3rd Edition


  7. Writing variations also exist for the different Hadoop components such as NameNode (name node, namenode), DataNode, JobTracker, and TaskTracker. Hadoop in Action


  8. It often runs on the same server as the namenode in smaller clusters. Cassandra The Definitive Guide


  9. Namenode 1 Stores filesystem metadata, stores file to block map, and provides a global picture of the filesystem. Hadoop Operations


  10. It runs the namenode and jobtracker daemons, but it also has Hive installed on it. Programming Hive



93. Matching texts for the word uri:


  1. This is curl telling us that it is going to establish a TCP connection to the CouchDB server we specified in our request URI. CouchDB The Definitive Guide


  2. Configure the NameNode RPC and HTTP URI for namenode1 by adding the following into the file: Hadoop Operations and Cluster Management Cookbook


  3. We can pass a URL or URI instance as an argument. Gradle Effective Implementation Guide


  4. The simplest way is to add a routing URI parameter when indexing a document. Mastering ElasticSearch


  5. What the server does with your request depends on the URI you are issuing the request to, the request method you are using (GET, POST, PUT, DELETE), and the data you send along with your request. Beginning CouchDB


  6. The subsequent rewrite rule then proxies all requests to URIs starting with /api to the equivalent URI on http://couch-master.example.com:5984 (again, only if the previous rewrite condition has been met): Scaling CouchDB


  7. You can identify the current filesystem of the FileSystem object using the getUri() function as follows. Hadoop MapReduce Cookbook


  8. The file:/ URI denotes a path on the local filesystem, and the hdfs:/ URI denotes a path in HDFS. Hadoop in Practice


  9. As we’ll see shortly, most setups don’t need to specify the scheme://authority part of the URI. Hadoop in Action


  10. Without going too deep into discussing URIs, let’s briefly summarize the URI structure. ActiveMQ in Action
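Several quotes refer to the scheme://authority/path structure of a URI, for example the hdfs:// URIs used by Hadoop. Java's standard `java.net.URI` splits a URI into exactly those parts; the HDFS host and path below are made-up examples:

```java
import java.net.URI;

public class UriDemo {
    public static void main(String[] args) {
        // An example HDFS URI: scheme://authority/path
        URI uri = URI.create("hdfs://namenode.example.com:8020/user/data/input.txt");
        System.out.println(uri.getScheme());    // hdfs
        System.out.println(uri.getAuthority()); // namenode.example.com:8020
        System.out.println(uri.getPath());      // /user/data/input.txt
    }
}
```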



94. Matching texts for the word jsp:


  1. Check the job status via the MapReduce admin page—http://master1:50030/jobtracker.jsp: HBase Administration Cookbook


  2. To see it, visit http://bugs.sun.com/bugdatabase/index.jsp and enter the bug identification number 4252539 Core Java Volume I Fundamentals 9th Edition


  3. The response of a Servlet is rendered by the view component— the Java Server Page (JSP) Gradle in Action


  4. After the teragen job finishes, we can check the node storage space usage by opening the URL http://master:50070/dfsnodelist.jsp?whatNodes=LIVE. Hadoop Operations and Cluster Management Cookbook


  5. The MapReduce administration JSP enables you to view both the overall state of the cluster and details of the particular job execution. Professional Hadoop Solutions


  6. In CDH you can access the JobTracker UI at http://localhost:50030/jobtracker.jsp. Hadoop in Practice


  7. Adding the preceding fragment to web.xml will display the /error-404.jsp page to a client attempting to access a nonexistent resource. Java EE 7 Essentials


  8. Web technologies such as JavaServer Faces (JSF) and JavaServer Pages (JSP) rely heavily on the JavaBeans component model. Core Java Volume II Advanced Features 9th Edition


  9. You can download the latest JDK for other operating systems from Sun at http://java.sun.com/javase/downloads/index.jsp. Hadoop in Action


  10. Application frameworks such as Java Server Pages (JSP) and Java Server Faces (JSF) are built upon servlet technology. Java 7 Recipes A Problem-Solution Approach



95. Matching texts for the word dynamically:


  1. Metric modules are dynamically loadable shared object files that extend the available metrics gmond is able to collect. Monitoring with Ganglia


  2. Most programming environments solve this problem with templates; for example, documents that look like HTML but have portions of their content filled out dynamically. CouchDB The Definitive Guide


  3. Changing the analyzer during indexing The next thing worth mentioning when it comes to handling multilingual data is the possibility of dynamically changing the analyzer during indexing. Mastering ElasticSearch


  4. Figure 2: Ceph dynamically maps subtrees of the directory hierarchy to metadata servers based on the current workload. Ceph A Scalable, High-Performance Distributed File System


  5. This requires, the ability to dynamically partition the data over the set of nodes (i.e., storage hosts) in the cluster. Cassandra A Decentralized Structured Storage System


  6. This alleviates the performance issues associated with generating a view dynamically each and every time it is run, especially on databases that store millions of records of data. Beginning CouchDB


  7. The number of replica shards can be changed dynamically on a live cluster, allowing us to scale up or down as demand requires. Elasticsearch The Definitive Guide


  8. Robust data structures, an efficient, small binary format for RPC calls, and easy integration with dynamically typed languages such as Python, Ruby, Smalltalk, Perl, PHP, and Objective-C. Avro has certain advantages that Thrift doesn’t, in particular the fact that static code generation is not required to use RPC for your application, though you can use it for. Cassandra The Definitive Guide


  9. Each class will dynamically send and receive to/from the topics (an example is provided next) ActiveMQ in Action


  10. Finally, Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk. Bigtable - A Distributed Storage System for Structured Data



96. Matching texts for the word cached:


  1. This means that they are largely restricted to handling GET requests, but it also means they can be cached by HTTP proxies like Varnish. CouchDB The Definitive Guide


  2. What's more, cached filters don't require too much memory (it only carries the information about which documents match the filter) and can be easily reused by consecutive queries run against the same filter to greatly increase query performance. Mastering ElasticSearch


  3. Remember that all the columns of a row must be cached when using the row cache. Cassandra High Performance Cookbook


  4. By default, if a readdir is immediately followed by one or more stats, the briefly cached information is returned; otherwise it is discarded. Ceph A Scalable, High-Performance Distributed File System


  5. The metadata about the ranges a node is responsible is cached locally at each node and in a fault-tolerant manner inside Zookeeper - this way a node that crashes and comes back up knows what ranges it was responsible for. Cassandra A Decentralized Structured Storage System


  6. Assuming you have cached your documents, then some percentage of these requests will result in cache hits. Scaling CouchDB


  7. These cached filters can be reused very efficiently for subsequent requests. Elasticsearch The Definitive Guide


  8. A notification of a conflicting lock request would tell a client to finish using data associated with the lock: it would finish pending operations, flush modifications to a home location, discard cached data, and release. Chubby - The lock service for loosely-coupled distributed systems


  9. If you use a fraction, you’re indicating a percentage of keys to cache, and an integer value indicates an absolute number of keys whose locations will be cached. Cassandra The Definitive Guide


  10. ActiveMQ supports both of these types of message delivery and can also be configured to support message recovery, an in-between state where messages are cached in memory. ActiveMQ in Action
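Cassandra's row cache and Elasticsearch's filter cache both keep recently used data in memory and evict the rest. As a minimal, illustrative sketch of that idea (not how either system is actually implemented), Java's `LinkedHashMap` in access order gives a least-recently-used cache in a few lines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    // accessOrder=true makes iteration order follow recency of access,
    // so the eldest entry is the least recently used one.
    public LruCache(int capacity) {
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once over capacity
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" becomes eldest
        cache.put("c", "3"); // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```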



97. Matching texts for the word avro:


  1. Tiering Flume agents is covered using Avro serialization, as well as using the Flume command line as a standalone Avro client for testing and importing data manually. Apache Flume Distributed Log Collection for Hadoop


  2. A typical Hadoop-based Big Data platform includes the Hadoop Distributed File System (HDFS), the parallel computing framework (MapReduce), common utilities, a column-oriented data storage table (HBase), high-level data management systems (Pig and Hive), a Big Data analytics library (Mahout), a distributed coordination system (ZooKeeper), a workflow management module (Oozie), data transfer modules such as Sqoop, data aggregation modules such as Flume, and data serialization modules such as Avro. Hadoop Operations and Cluster Management Cookbook


  3. You also learn about Apache Avro — an Hadoop framework for data serialization, which can be tightly integrated with Hadoop-based storage. Professional Hadoop Solutions


  4. The OutputFormat uses Avro and more JARs are required to run the job. Cassandra High Performance Cookbook


  5. It shows how to use different serialization libraries, including Avro, Thrift, and Protocol Buffers. Hadoop RealWorld Solutions Cookbook


  6. The focus of chapter 3 is on ways to work with data stored in different formats, such as XML and JSON, which paves the way to a broader examination of data formats such as Thrift and Avro that work best with big data and Hadoop. Hadoop in Practice


  7. Avro A serialization system for efficient, cross-language RPC and persistent data storage. Hadoop:The Definitive Guide 3rd Edition


  8. This directory includes the Thrift and Avro RPC libraries for interacting with Cassandra. Cassandra The Definitive Guide


  9. They are Protocol Buffers (created by Google and hosted at http://code.google.com/p/protobuf), Thrift (originally created by Facebook and now an Apache project at http://thrift.apache.org), and Avro (created by Doug Cutting, the original creator of Hadoop) Hadoop Beginners Guide


  10. Some of the more common sources are an Avro RPC source to which applications can speak directly, syslog, a raw netcat-style source, and a source that executes a command and reads from its standard output. Hadoop Operations



98. Matching texts for the word oozie:


  1. A typical Hadoop-based Big Data platform includes the Hadoop Distributed File System (HDFS), the parallel computing framework (MapReduce), common utilities, a column-oriented data storage table (HBase), high-level data management systems (Pig and Hive), a Big Data analytics library (Mahout), a distributed coordination system (ZooKeeper), a workflow management module (Oozie), data transfer modules such as Sqoop, data aggregation modules such as Flume, and data serialization modules such as Avro. Hadoop Operations and Cluster Management Cookbook


  2. It is tested at scale, rest of ecosystem projects like Pig, Hive, Oozie, etc. Apache Hadoop YARN Yet Another Resource Negotiator


  3. Oozie, an open source Apache project, has functionality for building workflows and coordinating job running. MapReduce Design Patterns


  4. Why do you need an additional orchestration layer on top of MapReduce, and how does Oozie fit the bill? Professional Hadoop Solutions


  5. Apache Oozie is a workflow system for Hadoop MapReduce computations. Hadoop MapReduce Cookbook


  6. For this book you’ll also be working with Oozie, HBase, and other projects, but you’ll find instructions for these technologies in their respective sections. Hadoop in Practice


  7. Oozie A service for running and scheduling workflows of Hadoop jobs (including MapReduce, Pig, Hive, and Sqoop jobs) Hadoop:The Definitive Guide 3rd Edition


  8. HBase Hadoop database for random read/write access Hive SQL-like queries and tables on large data sets Pig Dataflow language and compiler Oozie Workflow for interdependent Hadoop jobs Sqoop Integrates databases and data warehouses with Hadoop. HBase:The Definitive Guide


  9. Oozie We have said many times that Hadoop clusters do not live in a vacuum and need to integrate with other systems and into broader workflows. Hadoop Beginners Guide


  10. Apache Oozie It’s not uncommon for large production clusters to run many coordinated MapReduce jobs in a workflow. Hadoop Operations
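To make the "workflow of Hadoop jobs" idea concrete, here is a minimal sketch of an Oozie workflow definition that runs a single MapReduce action. The workflow name, input/output paths, and schema version are illustrative assumptions, not taken from any of the quoted books:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="example-wf">
    <start to="word-count"/>
    <action name="word-count">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/data/input</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/data/output</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Word count job failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Oozie chains such actions via the ok/error transitions, which is how it coordinates the interdependent jobs mentioned above.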



99. Matching texts for the word writable:


  1. The key on each record will always be a LongWritable containing the current timestamp or if the timestamp event header is set, will be used instead. Apache Flume Distributed Log Collection for Hadoop


  2. In this case, the filesystem is both readable and writable. Hadoop Operations and Cluster Management Cookbook


  3. To avoid permission issues, carry out the installation in user-writable directories. Cassandra High Performance Cookbook


  4. In the mapper, we use the ThriftWritable class of Elephant Bird to wrap the WeblogRecord object. Hadoop RealWorld Solutions Cookbook


  5. Choosing appropriate Hadoop data types Hadoop uses the Writable interface based classes as the data types for the MapReduce computations. Hadoop MapReduce Cookbook


  6. The DBInputFormat class requires a bean representation of the table being imported, which implements both the Writable and DBWritable interfaces. Hadoop in Practice


  7. You may notice we use special classes such as LongWritable , IntWritable , and Text instead of the more familiar Long, Integer, and String classes of Java. Hadoop in Action


  8. That is, a distributed database designer must choose to make the system either always readable or always writable. Cassandra The Definitive Guide


  9. Take a look at the signatures of our mapper and reducer classes: the WordCountMapper class gives IntWritable and Text as input and gives Text and IntWritable as output. Hadoop Beginners Guide


  10. The version 1.0.0 lineage of Hadoop releases has a large number of group writable files in the tarball. Hadoop Operations
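The Hadoop quotes describe `Writable` as the contract for serializing keys and values to a compact binary form. As a simplified, self-contained stand-in (these are sketches using only `java.io`, not the real `org.apache.hadoop.io` classes), the pattern looks like this:

```java
import java.io.*;

public class WritableSketch {
    // Simplified stand-in for Hadoop's Writable contract.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Simplified stand-in for Hadoop's IntWritable.
    static class IntWritable implements Writable {
        int value;
        IntWritable() {}
        IntWritable(int value) { this.value = value; }
        public void write(DataOutput out) throws IOException { out.writeInt(value); }
        public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    // Round-trip a value through its compact binary form.
    static int roundTrip(int v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new IntWritable(v).write(new DataOutputStream(bytes));
            IntWritable copy = new IntWritable();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy.value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // 42
    }
}
```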



100. Matching texts for the word nutch:


  1. For example, it supports benchmarks for Nutch (a text indexing software package), PageRank (the PageRank algorithm), the Mahout machine learning algorithms, and Hive queries. Hadoop Operations and Cluster Management Cookbook


  2. Shortly after the release of the paper, a free and open source software pioneer by the name of Doug Cutting started working on a MapReduce implementation to solve scalability in another project he was working on called Nutch, an effort to build an open source search engine. MapReduce Design Patterns


  3. In 2002, while developing a project called Nutch (a search engine project focused on crawling, indexing, and searching Internet web pages), Doug Cutting and Mike Cafarella were struggling with a solution for processing a vast amount of information. Professional Hadoop Solutions


  4. From there, Hadoop was extracted out of Nutch and eventually became an Apache top-level project. HBase in Action


  5. The Apache Hadoop project was started as a support project for the Apache Nutch search engine, before spawning off as a separate top-level project. Hadoop MapReduce Cookbook


  6. My team was making discoveries similar to those that Doug Cutting and others at Nutch had made several years earlier regarding how to efficiently store and manage terabytes of crawled and analyzed data. Hadoop in Practice


  7. They’d devised systems to solve precisely the problems we were having with Nutch. Hadoop:The Definitive Guide 3rd Edition


  8. It’s not perfect, but it’s ready for other people to play with and examine.” Mike had been working with Doug Cutting on Nutch, an open source search engine. HBase:The Definitive Guide


  9. Hadoop started out as a subproject of Nutch , which in turn was a subproject of Apache Lucene. Hadoop in Action


  10. Thanks, Doug At the same time, Doug Cutting was working on the Nutch open source web search engine. Hadoop Beginners Guide



101. Matching texts for the word subclasses:


  1. To enable this optimization the user subclasses the Combiner class, overriding a virtual Combine() method. Pregel  -  A System for Large-Scale Graph Processing


  2. FlumeJava includes subclasses of DoFn, e.g., MapFn and FilterFn, that provide simpler interfaces in special cases. FlumeJava - Easy, Efficient Data-Parallel Pipelines


  3. The most common input formats are subclasses of FileInputFormat, with the Hadoop default being TextInputFormat. MapReduce Design Patterns


  4. These two objects are subclasses of DataInputStream and DataOutputStream from the Java I/O package, which means that they support standard I/O operations. Professional Hadoop Solutions


  5. Pig has an abstract class called LoadFunc that has a number of methods that must be provided by concrete subclasses, as seen in figure 11.4 Hadoop in Practice


  6. Subclasses need only populate data in HDFS (perhaps by copying from a local file), run a MapReduce job, and confirm the output is as expected. Hadoop:The Definitive Guide 3rd Edition


  7. There are specific subclasses, depending on where the host is used, in other words, on the master, region server, and so on. HBase:The Definitive Guide


  8. This is an abstract class for interfacing with the filesystem, and there are different concrete subclasses for handling HDFS and the local filesystem. Hadoop in Action


  9. In most cases, the implementation of a MapReduce job will provide job-specific subclasses of the Mapper and Reducer base classes found in this package. Hadoop Beginners Guide


  10. Every instance of class Object (and its subclasses) possesses a lock that is obtained on entry to a synchronized method and automatically released upon exit. Concurrent Programming in Java Design Principles and Pattern
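The pattern running through these quotes is an abstract base class whose concrete subclasses each fill in one piece of behavior, as with Hadoop's FileInputFormat family. A minimal illustration (the class and method names below are invented, not Hadoop API):

```java
public class SubclassDemo {
    // Abstract base: subclasses supply the record-parsing behavior.
    static abstract class InputFormat {
        abstract String parseRecord(String rawLine);
    }

    static class TextInput extends InputFormat {
        String parseRecord(String rawLine) { return rawLine.trim(); }
    }

    static class UpperCaseInput extends InputFormat {
        String parseRecord(String rawLine) { return rawLine.trim().toUpperCase(); }
    }

    public static void main(String[] args) {
        // Callers work against the base type; dispatch picks the subclass.
        InputFormat fmt = new UpperCaseInput();
        System.out.println(fmt.parseRecord("  hello ")); // HELLO
    }
}
```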



102. Matching texts for the word operand:


  1. There are actually two forms of these operators; you’ve just seen the postfix form of the operator that is placed after the operand. Core Java Volume I Fundamentals 9th Edition


  2. AND: It means that the given two terms (left and right operand) need to match in order for the clause to be matched. Mastering ElasticSearch


  3. An example is division when the second operand is zero or sqrt when its argument is negative. Java 8 in Action Lambdas, Streams and Functional-style Programming


  4. At an assignment to a field: if the type of the left-hand operand is a raw type, then a compile-time unchecked warning occurs if erasure changes the field's type. The Java Language Specification Java SE 7 Edition


  5. The bytecode has been annotated with the hexadecimal values corresponding to the opcodes and operand data. Oracle JRockit The Definitive Guide


  6. It is known as the functional approach because methods return the result of applying a function to their operand without modifying it. Effective Java 2nd Edition


  7. The assumption is that the first four bits of the command represent an operator, while the next 16 bits are an operand. Java 7 New Features Cookbook


  8. Some operators operate on only one operand; these are called unary operators. Java in a Nutshell 5th Edition


  9. Specifically, the conditional-And operator, &&, evaluates its right side only if the result of its left operand is true. Java Performance Tuning 2nd Edition


  10. Many of the constraints on bytecodes can be checked statically, such as the operand of an “ldc” bytecode must be a valid constant pool index whose type. Java Performance
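Two of the behaviors quoted above, the postfix versus prefix forms of `++` and the short-circuit evaluation of `&&`, can be shown in a few lines of Java:

```java
public class OperandDemo {
    // Postfix ++ yields the operand's old value; prefix ++ yields the new one.
    static int[] postfixVsPrefix() {
        int a = 5;
        int post = a++; // post = 5, then a = 6
        int b = 5;
        int pre = ++b;  // b = 6 first, so pre = 6
        return new int[]{post, a, pre, b};
    }

    // && evaluates its right operand only when the left operand is true.
    static boolean shortCircuit(int[] counter) {
        return false && (++counter[0] > 0); // right side never runs
    }

    public static void main(String[] args) {
        int[] r = postfixVsPrefix();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // 5 6 6 6
        int[] c = {0};
        shortCircuit(c);
        System.out.println(c[0]); // still 0
    }
}
```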



103. Matching texts for the word php:


  1. He has worked with Perl/ PHP/Python, primarily for coding and MySQL/Oracle as the backend, for several years prior to moving into Hadoop fulltime. Apache Flume Distributed Log Collection for Hadoop


  2. Before I move on, however, I should mention that gweb is a PHP program, which most people run under the Apache web server (although any web server with PHP or FastCGI support should do the trick) Monitoring with Ganglia


  3. If you are not familiar with JavaScript but have used C or any other C-like language such as Java, PHP, or C#, this should look familiar. CouchDB The Definitive Guide


  4. He is also a well-known speaker who delivers talks on Drupal, Open Source, PHP, Moodle, and so on, along with organizing and delivering TechTalks in Drupal meetups and Drupal Camps in Mumbai, India. Mastering ElasticSearch


  5. Generating Thrift bindings for other languages (C++, PHP, and others) Cassandra High Performance Cookbook


  6. Later in this book you will use other tools to interact with the CouchDB database, including CouchDB’s very own Futon web-based administration interface, the JavaScript client-side web development language, and the PHP, Ruby, and Python programming languages. Beginning CouchDB


  7. Zach is now a developer at Elasticsearch and maintains the PHP. Elasticsearch The Definitive Guide


  8. Recall that PHP was originally designed to work within static HTML content. Hadoop in Action


  9. To generate Thrift bindings for other languages, pass it to the --gen switch (for example, thrift --gen php) Cassandra The Definitive Guide


  10. Although ActiveMQ is written in Java, APIs for many languages other than Java are provided, including C/C++, .NET, Perl, PHP, Python, Ruby, and many more. ActiveMQ in Action



104. Matching texts for the word applet:


  1. We describe how to package programs in JAR files, and how to deliver applications over the Internet with the Java Web Start and applet mechanisms. Core Java Volume I Fundamentals 9th Edition


  2. A Macintosh applet or application will have a different GUI than that of a Windows or Unix system. Java Network Programming and Distributed Computing


  3. Also, the decision to compile Java to JVM bytecode (a virtual machine code that soon every browser supported) meant that it became the language of choice for internet applet programs (do you remember applets?) Java 8 in Action Lambdas, Streams and Functional-style Programming


  4. The use of applet containers providing 1.1 APIs is outside the scope of this specification. Java EE 7 Specification


  5. In fact, whenever the class loader of an applet reads a JAR file, it reads and decompresses data from the network. Core Java Volume II Advanced Features 9th Edition


  6. The title of the window is specified using the java_applet_title parameter. Java 7 New Features Cookbook


  7. As of Java 1.2, any Java code, whether it is an applet, a servlet, a JavaBeans component, or a complete Java application, can be run with restricted permissions that prevent it from doing harm to the host system. Java in a Nutshell 5th Edition


  8. This also allows you to place the thread’s task in a subclass of some other class, such as Applet or HTTPServlet. Java Network Programming 4th Edition


  9. As is typical of GUI-based programs, ParticleApplet uses several auxiliary classes that do most of the work. Concurrent Programming in Java Design Principles and Pattern


  10. The following code is that of a Java applet class. Java 7 Recipes A Problem-Solution Approach



105. Matching texts for the word sharding:


  1. We’ve mitigated this problem somewhat by setting up the Ganglia web UI on each server to redirect requests to the appropriate server for each category of metrics, so end users see a unified system that hides the sharding. Monitoring with Ganglia


  2. You could use multi-master, master/slave, partitioning, sharding, write-through caches, and all sorts of other complex techniques. CouchDB The Definitive Guide


  3. Those Lucene indices are called shards and the process of this spreading is called sharding. Mastering ElasticSearch


  4. The MySQL sharding scheme assigned each customer and all related data to a fixed shard. Spanner - Google’s Globally-Distributed Database


  5. Working with secondary indexes The primary ordering and sharding is done by the row key. Cassandra High Performance Cookbook


  6. As your data keeps growing, an important feature is the ability to split your data across multiple servers, also known as sharding. Elasticsearch in Action MEAP V07 cHAPTER 1


  7. Because it doesn’t have any sharding capabilities, you need to be careful not to overwhelm the database from which you are reading by running too many mappers. Hadoop:The Definitive Guide 3rd Edition


  8. This was true of many tools and techniques for scaling data processing, including caching (for example, memcached), replication, sharding, and, of course, MapReduce/ Hadoop. Hadoop in Action


  9. Sharding and shared-nothing architecture If you can’t split it, you can’t scale it. Cassandra The Definitive Guide


  10. A better and actually very common approach is to start by sharding (i.e., distributing) the data quasi randomly across the machines. PowerDrill - Processing a Trillion Cells per Mouse Click
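The simplest version of the "quasi-random" partitioning the last quote mentions is hash-modulo routing: hash each key and take the remainder by the shard count. A minimal sketch (real systems such as Cassandra and Elasticsearch use more elaborate schemes like consistent hashing, which avoid rehashing everything when shards are added):

```java
public class ShardRouter {
    // Route a document id to one of shardCount shards by hashing.
    static int shardFor(String id, int shardCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(id.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        for (String id : new String[]{"user:1", "user:2", "user:3"}) {
            System.out.println(id + " -> shard " + shardFor(id, 4));
        }
    }
}
```

The scheme is deterministic: the same id always routes to the same shard, which is what lets reads find the data that writes placed.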



106. Matching texts for the word optimized:


  1. In large grids with thousands of nodes per cluster, or carefully optimized HPC grids, in which every CPU cycle spent on something other than the problem is a wasted cycle, “normal” compute nodes are often configured as deaf in order to minimize the overhead associated with aggregating cluster state. Monitoring with Ganglia


  2. An index is a data structure optimized for quick search and retrieval. CouchDB The Definitive Guide


  3. The algorithm addresses a general case when a query may reference repeated fields; a simpler optimized version is used for flat-relational queries, i.e., those referencing only required and optional fields. Dremel - Interactive Analysis of WebScale Datasets


  4. Performance considerations The previous query execution is already optimized by ElasticSearch internals, by using caching mechanism. Mastering ElasticSearch


  5. Because Cassandra handles replication, disk systems can be optimized for more performance versus redundancy. Cassandra High Performance Cookbook


  6. This strategy provides the best of both worlds: streaming updates to disk in an efficient (sequential) fashion, and a vastly reduced re-write workload, allowing the long-term on-disk storage layout to be optimized for future read access. Ceph A Scalable, High-Performance Distributed File System


  7. In some highly optimized services even small monitoring overheads are easily noticeable, and might compel the deployment teams to turn the tracing system off. Dapper a Large-Scale Distributed Systems Tracing Infrastructure


  8. The node handling the request can no longer make use of the optimized bulk format to parse just the metadata line in order to decide which shard should receive the request. Elasticsearch The Definitive Guide


  9. This model is optimized for reading data, as it allows data to be read from any slave. Cassandra The Definitive Guide


  10. Its speed is the result of the combination of a fast transactional journal comprised of data log files, the highly optimized indexing of message IDs, and inmemory message caching. ActiveMQ in Action



107、单词 transactional 的匹配文本:


  1. Flume only provides transactional guarantees for each channel in each individual agent. Apache Flume Distributed Log Collection for Hadoop


  2. Operations within an account are transactional and consistent: a user who sends or labels a message is guaranteed to observe the change despite possible failover to another replica. Megastore - Providing Scalable, Highly Available Storage for Interactive Services


  3. Accountants don’t use erasers” is a quote from Pat Helland, a senior architect of transactional systems who worked at Microsoft and Amazon. CouchDB The Definitive Guide


  4. Third, F1 requires strong transactional semantics, which made using other NoSQL systems impractical. Spanner - Google’s Globally-Distributed Database


  5. In Chapter 3, Application Programmer Interface we should know how to use transactional style locking using Cages. Cassandra High Performance Cookbook


  6. However there have been some applications that have asked for transactional mainly for the purpose of maintaining secondary indices. Cassandra A Decentralized Structured Storage System


  7. Within a single CouchDB node, an update of a single document is transactional. Scaling CouchDB


  8. It’s transactional, supporting JTA/JTS, two-phase commit, deadlock detection, and MVCC. Cassandra The Definitive Guide


  9. This is a file-based message store that combines a transactional journal, for reliable message storage and recovery, with good performance and scalability. ActiveMQ in Action


  10. For example, in a transactional web-indexing system the programmer can make assumptions like: the hash of the contents of a document is always consistent with the table that indexes duplicates. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications



108、单词 amazon 的匹配文本:


  1. Creating mountains of data has become very easy, thanks to Facebook, Twitter, Amazon, digital cameras and camera phones, YouTube, Google, and just about anything else you can think of connected to the Internet. Apache Flume Distributed Log Collection for Hadoop


  2. This issue is also relevant when monitoring dynamic environments such as cloud resources (for example, Amazon EC2) Monitoring with Ganglia


  3. Werner Vogels, Amazon CTO and Vice President If availability is a priority, we can let clients write data to one node of the database without waiting for other nodes to come into agreement. CouchDB The Definitive Guide


  4. Amazon EC2 discovery The Amazon store, in addition to selling goods has a few popular services as selling storage or computing power in the pay-as-you-go model. Mastering ElasticSearch


  5. Dynamo[6] is a storage system that is used by Amazon to store and retrieve user shopping carts. Cassandra A Decentralized Structured Storage System


  6. For example, Amazon has a tool called Elastic Load Balancing and Rackspace provides a service called Rackspace Cloud Load Balancers (in beta as of this writing) Scaling CouchDB


  7. The exponential growth of data first presented challenges to cutting-edge businesses such as Google, Yahoo, Amazon, and Microsoft. Hadoop in Action


  8. This original version of Cassandra was written primarily by an ex-employee from Amazon and one from Microsoft. Cassandra The Definitive Guide


  9. Before joining Improve Digital, he spent time at Amazon.co.uk, where he led several software development teams building systems that process Amazon catalog data for every item worldwide. Hadoop Beginners Guide


  10. When a user makes a purchase on Amazon, there are quite a few separate stages through which that order must travel including order placement, invoice creation, payment processing, order fulfillment, shipping, and more. ActiveMQ in Action



109、单词 ssl 的匹配文本:


  1. We suggest you read up on VPNs and setting up CouchDB behind an HTTP proxy (like Apache httpd’s mod_proxy, nginx, or varnish) that will handle SSL for you. CouchDB The Definitive Guide


  2. For a while, Kerberos implementations had the option of using Kerberized SSL (KSSL) for HTTP authentication — and it is still an option in many distributions. Professional Hadoop Solutions


  3. If SSL mutual authentication is used as the authentication protocol, the client can manage its own authentication context, and need not depend on references to the login session state. Java EE 7 Specification


  4. Unfortunately, CouchDB does not support SSL at the time of writing. Beginning CouchDB


  5. Using SSL over HTTP is transparent to application programmers; simply use URLs that start with https. Core Java Volume II Advanced Features 9th Edition


  6. Chapter 9, Database, Security, and System Enhancements: Various database improvements such as the introduction of the new RowSetFactory class are illustrated along with how to take advantage of new SSL support. Java 7 New Features Cookbook


  7. In Java 1.4 and later, you can use https:// with the URL class as previously shown to securely download documents from web servers that support SSL. Java in a Nutshell 5th Edition


  8. It discusses ActiveMQ URIs and all the transport connectors for both client-to-broker and broker-to-broker communications including TCP, NIO, STOMP, failover, SSL, HTTP/S, and much more. ActiveMQ in Action


  9. A similar technique can be used to disable the SSLv2Hello pseudo-protocol described above, which is not recognized by some SSL implementations. Fundamental Networking in Java


  10. There are a great number of events that can be handled using a WebViewClient such as a malformed URL or a failed SSL handshake. Java 7 Recipes A Problem-Solution Approach
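As the Core Java and Java in a Nutshell quotes note, SSL/TLS is transparent to application code in Java: using a URL that starts with `https` is enough, and the handshake happens when the connection is opened. A minimal sketch (the host is a placeholder; no network connection is actually opened here):

```java
import java.net.URL;

// Sketch: with an https:// URL, the JDK performs the TLS handshake
// transparently when the connection is opened; application code never
// touches SSL directly. The host below is only a placeholder.
public class HttpsUrlDemo {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com/doc");
        // The scheme is carried by the URL itself; calling
        // url.openStream() would trigger the handshake.
        System.out.println(url.getProtocol()); // prints "https"
    }
}
```

In Java 11+ the same transparency applies to `java.net.http.HttpClient`, which negotiates TLS automatically for `https` URIs.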



110、单词 ejb 的匹配文本:


  1. Currently, he is developing web applications using the latest Java technologies on the market (EJB 3.0, CDI, Spring, JSF, Struts). Pro Java 7 NIO.2


  2. Similar access to databases is provided to JSP, JSF applications, and servlets by the Web Container, and to enterprise beans by the EJB Container. Java EE 7 Specification


  3. Transactional annotation brings transactional semantics to POJOs (plain old Java objects), outside of an EJB (Enterprise JavaBean) Java EE 7 Essentials


  4. We’d like to address a common confusion before going any further: The JavaBeans that we discuss in this chapter have little in common with Enterprise JavaBeans (EJB) Core Java Volume II Advanced Features 9th Edition


  5. He has written numerous publications and has also written the EJB 3.1 Cookbook. Java 7 New Features Cookbook


  6. Particular care should be taken when determining applicable requirements based on the presence of EJB Lite in the Web Profile. Java EE 7 Web Profile Specification


  7. However, in their book J2EE Performance Testing with BEA WebLogic Server (Expert Press), Peter Zadrozny, Philip Aston, and Ted Osborne state that there is no real difference in performance between these two options, as long as the beans are explicitly removed from the container when the session expires.[2] The beans are removed automatically by binding the session to a session listener that calls ejbRemove( ) when the session terminates. Java Performance Tuning 2nd Edition


  8. Technologies such as those discussed in chapter 2 (COM, CORBA, DCE, and EJB) using RPC are considered to be tightly coupled. ActiveMQ in Action


  9. A special thanks goes to Rahul Biswas for providing content relating to EJB and Java persistence and also for his willingness to review multiple drafts and provide valuable feedback. Java Performance


  10. The Enterprise JavaBean (EJB) 3.0 model used interfaces for interacting with methods that performed database work. Java 7 Recipes A Problem-Solution Approach



111、单词 manning 的匹配文本:


  1. Manning Early Access (MEAP) readers who posted comments in the Author Online forum. Mahout in Action


  2. Copyright 2014 Manning Publications For more information on this and other Manning titles go to. Netty in Action MEAP V08


  3. Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Gradle in Action


  4. Without his recommendation, Manning would not have come knocking on our inboxes, and we would never have written a book together. RabbitMQ in Action


  5. Toward the end of 2010, the idea of writing HBase in Action for Manning came up. HBase in Action


  6. I’m also indebted to Cynthia Kane, my development editor at Manning, who coached me through writing this book and provided invaluable feedback on my work. Hadoop in Practice


  7. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine. Hadoop in Action


  8. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps. ActiveMQ in Action


  9. Please post comments or corrections to the Author Online forum at http://www.manning-sandbox.com/forum.