Elasticsearch at Kickstarter

Back in December 2012, we developed a new version of our project search tool on Kickstarter using Elasticsearch. We’re really happy with the results and have since found Elasticsearch’s filtering and faceting features useful in tools for project creators, our message inbox and in other areas of the site. I’d like to write a little on how we gradually rolled out Elasticsearch as it might be useful for others looking at adding secondary data stores to their stack.

Ramping up

We liked Elasticsearch’s features but were initially cautious about how it would behave in production, so we deployed a change that let us divert a percentage of project search requests to a new version built with Elasticsearch:

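# Read the rollout ratio (0.0 to 1.0) from a file; fall back to 0.0 if it cannot be read.
ratio = File.read(ELASTICSEARCH_RATIO_PATH).to_f rescue 0.0
# rand returns a float in [0, 1), so a strict comparison keeps a ratio of 0.0 fully off.
experimental = (rand < ratio)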

Over time we ramped up the percentage of traffic sent to our Elasticsearch implementation, keeping a close eye on internal metrics to evaluate its performance. Fortunately we didn’t hit any major snags, so before long we were sending 100% of our project search traffic to Elasticsearch. For more reading on the topic, there are some great posts by Etsy and Flickr that go into more detail on config flags and rolling out features gradually.

An index primer

An index in Elasticsearch is a logical namespace for data and can store multiple types of documents. Types roughly correspond to business models, and each index has a mapping that defines how it stores its types. At Kickstarter, each index defines the mapping for just one type, so an index for projects only defines the mapping for a project type. A very simplistic mapping for a project type with a name and goal might look like this:

$ curl -XGET 'http://localhost:9200/projects/project/_mapping'
{
  "project": {
    "properties": {
      "name": {
        "type": "string"
      },
      "goal": {
        "type": "double",
        "null_value": 0.0
      }
    }
  }
}

Keeping indices up to date

MySQL is our canonical data store. When an index in Elasticsearch is first created, it contains no documents, so a full index must be performed from MySQL to populate it. Once the index has been populated, it’s ready to respond to search requests. However, the data in MySQL changes over time. New projects are created, existing projects are updated. These changes need to be sent to Elasticsearch or the projects index will have stale/outdated data.

Each document has an ID in Elasticsearch, and a document can be updated by performing an index operation using that ID. Each project document in Elasticsearch has the same ID as its corresponding record in MySQL. When the project changes in MySQL, we’re able to reindex just that project document in Elasticsearch so that our search index is only a few seconds delayed behind MySQL.
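As a rough illustration of why reusing the MySQL ID makes reindexing safe to repeat, here is a minimal Python sketch that builds the index request for one record. The function name and the record shape are hypothetical; only the `PUT /{index}/{type}/{id}` form of the Elasticsearch index API is assumed, and nothing is sent over the network.

```python
import json

def index_request(index, doc_type, record):
    """Build (method, path, body) for indexing one record under its MySQL id.

    `record` is a hypothetical dict holding the row's columns, including "id".
    """
    # The document body carries every column except the id, which goes in the URL.
    doc = {k: v for k, v in record.items() if k != "id"}
    path = "/{}/{}/{}".format(index, doc_type, record["id"])
    # Indexing the same id again replaces the previous document, so a retry
    # or a repeated change notification cannot create duplicates.
    return "PUT", path, json.dumps(doc, sort_keys=True)
```

Calling it twice for the same project yields the same path, which is what lets a single changed project be refreshed without touching the rest of the index.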

The need for new indices

Performing updates to documents in Elasticsearch to keep them in sync with MySQL has taken us some time to get right (a topic for another blog post!), but one way we’ve mitigated problems with stale data is by making it really easy to create a new index and fully populate it with the latest data from MySQL. This is also useful when the mapping for a type needs to change. Rather than updating the mapping for an existing index, we create an index with the new mapping and populate it, and any old indices are left as is. This avoids having to deal with mapping merge conflicts or inconsistencies with documents having been indexed using different mappings.

This process of creating and populating new indices started off with a cron task to fully index projects every 20 minutes. As we improved our ability to keep Elasticsearch in sync with MySQL, we reduced the frequency of the cron task so that now the full index is only performed nightly.

The full indexing nitty-gritty

Each time we create a new index, it is given a name based on the type and time, e.g. projects_2013_05_19_13_33_27. It takes some time to fully populate a new index, so while it is building, all our reads continue to go to the existing projects index. Elasticsearch has a nifty aliasing feature that allows us to associate indices with an alias. Search requests are sent to the alias, which directs the requests to any indices that it has been associated with. Our application code directs all project read requests to an alias named projects. When full indexing is complete, the projects alias is atomically switched from the old index to the new index, so we never need to hardcode index names like projects_2013_05_19_13_33_27 into our application.
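The naming scheme and the alias switch can be sketched roughly as follows. This is an illustration in Python rather than our application code: the function names are hypothetical, and the second function only builds the body that would be posted to Elasticsearch's `_aliases` endpoint, where a `remove` and an `add` action in the same request are applied together.

```python
from datetime import datetime

def new_index_name(doc_type, now):
    """Name an index after its type and creation time."""
    return "{}_{}".format(doc_type, now.strftime("%Y_%m_%d_%H_%M_%S"))

def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` to `new_index`."""
    actions = []
    if old_index is not None:
        # Detach the alias from the index that has been serving reads.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # Attach it to the freshly populated index in the same request.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

Because both actions travel in one request, readers of the alias see either the old index or the new one, never neither.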

Some of our more complex indices take several hours to build, so we also had to figure out what to do with records that are updated while a full index is running. Both the new and existing indices need to be updated, otherwise one would have stale data.

When a new index for projects is being populated, we associate it with the projects_new alias. We tried sending a bulk request to index changes in both projects and projects_new, but if a full index isn’t taking place then this request would 404 since no index would be associated with the projects_new alias. Instead, we query Elasticsearch before each write to retrieve the indices aliased to projects and projects_new, and perform a single bulk indexing request directly against those indices.
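A minimal sketch of that write path, assuming the alias lookup has already been done and handed over as a plain dict (a hypothetical shape, as are the function names). The second function builds a newline-delimited body in the format of Elasticsearch's bulk API, with one `index` action per target index.

```python
import json

def write_targets(alias_lookup, aliases=("projects", "projects_new")):
    """Return the concrete indices behind the given aliases, without duplicates.

    `alias_lookup` maps alias name -> list of index names; an alias with no
    index (no full index in progress) is simply absent or empty.
    """
    targets = []
    for alias in aliases:
        for index in alias_lookup.get(alias, []):
            if index not in targets:
                targets.append(index)
    return targets

def bulk_body(indices, doc_type, record):
    """Build one bulk request that indexes `record` into every target index."""
    doc = {k: v for k, v in record.items() if k != "id"}
    lines = []
    for index in indices:
        # Each document takes two lines: the action metadata, then the source.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": record["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"
```

Addressing concrete indices rather than aliases means a missing projects_new alias just shortens the target list instead of failing the request.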

The nice thing about our setup is that performing a full index has no user impact. The existing index is kept up to date, and the new index is only switched once it’s completely ready.

Read the article in full here: http://www.kickstarter.com/backing-and-hacking/elasticsearch-at-kickstarter

ZFS on Linux and MySQL

I am currently working with a large customer and I am involved with servers located in two data centers, one with Solaris servers and the other one with Linux servers. The Solaris side is cleverly set up using zones and ZFS and this provides a very low virtualization overhead. I learned quite a lot about these technologies while looking at this, thanks to Corey Mosher.

On the Linux side, we recently deployed a pair of servers for backup purposes, boxes with 64 300GB SAS drives, 3 raid controllers and 192GB of RAM. These servers will each run a few slave instances of production database servers and will perform the backups. The write load is not excessive so a single server can easily handle the write load of all the MySQL instances. The original idea was to configure them with raid-10 + LVM, making sure to stripe the LV when we need to and align the partition correctly.

We got decent tpcc performance, nearly 37k NoTPM using 5.6.11 and xfs. Then, since ZFS on Linux is available and there is in-house ZFS knowledge, we decided to reconfigure one of the servers and give ZFS a try. So I trashed the raid-10 arrays, configured JBODs and gave all those drives to ZFS (30 mirrors + spares + OS partition mirror) and I limited the ARC size to 4GB. I don’t want to start a war but ZFS performance level was less than half of xfs for the tpcc test and maybe that’s just normal. We didn’t try too hard to get better performance because we already had more than enough for our purpose and some ZFS features are just too useful for backups (most apply also for btrfs). Let’s review them.

Snapshots

ZFS does snapshots, like LVM but… since it is a copy-on-write filesystem, the snapshots are free, with no performance penalty. You can easily run a server with hundreds of snapshots. With LVM, your IO performance drops to 33% after the first snapshot so keeping a large number of snapshots running is simply not an option. With ZFS you can easily have:

  • one snapshot per day for the last 30 days
  • one snapshot per hour for the last 2 days
  • one snapshot per 5min for the last 2 hours

and that will be perfectly fine. Since starting a snapshot takes less than a second, you could even be more zealous. Pretty interesting to speed up point in time recovery when your dataset is 700GB. If you google a bit with “zfs snapshot script” you’ll find many scripts ready for the task. Snapshots work best with InnoDB; with MyISAM you’ll have to start the snapshot while holding a “flush tables with read lock” and the flush operation will take some time to complete.
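To make the retention schedule from the list above concrete, here is a small Python sketch of the pruning decision only. It is not one of the ready-made scripts, it does not call zfs at all, and the names are hypothetical; it just picks which snapshot timestamps a three-tier policy would retain.

```python
from datetime import datetime, timedelta

# (bucket size, how far back the tier reaches), taken from the list above.
TIERS = [
    (timedelta(minutes=5), timedelta(hours=2)),
    (timedelta(hours=1), timedelta(days=2)),
    (timedelta(days=1), timedelta(days=30)),
]

def snapshots_to_keep(taken_at, now):
    """Return the set of snapshot datetimes retained by at least one tier."""
    keep = set()
    for bucket, horizon in TIERS:
        newest_in_bucket = {}
        for ts in taken_at:
            age = now - ts
            # Ignore snapshots from the future or beyond this tier's reach.
            if timedelta(0) <= age <= horizon:
                slot = int(age.total_seconds() // bucket.total_seconds())
                # Keep only the newest snapshot that falls in each bucket.
                if slot not in newest_in_bucket or ts > newest_in_bucket[slot]:
                    newest_in_bucket[slot] = ts
        keep.update(newest_in_bucket.values())
    return keep
```

Anything not in the returned set is a candidate for destruction; a real script would map these timestamps back to snapshot names before removing anything.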

Compression

ZFS can compress data on the fly and it is surprisingly cheap. In fact the best tpcc results I got were when using compression. I still have to explain this; maybe it is related to better raid controller write cache use. Even the fairly slow gzip-1 mode works well. The tpcc database, which contains a lot of random data that doesn’t compress well, showed a compression ratio of 1.70 with gzip-1. Real data will compress much more. That gives us much more disk space than we expected so even more snapshots!

Integrity

With ZFS each record on disk has a checksum. If a cosmic ray flips a bit on a drive, instead of crashing InnoDB, it will be caught by ZFS and the data will be read from the other drive in the mirror.

Better availability and disk usage

On purpose, I allocated mirror pairs using drives from different controllers.  That way, if a controller dies, the storage will still be working.  Also, instead of having 1 or 2 spare drives per controller, I have 2 for the whole setup.  A small but yet interesting saving.
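The pairing idea can be sketched like this. It is a hypothetical helper, not the procedure I actually followed, and the way the drives are spread over the controllers in the usage note is an assumption: it greedily takes one drive from each of the two controllers that still have the most unassigned drives.

```python
def cross_controller_mirrors(drives_by_controller, pairs_wanted):
    """Pair drives so that both halves of a mirror sit on different controllers.

    `drives_by_controller` maps a controller name to its list of drive names.
    """
    remaining = {c: list(d) for c, d in drives_by_controller.items()}
    mirrors = []
    while len(mirrors) < pairs_wanted:
        # Pick the two controllers with the most unassigned drives.
        busiest = sorted(remaining, key=lambda c: len(remaining[c]), reverse=True)[:2]
        if len(busiest) < 2 or not remaining[busiest[1]]:
            raise ValueError("not enough drives on distinct controllers")
        a, b = busiest
        mirrors.append(((a, remaining[a].pop(0)), (b, remaining[b].pop(0))))
    return mirrors
```

With, say, 22 + 21 + 21 drives on three controllers and 30 pairs requested, every mirror spans two controllers and four drives are left over for spares and other uses.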

All put together, ZFS on Linux is a very interesting solution for MySQL backup servers. All backup solutions have an impact on performance; with ZFS the impact is up front and the backups are almost free.

The post ZFS on Linux and MySQL appeared first on MySQL Performance Blog.

Read the article in full here: http://www.mysqlperformanceblog.com/2013/05/24/zfs-on-linux-and-mysql/

Percona Server for MySQL 5.5.31-30.3 now available

Percona is glad to announce the release of Percona Server for MySQL 5.5.31-30.3 on May 24, 2013 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.31, including all the bug fixes in it, Percona Server 5.5.31-30.3 is now the current stable release in the 5.5 series. All of Percona’s software is open-source and free; all the details of the release can be found in the 5.5.31-30.3 milestone at Launchpad.

New Features:

Bugs Fixed:

  • Fix for bug #1131187 introduced a regression that could cause a memory leak if query cache was used together with InnoDB. Bug fixed #1170103.
  • Fixed the RPM packaging regression that was introduced with the fix for bug #710799. This regression caused mysql schema to be missing after the clean RPM installation. Bug fixed #1174426.
  • Fixed the Percona-Server-shared-55 and Percona-XtraDB-Cluster-shared RPM package dependencies. Bug fixed #1050654.
  • Fixed the upstream bug #68999 which caused compiling Percona Server to fail on CentOS 5 and Debian squeeze due to older OpenSSL version. Bug fixed #1183610.
  • If a slave was running with its binary log enabled and then restarted with the binary log disabled, Crash-Resistant Replication could overwrite the relay log info log with an incorrect position. Bug fixed #1092593.
  • Fixed the CVE-2012-5615 vulnerability. This vulnerability would allow a remote attacker to detect what user accounts exist on the server. This bug fix comes originally from MariaDB (see MDEV-3909). Bug fixed #1171941.
  • Fixed the CVE-2012-5627 vulnerability, where an unprivileged MySQL account owner could efficiently perform brute-force password guessing attacks on other accounts. This bug fix comes originally from MariaDB (see MDEV-3915). Bug fixed #1172090.
  • mysql_set_permission was failing on Debian due to missing libdbd-mysql-perl package. Fixed by adding the package dependency. Bug fixed #1003776.
  • Rebuilding Debian source package would fail because dpatch and automake were missing from build-dep. Bug fixed #1023575 (Stephan Adig).
  • Backported the fix for the upstream bug #65077 from the MySQL 5.6 version, which removed MyISAM internal temporary table mutex contention. Bug fixed #1179978.

Release notes for Percona Server for MySQL 5.5.31-30.3 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server for MySQL 5.5.31-30.3 now available appeared first on MySQL Performance Blog.

Read the article in full here: http://www.mysqlperformanceblog.com/2013/05/24/percona-server-for-mysql-5-5-31-30-3-now-available/

Stuff The Internet Says On Scalability For May 24, 2013

Hey, it’s HighScalability time:

 

  • ~20K : Netflix AWS instances; 100 hours per minute: YouTube video upload;
  • Quotable Quotes:
    • @sw17ch: Computer Science is thinking about thinking. Software Engineering is thinking about how to avoid thinking.
    • @neha: I am starting a distributed systems reading group at MIT. Suggestions on papers to read? Current list here.
    • John Sheehan: Services are the new process
    • @cheeseplus: Sharding isn’t a scalability strategy, it’s a failure mode in progress.
    • @basharatw: @adrianco Features in days, not months; hw in mins not weeks; incident response in secs not hours … there’s a trade off for utopia #gluecon
    • @mgroeninger: @johnsheehan now telling a story about struggling against tools… quit to build a better hammer #gluecon < this is the heart of devops
    • @aneel: “you really have to do a reorg to do devops and you really have to do a reorg to do cloud-native” – @adrianco #gluecon
    • @voodoogeek: scalability. why is it so hard to understand? and please please do NOT tell me it was not foreseeable and the usual BS.
    • @joestump: Celery’s queue routing key stuff is pretty swanky. If you don’t need low latency messaging, highly recommend celery + SQS. 0 maintenance.
  • There are more kinds of programming in heaven and earth than are dreamt of in your data structure books…Cell-Based Computing Goes Analog: designing circuits in Escherichia coli that could perform functions based on a range of inputs, much like the temperature gauge on a thermostat. Specifically, the circuits were sensitive to levels of sugar arabinose or acyl homoserine lactone.
  • Startups are the new intentional communities attempting to do right by hacking human nature and founding a utopia. To see this in action take a look at Why I Spent 200 Hours Writing Culture Code Instead of Python Code. A Walden 3.0?
  • Horst Simon on Why we need Exascale and why we won’t get there by 2020: You could say that the end of the HPC world as we know it began in 2004, when we hit the inflection point of power use and clock speed. That’s when we realized that we could not keep increasing clock speed due to power demands (and heat), but needed to move to much greater parallelism. In “new” HPC, power is the primary design constraint for future HPC system design; data movement dominates costs, so we need to optimize to minimize data movement; to increase concurrency we look to exponential growth of parallelism within chips. This “new” reality fundamentally breaks our current programming paradigm and computing ecosystem.

Don’t miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge…

Read the article in full here: http://feedproxy.google.com/~r/HighScalability/~3/R_VT5Q9I5KY/stuff-the-internet-says-on-scalability-for-may-24-2013.html

Infrastructure as Code – A comprehensive overview

I’ve been tracking infrastructure as code for a few years now. Over the years it has gotten closer to real code.

Close but no cigar yet… We’ve come a long way, but when you compare it to real languages it still feels in its infancy. In this updated overview I gave at the ABUG, I went through:

  • the basic concepts of infrastructure as code
  • the differences/concepts in the languages (chef, puppet, …)
  • the editors, syntax checkers, highlighting
  • integration with git version control
  • integration with CI systems
  • the different forms of testing (syntax, compile, unit, smoke testing)
  • using vagrant, veewee and the tools in that eco-system
  • debugging, profiling your code

This talk is probably the most comprehensive tool list that I’ve seen/made about the subject. But feel free to post and add your findings in the comments!

Note that at the end of the presentation there are many extra links still to be sorted, as well as some slightly outdated tools.

I’ve given previous versions of this talk at Devoxx 2012 and Jax2012. Enjoy the Jax2012 video here:

Read the article in full here: http://feedproxy.google.com/~r/jedi/IZwx/~3/Dn15t0ayqHY/

Podcast: Community Contributions to Puppet

In this episode of the Puppet Labs Podcast, we talked about how we handle code contributions to Puppet and other projects along with a few details about other ways for community members to contribute.

Listen to Jeff McCune, Adrien Thebo and Hailee Kenney talk with me about tools and resources to help you get started and test your contributions. They also talk about what makes a good contribution and how we work with community members to improve their pull requests.

See the rest of the podcasts.

Read the article in full here: https://puppetlabs.com/blog/podcast-community-contributions-to-puppet/

How To Break Departmental Silos By Forming Feature Teams

Imagine a seven-year-old playing the piano. She hits every note like it’s the only one, taking long breaks between each note. The piece drags and listening to the singular notes is a pain. Instead of music, all you hear is a bunch of individual sounds, each one competing with the others to be the loudest.

Now imagine the same piece performed by the same kid five years later. The notes flow like a river, the emphasis not on individual sounds but on the whole sequence at once. Listening to the piece is pure joy, because every note works together with the others to create a beautiful experience.

Distributing your software development through separate departments is like a seven-year-old playing the piano. Every department works on its own, competing with the others to be the most important. The output is a pain for your customers and the quality is really poor. But how can we create an organization where the individual parts play nicely together? One way of making departments play together is to form feature teams. Let’s see how this could work out.

Feature teams are cross-functional teams

They consist of team members from every department involved in feature development: business analysts, developers, quality assurance, operations, and so on. Ideally, they’re co-located – working in a big room together or at least in the next room. This shortens communication paths and speeds up necessary information exchange.

Feature teams are responsible for a certain set of features or a specific service. At amazon.com every part of the website, like recommendations or reviews, is a separate service developed by an individual feature team. Each team is fully responsible for the feature, from idea to operations. Any challenges that arise within their specific feature must be solved by the team.

Solving challenges is simpler in smaller teams

If you form feature teams you should keep them small. A famous quote from Amazon: “If a project team can eat more than two pizzas, it’s too large”. Keep your team between five and nine people. This will keep the complexity of inner-team communication as low as possible. You scale your projects by combining multiple small teams – each one fully responsible for certain features.

Embed every required skill within your teams to create a successful feature. This isn’t possible with all skill sets, but you should limit external party support as much as possible. External dependencies always slow down a team and increase complexity.

It’s all about reducing complexity

Trying to drive a feature through multiple departments can be quite a challenge. Every department has its own goals, which usually conflict with those of other departments. Even if they want to help, they have a hard time understanding exactly how.

Building feature teams tears down those departmental barriers and joins people from every department into one team. This helps to create empathy and, more importantly, trust – the number one factor to speed things up in complex environments. Without trust, blame games and CYA tactics are rampant. Only if people know each other, have the same goals, and work together are they more likely to trust one another.

That trust thing sounds great, but won’t work for us

Let’s say you already have feature teams. This is often the case in so-called “matrix organizations” where every employee is part of a certain department but works on multiple projects. And with multiple projects, the problems start. The employees are supposed to serve multiple bosses: the project leads as well as their department head. Because they’re assigned to multiple projects, they usually find their home in their department and give it the highest priority.

But giving the highest priority to the department is not what you want. Therefore it’s important to assign people to one and only one project. Make that project (and the feature team driving it) the home for your employees. The department should mainly serve as an interest group.

Feature teams take responsibility

They’re fully responsible – end-to-end – for a certain feature or service. They’re small, co-located and they have all the needed skills to be successful. Feature teams act as the home base for employees, while departments fade into the background and provide mentorship for honing one’s skills. But the main business contribution happens in the feature teams.

If you set up your organization like this, you’ll begin to see a nice flow of great feature releases – like a well-played musical piece.

Read the article in full here: http://feedproxy.google.com/~r/agileweboperations/~3/o998vpjM3z8/how-to-break-departmental-silos-by-forming-feature-teams

What Google I/O 2013 means for Google Apps

At Google I/O, Google’s annual developer conference, the company released myriad application updates, the most notable being an updated Google+.

Read the article in full here: http://www.techrepublic.com/blog/google-in-the-enterprise/what-google-io-2013-means-for-google-apps/2451

What Is a DevOps Engineer?

Demand for people with DevOps skills is growing rapidly because businesses get great results from DevOps.

Organizations using DevOps practices are overwhelmingly high-functioning: They deploy code up to 30 times more frequently than their competitors, and 50 percent fewer of their deployments fail, according to our 2013 State of DevOps survey.

With all this goodness, you’d think there were lots of DevOps engineers out there. However, just 18 percent of our survey respondents said someone in their organization actually had this title.

Why is that?

In part, it’s because defining what DevOps engineers do is still in flux. That hasn’t stopped people from hiring for DevOps skills, though. Between January 2012 and January 2013, listings for DevOps jobs on Indeed.com increased 75 percent. On LinkedIn.com, mentions of DevOps as a skill increased 50 percent during the same period.

Our survey revealed the same trend. Half of our 4,000-plus respondents (in more than 90 countries) said their companies consider DevOps skills when hiring.

What are DevOps skills?

Our respondents identified the top three skill areas for DevOps staff:

  • Coding or scripting
  • Process re-engineering
  • Communicating and collaborating with others

These skills all point to a growing recognition that software isn’t written in the old way anymore. Where software used to be written from scratch in a highly complex and lengthy process, creating new products is now often a matter of choosing open source components and stitching them together with code. The complexity of today’s software lies less in the authoring, and more in ensuring that the new software will work across a diverse set of operating systems and platforms right away.

Likewise, testing and deployment are now done much more frequently. That is, they can be more frequent — if developers communicate early and regularly with the operations team, and if ops people bring their knowledge of the production environment to design of testing and staging environments.

Discussion of what distinguishes DevOps engineers is all over blogs and forums, and occurs whenever technical people gather.

There’s lots of talk, for example, about pushing coders – not just code – over the wall into operations. Amazon CTO Werner Vogels said in an interview that when developers take on more responsibility for operations, both technology and service to customers improve.

“The traditional model is that you take your software to the wall that separates development and operations, and throw it over and forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer.”

The resulting customer feedback loop, Vogels said, “is essential for improving the quality of the service.”

Longtime developer and entrepreneur Rich Pelavin of Reactor8 also sees benefits from DevOps culture in terms of increased responsibility for everyone:

“I’ve seen organizations where engineers get beepers, so they’re the ones who get beeped if it goes wrong [in deployment]. That pushes them into the rest of the software lifecycle. I think that’s a great idea.”

That’s a real change from non-DevOps environments, where developers make their last commits and head home…or to the ping-pong table.

What is a DevOps engineer, anyway? And should anyone hire them?

There’s no formal career track for becoming a DevOps engineer. They are either developers who get interested in deployment and network operations, or sysadmins who have a passion for scripting and coding, and move into the development side where they can improve the planning of test and deployment. Either way, these are people who have pushed beyond their defined areas of competence and who have a more holistic view of their technical environments.

DevOps engineers are a pretty elite group, so it’s not surprising that we found a smaller number of companies creating that title. Kelsey Hightower, who heads operations here at Puppet Labs, describes these people as the “Special Forces” in an organization.

“The DevOps engineer encapsulates depth of knowledge and years of hands-on experience,” Kelsey said. “You’re battle tested. This person blends the skills of the business analyst with the technical chops to build the solution – plus they know the business well, and can look at how any issue affects the entire company.”

If DevOps is understood primarily as a mindset, it can get awfully fuzzy. But enough people are attempting definitions for us to offer this list of core DevOps attributes:

  • Ability to use a wide variety of open source technologies and tools
  • Ability to code and script
  • Experience with systems and IT operations
  • Comfort with frequent, incremental code testing and deployment
  • Strong grasp of automation tools
  • Data management skills
  • A strong focus on business outcomes
  • Comfort with collaboration, open communication and reaching across functional borders

Even with broad agreement about core DevOps attributes, controversy surrounds the term “DevOps engineer.” Some say the term itself contradicts DevOps values.

Jez Humble, the co-author of Continuous Delivery, points out that just calling someone a DevOps engineer can create a third silo in addition to dev and ops: “…clearly a poor (and ironic) way to try and solve these problems.”

DevOps, he says, proposes “strategies to create better collaboration between functional silos, or doing away with the functional silos altogether and creating cross-functional teams (or some combination of these approaches).” In the end, Humble relents, saying it’s okay to call people doing DevOps by that term, if you really want to.

Becoming a DevOps Engineer: What Does it Take?

If you believe DevOps is the future, you’ll want to start expanding your skills — and experience — to compete for these new jobs.

While it’s great to beef up your coding skills, and get familiar with automation tools, you’ll also want to seek out projects and new roles that allow you to exercise the “soft” skills that are at the core of DevOps. Find opportunities to collaborate within and outside of your team. Help your company move to a faster test and deployment rhythm. Be open to listening to others’ ideas. Keep in mind that DevOps is less about doing things a particular way, and more about moving the business forward and giving it a stronger technological advantage.

Learn more about DevOps

Bridging the Two Worlds: IT and Networking

Hiring for the DevOps Toolchain: The Need for Generalists

Read the article in full here: https://puppetlabs.com/blog/what-is-a-devops-engineer/

Paper: Calvin: Fast Distributed Transactions for Partitioned Database Systems

Distributed transactions are costly because they use agreement protocols. Calvin says, surprisingly, that using a deterministic database allows you to avoid the use of agreement protocols. The approach is to use a deterministic transaction layer that does all the hard work before locks are acquired and transaction execution begins.

Overview:

Many distributed storage systems achieve high data access throughput via partitioning and replication, each system with its own advantages and tradeoffs. In order to achieve high scalability, however, today’s systems generally reduce transactional support, disallowing single transactions from spanning multiple partitions. Calvin is a practical transaction scheduling and data replication layer that uses a deterministic ordering guarantee to significantly reduce the normally prohibitive contention costs associated with distributed transactions. Unlike previous deterministic database system prototypes, Calvin supports disk-based storage, scales near-linearly on a cluster of commodity machines, and has no single point of failure. By replicating transaction inputs rather than effects, Calvin is also able to support multiple consistency levels—including Paxos based strong consistency across geographically distant replicas—at no cost to transactional throughput.
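The idea of replicating transaction inputs rather than effects can be illustrated with a toy sketch. This is not Calvin's sequencer or scheduler, and every name in it is hypothetical; it only shows that replicas which apply the same ordered log of deterministic transactions end up in the same state without negotiating each outcome.

```python
def apply_log(initial, log):
    """Apply transactions in log order; each one is a function that mutates the state."""
    state = dict(initial)
    for txn in log:
        txn(state)
    return state

def transfer(src, dst, amount):
    """Build a deterministic transfer transaction."""
    def txn(state):
        # The outcome depends only on the current state and the inputs,
        # so every replica reaches the same commit-or-abort decision.
        if state[src] >= amount:
            state[src] -= amount
            state[dst] += amount
    return txn
```

Two replicas that start from the same balances and replay the same log, for instance a transfer of 70 followed by one of 50 out of an account holding 100, both apply the first and both skip the second.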

If you are interested, Daniel Abadi gives a very accessible overview of Calvin in If all these new DBMS technologies are so scalable, why are Oracle and DB2 still on top of TPC-C? A roadmap to end their dominance.

Read the article in full here: http://feedproxy.google.com/~r/HighScalability/~3/49OOzrnVznE/paper-calvin-fast-distributed-transactions-for-partitioned-d.html

Experiences with the McAfee MySQL Audit Plugin

I recently had to do some customer work involving the McAfee MySQL Audit Plugin and would like to share my experience in this post.

Auditing user activity in MySQL has traditionally been challenging. Most data can be obtained from the slow or general log, but this involves a lot of data you don’t need too, and isn’t flexible at all. The specific problem of logging failed connection attempts has been discussed in a previous post in our blog.

Starting with 5.1, the new plugin API gives us more flexibility by allowing users to extend the server’s functionality with their own code, and this is what the McAfee plugin does.

Installation and configuration are straightforward following the available instructions. The only extra step I had to take was to extract the offsets for the Percona Server version I was using for the test (5.5.28-29.1). This is needed as the plugin needs the offset to some MySQL data structures that, the plugin authors say, aren’t exposed by a consistent API. If you also need to do this, the details are clearly explained here.

The plugin writes its output in json format, and supports writing it directly to a file, or to a unix socket, which means you can write a script to listen on this socket and process the audit records as you wish.
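A listener of that kind might look roughly like the Python sketch below. Several things here are assumptions rather than documented plugin behavior: that records arrive one JSON object per line over a stream socket, and that each record has a "cmd" field to filter on. The function names are hypothetical too.

```python
import json
import os
import socket

def parse_records(lines, wanted_cmds=None):
    """Yield decoded records, optionally filtered by their "cmd" field.

    Assumes one JSON object per line; the "cmd" key is an assumption.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if wanted_cmds is None or record.get("cmd") in wanted_cmds:
            yield record

def listen(path, handle):
    """Accept one writer on a Unix socket and pass each record to handle()."""
    # Remove a stale socket file left behind by a previous run.
    if os.path.exists(path):
        os.unlink(path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    conn, _ = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        for record in parse_records(stream):
            handle(record)
    server.close()
```

Something like `listen("/tmp/audit.sock", print)` would need to be running before the server starts writing, since the script is the side that creates the socket.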

Performance-wise, I did basic tests on the VM I was working in and didn’t get significant differences between either output option, or between using the plugin or enabling the general log. Bear in mind these were basic tests (just a few mysqlslap runs with increasing levels of concurrency), but initially, I would think the advantage of the plugin is its flexibility, and not its performance, which seems to be on par with having the general log enabled.

The flexibility comes from the three variables that can be set to control what is logged by the plugin:
– audit_record_cmds : This is the list of commands you want written to the log (all the lists in these variables are comma separated). As pointed out here, anything that would generate a write to the general log will be sent to the plugin, and you can control whether it gets written or not with this list. I tested this with “connect,Quit” to log successful and failed connections. Yes, it had to be a capital Q in Quit for that to work, and no, my code-fu was not enough to understand why that is the case. Maybe someone more knowledgeable in MySQL internals can enlighten me here.
– audit_record_objs : List of database objects (tables, according to the docs) for which you want events written to the log.
– audit_whitelist_users : This one is undocumented on the wiki at the time of writing, and is a list of users for which you do not want events written to the log.

Just for reference, these are the lines I had to add to my config file for the plugin to work (plus one commented line for switching between file and socket for output):


plugin-load=AUDIT=libaudit_plugin.so
audit_offsets=6464, 6512, 4072, 4512, 104, 2584
audit_json_file=1
audit_json_socket_name=/tmp/audit.sock
#audit_json_socket=1
audit_json_log_file=/var/lib/mysql/audit.log
audit_record_cmds=connect,Quit

Notice the audit_offsets that I mentioned had to be extracted due to this Percona Server version not being included in the binary.

And here are a few sample output lines generated by the plugin with this configuration:

{"msg-type":"activity","date":"1369155747373","thread-id":"6439","query-id":"0","user":"debian-sys-maint","priv_user":"debian-sys-maint","host":"localhost","cmd":"Connect","query":"Connect"}
{"msg-type":"activity","date":"1369155747373","thread-id":"6439","query-id":"219309","user":"debian-sys-maint","priv_user":"debian-sys-maint","host":"localhost","cmd":"Quit","query":"Quit"}
{"msg-type":"activity","date":"1369155747383","thread-id":"6440","query-id":"0","user":"debian-sys-maint","priv_user":"debian-sys-maint","host":"localhost","cmd":"Connect","query":"Connect"}
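
Records in this shape are easy to post-process. The Python sketch below counts Connect events per user from lines like the ones above. It assumes that the `date` field is milliseconds since the Unix epoch, which the sample values suggest but the post does not state; `parse_record` and `connections_per_user` are illustrative names.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def parse_record(line):
    """Decode one audit line and attach a timezone-aware timestamp."""
    record = json.loads(line)
    # Assumption: "date" holds milliseconds since the Unix epoch.
    millis = int(record["date"])
    record["when"] = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return record


def connections_per_user(lines):
    """Count Connect events per user across an iterable of audit lines."""
    counts = Counter()
    for line in lines:
        record = parse_record(line)
        if record["cmd"] == "Connect":
            counts[record["user"]] += 1
    return counts


# The three sample lines from the plugin output, trimmed to the fields used here.
sample = [
    '{"msg-type":"activity","date":"1369155747373","thread-id":"6439","user":"debian-sys-maint","cmd":"Connect"}',
    '{"msg-type":"activity","date":"1369155747373","thread-id":"6439","user":"debian-sys-maint","cmd":"Quit"}',
    '{"msg-type":"activity","date":"1369155747383","thread-id":"6440","user":"debian-sys-maint","cmd":"Connect"}',
]
counts = connections_per_user(sample)
```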

In conclusion, the plugin API seems to be opening new possibilities for extending MySQL’s behavior in a way that, once set up, is transparent to users, and the McAfee MySQL Audit Plugin is only one example of what can be achieved with it. It is a very good one for me, since I think proper audit trail support has been an important missing feature of the server. Its absence has made using MySQL in PCI or SOX compliant environments, to name just two, artificially complicated, as one had to rely on too much info (the general log) or on external help (snort or a similar IDS).

The post Experiences with the McAfee MySQL Audit Plugin appeared first on MySQL Performance Blog.

Read the article in full here: http://www.mysqlperformanceblog.com/2013/05/23/experiences-with-the-mcafee-mysql-audit-plugin/

Managing Change to Enable Agile Operations

Today’s organizations need to quickly ship products and features to customers to keep up with the market, and this puts a huge stress on IT operations. As new applications and services are developed, the Operations team needs to have the infrastructure ready to deploy the necessary changes as they come in. As more and more development teams embrace Agile practices, changes can come at an alarming rate.  The infrastructure must always be ready to deploy the changes coming down the pipeline. This requires the operations teams to be agile, sometimes even more so than the development teams.

The changes coming from development teams need to be supported by operations managing the day-to-day change that enables efficiency.  Business applications and services consist of three inputs: the OS (provisioning management), the business configuration (configuration management), and the external configuration (patch management), which consists of updates from external sources such as OS distributions and vendors.  Each of the three inputs has regular changes that need to be managed.

It’s easy to think of each input of change as a single entity, separate from the others.  For instance, we tend to think of patch management as an ongoing process separate from configuration management.  However, the reality is that patching software affects the applied configuration of a node.  Changing the configuration can change what needs to be patched.  Upgrading the OS can change the applied configuration.  Having a change management process that allows operations teams to easily understand how the three inputs (provision, configuration, and patch) relate to and affect one another enables us to better manage all of the ongoing change within our infrastructure.  Staying on top of the ongoing infrastructure change has the benefit of keeping our infrastructure ready for anything that comes down the pipeline, enabling agile operations.

I’m going to walk through the three inputs of provision management, configuration management, and patch management, and illustrate how they relate to each other at a high level.  In later posts, we’ll dive into the details of implementation.

In this post, change management refers to the operations process that collects incoming change from the three inputs, relates the changes to each other, and applies the changes together.  This process can be fully automated or use a mixture of automation and manual review.

Provision Management

Provision management refers to the selection and installation of an operating system, as well as the ongoing management of a provisioning system.  The first phase of creating any node is provisioning the OS. We tend to think of the OS as static, not as something having change we need to manage.  We usually ascribe managing OS updates to the patch management process, including OS point releases to get us from, for example, Red Hat 6.1 to Red Hat 6.2.

The reality is the core operating system will inevitably have a major version release that should be treated as an item for change management.  Upgrading the major release of an OS can have an effect on the configuration management.  As new ways of managing the OS are introduced, the applied configuration may need to change to ensure the applications and services deployed won’t be affected.  Further, a new major OS version means new software updates to manage.

Configuration Management

Whether you use a configuration management tool like Puppet or do manual configuration, there will be changes to applied configuration that should be managed in a change management process.  New applications will need to be deployed.  Existing applications will need updates.  Some of these new applications and updates won’t need changes to the configuration of the underlying OS, but when they do, your change management process needs to be able to handle it efficiently.

When new packages are required to be installed on the OS to enable new applications and services, any updates to those packages will have to be managed by the patch management system.

Patch Management

Every operating system will have updates.  Traditionally, we tend to think of software updates as being directly related to the nodes the updates are available on.  However, when software updates are approved and applied, they aren’t being approved for the nodes.  They’re being approved for the specific configuration of the node they’re being applied on.  Therefore, software updates should be tested, approved, and applied to configurations, not nodes.  Any change management process should test, approve, and apply software updates alongside incoming changes to configuration management.
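
One way to picture "approve for configurations, not nodes" is to key approvals on a fingerprint of the applied configuration instead of on host names. The Python sketch below is only an illustration of that idea under made-up assumptions: the node names, the shape of the configuration dictionaries and the helper functions are all hypothetical.

```python
import hashlib
import json


def config_fingerprint(config):
    """Return a stable hash of a node's applied configuration."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_nodes_by_config(nodes):
    """Map each configuration fingerprint to the nodes that share it."""
    groups = {}
    for name, config in nodes.items():
        groups.setdefault(config_fingerprint(config), []).append(name)
    return groups


# Hypothetical inventory: two identical web nodes and one database node.
nodes = {
    "web01": {"os": "rhel6", "packages": ["httpd", "openssl"]},
    "web02": {"os": "rhel6", "packages": ["httpd", "openssl"]},
    "db01": {"os": "rhel6", "packages": ["mysql-server", "openssl"]},
}
groups = group_nodes_by_config(nodes)
```

An update tested against one fingerprint can then be approved once for every node in that group, while nodes with a different configuration are tested separately.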

Conclusion

A change management process should take all three inputs that deliver business applications and services, and manage the flow of change from each input in a single process.  If you are managing change from each piece in isolation, you’re missing the relationship between the incoming changes.  Understanding and testing the relationships between seemingly disparate changes is the best way to ensure the efficiency and reliability of your change management processes.

Read the article in full here: https://puppetlabs.com/blog/managing-change-to-enable-agile-operations/

Puppet 3.2 Introduces an Experimental Parser and New Iteration Features

Puppet 3.2.1 landed today. Though it’s a “patch” release, it’s the first public release of the Puppet 3.2 series, and it includes a taste of the Puppet DSL’s future in the form of an experimental parser that introduces some new features you’d expect to find in traditional programming languages.

I spoke to Puppet product owner Eric Sorenson about the Puppet 3.2 series, and he called out the new parser as the headline item:

“It’s called ‘Future Parser,’” he said, “because that’s the command line argument you have to pass to puppet in order to turn it on. It’s really a ground-up reimplementation of the Puppet language, using an expression- instead of statement-based grammar, which allows both a lot more power and flexibility with what you can do inside the language.”

Eric said the new parser was a response to the tough decision made with Puppet 3.1 to deprecate the Puppet Ruby DSL, a pure Ruby implementation of the Puppet DSL.

“What we did instead was double down on the Puppet DSL by implementing the things that people who were using the Ruby DSL were actually trying to do. We asked, ‘what is that capability? What is it that made you feel the Puppet DSL wasn’t meeting your needs and caused you to turn to the Ruby DSL?’”

According to Eric, “people wanted the ability to iterate. They wanted loops. Puppet looks enough like a programming language that people who are used to programming languages say, ‘I should be able to do all the things I do in Ruby or another language.’”

In the experimental parser, Puppet has picked Ruby-like “each” loops, along with many of the same methods one could call on an enumerable object in Ruby: inject, collect, select and reject.
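
For readers who do not know Ruby's enumerable methods, the Python sketch below shows what each of those operations does in general terms. It is deliberately not Puppet syntax, which is covered in the documentation mentioned later in the post; the `ports` list is just sample data.

```python
from functools import reduce

ports = [22, 80, 443, 8080]

# each: run a block once per element.
seen = []
for port in ports:
    seen.append(port)

# collect: transform every element into a new list.
collected = [port + 1000 for port in ports]

# select: keep only the elements for which the test is true.
selected = [port for port in ports if port < 1024]

# reject: drop the elements for which the test is true.
rejected = [port for port in ports if not port < 1024]

# inject: fold the list into a single value, carrying an accumulator.
injected = reduce(lambda total, port: total + port, ports, 0)
```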

Eric noted that the new parser isn’t enabled by default, “because there are a couple of things we want to work out, and we really want to hear feedback on.”

For instance, “there are two implementations of loop structures. There’s one where each looks like a function similar to existing Puppet parser functions like include or template. The other one is a dot-suffix operator, like that found in Ruby.”

The goal of including two ways to express the same thing is all about community feedback:

“We’re interested in learning which people think is more readable and more in line with their expectations. We’re going to do some user testing from the UX side and solicit feedback from the community.”

You can give the new parser a shot by updating to Puppet 3.2 and enabling it in one of two ways:

  • Setting parser = future in your puppet.conf file
  • Adding the command line switch --parser=future

Eric said the experimental parser will remain so until the Puppet developer and UX team can gather community feedback, and learn how it interacts with all the Puppet manifests already out in the wild:

“We’re curious not just about how the parser works specifically with new constructs, but how it works with the existing body of Puppet code.”

You can read more about the new parser, including how to use the new iteration functions, in some documentation prepared by Puppet developer Henrik Lindberg on the Puppet docs site.

External CA Support

Puppet 3.2 also includes better support for external certificate authorities, which grew out of an unusual visit from Mozilla’s Dustin Mitchell, who flew out to the Puppet Labs offices to work directly with the Puppet development team.

“We’ve never had anybody from outside the organization come in and sit with us and actually do community contributor stuff in real-time,” said Eric.

Dustin’s contribution involved providing Mozilla’s own Puppet use as a blueprint to development efforts around external CA support:

“Because SSL is hard and PKI is hard, we didn’t want to go in shotgun and make everything under the sun work. We wanted to really specifically focus on what Mozilla’s use case was and use it as a spec to write to.”

The result of that approach is a working reference for other Puppet users who want to use an external certificate authority:

“If they follow the Mozilla model, then we can guarantee that will be supported in Puppet. [Puppet Labs developer] Jeff McCune did some great documentation work around that, describing what’s supported and not supported.”

Wi-Fi Puppet With OpenWRT

Also new in Puppet 3.2 is a collection of updates that address compatibility with OpenWRT, a Linux distribution popular on embedded systems like Wi-Fi routers. OpenWRT makes it possible to turn simple consumer routers into full-fledged Linux systems, and now Puppet can manage them.

Some of the support is predicated on the recent Facter 1.7 release:

“There were some assumptions in Facter that were not correct for non-x86 architecture,” Eric said. Facter addresses that problem by working better with ARM chipsets.

In Puppet, “the support is specifically for OpenWRT. There’s a new package provider that supports the OpenWRT OPKG packaging package manager,” and there’s also support for managing services on OpenWRT.

“Facter 1.7 enables the detection of OpenWRT, and Puppet 3.2 guarantees we can do something useful,” said Eric.

And Everything Else

Besides a new, experimental parser, better support for external CAs, and support for a whole new class of device, Puppet 3.2 includes a few other new features:

  • Puppet’s “splay” setting, which was meant to mitigate the problems caused by a thundering herd of agents hitting their puppet master at the same time, has been improved.
  • Puppet officially supports Ruby 2.
  • The Puppet module tool now works on Windows.

… and even more. As always, the complete list of bug fixes and new features can be found in the official release notes for Puppet 3.2.
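
The general idea behind a splay setting can be sketched in a few lines of Python: each agent waits a random amount of time, up to some limit, before contacting the master, so that a fleet started at the same moment spreads its requests out. How Puppet actually computes the delay is not described in the post, so the function names, the 1800-second limit and the seeded generator below are assumptions made for the example.

```python
import random


def splay_delay(limit_seconds, rng):
    """Pick a random start delay between 0 and limit_seconds."""
    return rng.uniform(0, limit_seconds)


def first_run_times(agent_count, limit_seconds, seed=0):
    """Simulate when each agent would first contact the master."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    return [splay_delay(limit_seconds, rng) for _ in range(agent_count)]


# 1000 agents booted together, spread over a hypothetical 30-minute window.
times = first_run_times(agent_count=1000, limit_seconds=1800)
```

Without the random delay every entry in `times` would be zero, which is the thundering herd the setting is meant to avoid.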

Read the article in full here: https://puppetlabs.com/blog/puppet-3-2-introduces-an-experimental-parser-and-new-iteration-features/

Percona XtraBackup 2.1.3 for MySQL available for download

Percona is glad to announce the release of Percona XtraBackup 2.1.3 for MySQL on May 22, 2013. Downloads are available from our download site here and from the Percona Software Repositories.

This release fixes a high priority bug. If you’re using Percona XtraBackup with Percona XtraDB Cluster, we advise upgrading your 2.1 installation to 2.1.3. This release is the latest stable release in the 2.1 series.

Bug Fixed:

Release notes with all the bugfixes for Percona XtraBackup 2.1.3 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

* * *

Percona XtraBackup is the world’s only open-source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases. With Percona XtraBackup, you can achieve the following benefits:

  • Backups that complete quickly and reliably
  • Uninterrupted transaction processing during backups
  • Savings on disk space and network bandwidth
  • Automatic backup verification
  • Higher uptime due to faster restore time

XtraBackup makes MySQL hot backups for all versions of Percona Server, MySQL, MariaDB, and Drizzle. It performs streaming, compressed, and incremental MySQL backups.

Percona’s enterprise-grade commercial MySQL Support contracts include support for XtraBackup. We recommend support for critical production deployments.

The post Percona XtraBackup 2.1.3 for MySQL available for download appeared first on MySQL Performance Blog.

Read the article in full here: http://www.mysqlperformanceblog.com/2013/05/22/percona-xtrabackup-2-1-3-for-mysql-available-for-download/

Why you shouldn’t hire a devops

Lately there have been a lot of organisations trying to hire a devops engineer.
I myself have been asked to fill devops roles.

There’s a number of issues with that.

The biggest problem is that I always have to ask what exactly the organisation is looking for.

So you want a devops engineer with experience in Linux, MongoDB, MySQL and Java. Does that mean you want a Java developer who is familiar with MySQL and Linux and breathes a devops culture?
Or a Linux expert who understands Java developers and knows how to tune Mongo and MySQL?

It’s absolutely unclear what you want when you are hiring “a devops engineer”.

The second problem is that you are trying to hire people who are knowledgeable about devops.

Yet a lot of those people know that you can’t do devops on your own. Devops is not a job title. Devops is not a new devops team you create.

To some of them you are even making a fool out of yourself, as you are showing them that you don’t understand devops.

On top of that, the ones that do apply for this fancy new devops role are the ones that might not get the fact that the problem isn’t about tooling but about people working together and helping each other, so you end up hiring the wrong people.

Even in today’s devops culture a system engineer is still a system engineer, and a developer is still a developer.
You might have developers supporting the build tool chain, or system engineers focusing on infrastructure automation.

But as John said almost 3 years ago, they are good at their job.

Devops is not a word you slap onto a tool, a team or a person and expect magic to happen.

Let’s face it: devops is hard. You can’t do this on your own. You need to find the right people.

Read the article in full here: http://www.krisbuytaert.be/blog/why-you-shouldnt-hire-devops

Malware in the Google Play Store: Enemy inside the gates

Google Play has experienced some recent malware infestations. Learn about the details and how to protect yourself and your users.

Read the article in full here: http://www.techrepublic.com/blog/google-in-the-enterprise/malware-in-the-google-play-store-enemy-inside-the-gates/2445

Percona MySQL University @Portland: June 17

Percona CEO Peter Zaitsev leads a track at the inaugural Percona MySQL University event in Raleigh, N.C. on Jan. 29, 2013.

Portland is a well-recognized hub for Open Source technologies in the Northwest, home to conferences such as OSCON and Open Source Bridge and host of OpenSQL Camp in 2009. As such, it is a very natural place for our next Percona MySQL University event, scheduled for June 17.

We are running this event in partnership with the Portland MySQL Meetup, organized by our own Daniel Nichter, who recently moved to the area.

Percona MySQL University is a daylong, free, fast-paced and very technical MySQL educational event for a wide range of people interested in MySQL: developers, system administrators, DBAs, etc. It will be held at Portland State University’s Smith Memorial Student Union.

We’ll finalize the schedule next week and still have some speaking opportunities available – if you would like to share your MySQL story at this event please email Matthew Dowell by Tuesday, May 28.

If you’re not in Portland and would like Percona MySQL University to come to your city, please fill out the form to let us know. We’ll try to come to the cities showing greatest interest.

As usual space is limited, so Register Now!

The post Percona MySQL University @Portland: June 17 appeared first on MySQL Performance Blog.

Read the article in full here: http://www.mysqlperformanceblog.com/2013/05/22/percona-mysql-university-portland-june-17-2013/

Strategy: Stop Using Linked-Lists

What data structure is more sacred than the linked list? If we get rid of it, what silly interview questions would we use instead? But not using linked lists is exactly what Aater Suleman recommends in Should you ever use Linked-Lists?

In The Secret To 10 Million Concurrent Connections, one of the important strategies is not scribbling data all over memory via pointers, because following pointers increases cache misses, which reduces performance. And there’s nothing more iconic of pointers than the linked list.

Here are Aater’s reasons to be anti-linked-list:

Read the article in full here: http://feedproxy.google.com/~r/HighScalability/~3/-r2Tk8qqWsM/strategy-stop-using-linked-lists.html

MySQL and the SSB – Part 2 – MyISAM vs InnoDB low concurrency

This blog post is part two in what is now a continuing series on the Star Schema Benchmark.

In my previous blog post I compared MySQL 5.5.30 to MySQL 5.6.10, both with default settings using only the InnoDB storage engine.  In my testing I discovered that innodb_old_blocks_time had an effect on performance of the benchmark.  There was some discussion in the comments and I promised to follow up with more SSB tests at a later date.

I also promised more low concurrency SSB tests when Peter blogged about the importance of performance at low concurrency.

The SSB
The SSB tests a database’s ability to optimize queries for a star schema. A star schema presents some unique challenges to the database optimizer. The SSB consists of four sets of queries. Each set is known as a “flight”. I have labeled each query as Q{FLIGHT_NUMBER}.{QUERY_NUMBER}. In general, each flight examines different time periods or different regions. The flights represent the type of investigations and drill-downs that are common in OLAP analysis.

Each query in each flight (Q1.1 for example) is tested with a cold buffer pool. Then the query is tested again without restarting the database. The first test is described as the cold test, and the second as the hot test. The database software is restarted after the hot test. All OS caches are dropped at this time as well.
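
The procedure described above could be driven by a small harness along the lines of the Python sketch below. It is not the script used for these tests. The callables `run_query`, `restart_database` and `drop_os_caches` are hypothetical hooks the caller would have to supply, and the harness assumes the database is already cold when it starts.

```python
import time


def run_flight(queries, run_query, restart_database, drop_os_caches):
    """Time each query cold, then hot, then reset caches for the next one.

    `queries` maps a label such as "Q1.1" to its SQL text. The three
    callables are placeholders for whatever actually talks to the server.
    """
    results = {}
    for name, sql in queries.items():
        timings = {}
        # First run is the cold test, the immediate rerun is the hot test.
        for label in ("cold", "hot"):
            started = time.perf_counter()
            run_query(sql)
            timings[label] = time.perf_counter() - started
        results[name] = timings
        # Restart after the hot test and drop OS caches so the next
        # query starts cold again.
        restart_database()
        drop_os_caches()
    return results
```

Passing the hooks in as arguments keeps the timing logic independent of how the server is restarted or how caches are dropped on a given system.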

This set of queries was tested on the SSB at scale factor 20, which means there is approximately 12GB of data in the largest table.

You can find the individual SSB query definitions in my previous blog post.

Test environment
These tests were done on a relatively fast machine with a Xeon E5-2680 (8 cores, 16 threads) with fast IO (OCZ R4 1.6TB) and 128GB memory. For the hot test, the data fits in the buffer pool and has been loaded by the cold test already. The buffer pool and adaptive hash index are cold for the cold test. All tests were done with no concurrency. The hardware for this test was provided by Adotomi. I will be blogging about raw performance of the OCZ card in another post.

Also, while it is labeled on the graphs, it is important to note that in all cases, lower times are better.

SSB Flight #1
Here you will see the start of an interesting trend. MyISAM is faster when the data is not cached (the cold run) but is slower in the hot (cached) run. I did some investigation during the testing and found that InnoDB does more IO than MyISAM when the database is cold, but uses less CPU time when the database is hot. I am only speculating (and I can investigate further), but I believe the adaptive hash index is improving performance of InnoDB significantly during the hot run, as hash indexes are faster than B-tree indexes. Also, accessing pages from the buffer pool should be faster than getting them from the OS cache, which is another advantage of InnoDB.

[Graphs: image009, image001]

SSB Flight #2
Flight #2 is similar to Flight #1. MyISAM is faster than InnoDB when the database is cold, but the opposite is true when the database is hot.

[Graphs: image012, image003]

SSB Flight #3
Here, in some cases, MyISAM is substantially faster than InnoDB both cold and hot.

[Graphs: image014, image005]

SSB Flight #4
There is one query in this flight, Q4.3, which is faster using MyISAM than InnoDB. Like the queries in Flight #3 that are faster using MyISAM, Q4.3 examines very little data. It seems that InnoDB performs better when a larger number of rows must be joined together (Q4.1, Q4.2) but worse when small amounts of data are examined.

[Graphs: image016, image007]

Conclusion

In some cases MyISAM is faster than InnoDB, but usually only when the buffer pool is cold. Please don’t take away that you should be using MyISAM for everything! MyISAM may be good for raw performance, but it imposes limitations that are difficult to work with.  MyISAM does not maintain checksum consistency during regular operations and is not ACID compliant. MyISAM and InnoDB may also perform differently under concurrency, which this benchmark does not cover. I will cover concurrency in a follow-up post in this series. Regardless, when the working set fits in memory, InnoDB almost always performs better, at least for this workload.

Notes

MySQL version used: 5.6.11, custom compiled to remove performance_schema

For the InnoDB tests, a 64GB buffer pool was used. O_DIRECT was used, so there was no caching of data at the filesystem level. The InnoDB indexes were built using ALTER TABLE fast index creation (merge sort).

For the MyISAM tests I used a 10GB key buffer. I used ALTER TABLE DISABLE KEYS and built the keys with sort via ALTER TABLE ENABLE KEYS.

my.cnf

[mysqld]
datadir=/mnt/mysql56/data
basedir=/usr/local/mysql
socket=/var/lib/mysql/mysql.sock
user=justin
innodb_buffer_pool_size=64G
innodb_log_file_size=4G
innodb_file_per_table
innodb_stats_on_metadata=off
innodb_file_format=barracuda
innodb_log_buffer_size=32M
innodb_buffer_pool_instances=16
metadata_locks_hash_instances=32
table_open_cache_instances=8
sort_buffer_size=128k
read_rnd_buffer_size=8M
join_buffer_size=8M
default_tmp_storage_engine=myisam
tmpdir=/dev/shm
innodb_undo_logs=32
innodb_old_blocks_time=0
table_open_cache=2048
table_definition_cache=16384
innodb_flush_method=O_DIRECT
key_buffer_size=10G
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
innodb_stats_persistent
innodb_stats_auto_update=off
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

The post MySQL and the SSB – Part 2 – MyISAM vs InnoDB low concurrency appeared first on MySQL Performance Blog.

Read the article in full here: http://www.mysqlperformanceblog.com/2013/05/22/mysql-and-the-ssb-part-2-myisam-vs-innodb-low-concurrency/