Gremlin

Gremlin’s mission is to make the internet more reliable by helping engineering teams build more resilient software. The company aims to prevent costly and reputation-damaging outages by providing a platform for chaos engineering, which involves proactively testing systems to identify and fix weaknesses before they impact customers. Gremlin’s goal is to empower engineers to safely experiment on complex systems, improve system reliability, and ultimately deliver a better customer experience.

Gremlin is recognized as a pioneer and leader in the chaos engineering market and is trusted by leading enterprises, including Fortune 500 companies, to improve the resilience of their systems. Organizations adopting cloud computing and DevOps practices regard its platform as a key tool for keeping their applications and infrastructure stable and performant.

Offerings, Capabilities, and Integrations

Gremlin is a reliability platform that helps engineers build more resilient software through the practice of chaos engineering. Its core offering is a Software-as-a-Service (SaaS) platform that lets organizations safely and securely inject failures into their systems to find weaknesses before they cause customer-facing outages. Rather than only observing their systems, teams can use it to actively improve them.

Gremlin's capabilities include targeted "attacks" on infrastructure, such as consuming CPU resources, injecting network latency, or terminating processes, to simulate real-world failure conditions. The platform provides tools for planning, executing, and analyzing these chaos experiments in a controlled manner. It also integrates with observability and monitoring tools, so teams can correlate experiments with system behavior and more easily identify the root causes of issues. This focus on proactive fault injection and reliability testing gives Gremlin its competitive edge, helping organizations reduce downtime, improve system resilience, and protect revenue.

Products and Services

Gremlin’s offerings are centered around its Enterprise Reliability Platform, which combines several key products and services to help organizations manage and improve system reliability.

  • Reliability Management: A comprehensive solution for finding and fixing reliability risks at scale. It includes pre-built reliability tests, test automation, and monitoring through integrations.
  • Chaos Engineering: This is Gremlin’s flagship offering, providing a platform for safely and securely running chaos engineering experiments to build trust in complex systems. It allows engineers to inject faults and test system robustness.
  • Fault Injection: The core technology of the platform, which tests system robustness by safely and securely injecting failures. Gremlin offers a variety of attack types, known as "gremlins," including resource, state, and network attacks.
  • Reliability Scoring: This feature enables organizations to define, measure, and monitor the reliability of their services across the enterprise, providing a standardized way to track reliability posture over time.
  • Detected Risks: The platform continuously monitors systems to identify critical reliability risks before they can cause incidents.
  • Dependency Discovery: Automatically identifies system dependencies so that they can be tested.
  • Failure Flags: This feature allows for testing the resiliency of applications and serverless functions.
  • GameDay Manager: A tool to help teams effectively prepare for, run, and learn from “GameDays,” which are organized events for practicing incident response.
  • Reliability Intelligence: A newer feature that provides expert reliability advice, including tailored experiment analysis and recommended remediations.
  • Private Edition: For organizations with strict security requirements, Gremlin offers a fully isolated instance of its platform that can be deployed within a private network.

Target Customers

Gremlin’s target customers are primarily engineering teams within organizations that are building and operating complex, distributed systems. These are often companies where system reliability is critical to their business operations and customer experience. Gremlin’s solutions are designed to scale from a single team to the entire enterprise. Specific market segments that Gremlin targets include:

  • SaaS Companies: These companies rely on the continuous availability of their platforms and use Gremlin to improve reliability without slowing down development.
  • Financial Services: In this sector, Gremlin helps organizations modernize their resilience practices and manage compliance with regulations that mandate high availability.
  • Retail and E-commerce: These businesses use Gremlin to eliminate revenue-impacting downtime, especially during high-traffic events.

These customers benefit from Gremlin's products and services by proactively identifying and fixing reliability risks before they lead to costly outages. By practicing chaos engineering, they can build more resilient systems, gain confidence in their infrastructure, and ultimately deliver a more reliable experience to their end users. Gremlin also helps these organizations improve their incident response processes and validate their disaster recovery plans.

Cloud Integrations and Marketplaces

Gremlin's reliability platform integrates with the major cloud providers and is available on their marketplaces. It is designed to work on any cloud platform that provides Linux- or Windows-based hosts and has been tested on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

  • Amazon Web Services (AWS)

    Gremlin integrates deeply with AWS through a dedicated suite of tools called Gremlin for AWS, which automatically discovers services running in a customer's AWS environment to simplify running reliability tests. Gremlin is available on the AWS Marketplace, offering its Reliability Management Platform for chaos engineering and reliability testing on AWS infrastructure.

  • Microsoft Azure

    Gremlin’s Chaos Engineering platform is available on the Microsoft Marketplace. The platform allows users to run chaos engineering experiments against various resources within Azure, including hosts, containers, functions, and Kubernetes primitives. Gremlin’s website also states that its chaos engineering capabilities are frequently applied to Microsoft Azure workloads to improve the reliability of applications.

  • Google Cloud

    Gremlin supports running chaos engineering experiments on Google Cloud Platform. The "Gremlin Detector Agent," an AI agent for detecting configuration changes, is available on the Google Cloud Marketplace. Gremlin's reliability management solution includes pre-built tests that validate services against reliability best practices from Google Cloud and other providers.

Key People

  • CEO & Co-founder: Kolton Andrus
  • CTO & Co-founder: Matthew Fornaciari
  • Chief Strategy Officer: Lelah Manz
  • VP of People: Ana Schrank
  • VP of Engineering: Adam Toy
  • VP of Marketing: Josh Mervin

Key Facts

  • Headquarters: Covina, CA
  • Number of Employees: 65-75
  • Annual Revenue: Approximately $35 million
  • Parent Company: None
  • Subsidiary Companies: None
  • Publicly Listed: No

Analyst Recognition

Gartner includes Gremlin in its “Chaos Engineering Tools” market category. Gartner defines this category as technologies that utilize experimental and potentially destructive failure testing to uncover weaknesses within complex systems.

There is no information available to indicate that Forrester, IDC, or Everest Group include Gremlin in any of their specific technology categories.
