The Critical Role of Kernel Developers: Insights from the CrowdStrike Outage and Its Implications for Cyber GRC

The Challenge and Importance of Kernel Development

Kernel development is a high-stakes domain where precision and expertise are paramount. A single mistake in kernel-mode code can have severe repercussions, as the recent CrowdStrike outage demonstrated. The incident highlights the critical nature of kernel development and the stringent standards kernel developers must meet.

Developing kernel software is extremely challenging and requires a deep understanding of operating system internals, in this case those of Windows. Kernel-mode code runs with the highest privileges, so a fault in it can crash the entire system, and the dynamic interactions between your driver and the OS can lead to unforeseen issues. A single Windows patch, hotfix, or update can cause your driver to crash unless every precaution is taken. Moreover, a bug may appear only in a specific build, for a specific customer, on a specific OS version, which demands extremely detailed and targeted testing and debugging.

The Impact of a Single Developer’s Mistake

It is astonishing how one kernel developer’s error can influence not just a company’s stock but also vital sectors like healthcare and energy. The extent of the damage caused by this mistake is a stark reminder of the power and responsibility held by individual engineers. This incident underscores the immense impact that a single employee can have, even within large organizations.

Testing and Gradual Rollout: Lessons Learned

Given the scale of the disruption, it is surprising that CrowdStrike’s testing procedures did not catch the problem. This raises questions about the effectiveness and thoroughness of its testing protocols and underlines the importance of rolling out updates gradually, using control groups. Following best practices for releasing new kernel agent versions, or agent content packages, could potentially have mitigated the damage.

It is imperative to roll out the deployment gradually to different controlled groups based on different combinations of regions, OS versions, and OS hotfix/patch levels. Deploying gradually to these different groups, using validation and feedback loops before every step, ensures it is safe to proceed. This approach is key to the successful deployment of sensitive kernel updates and is also true for content or signature changes that can impact code running in kernel mode.
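To make the ring-by-ring idea concrete, here is a minimal Python sketch. It assumes hypothetical `deploy`, `is_healthy`, and `rollback` hooks supplied by your own deployment tooling; the `RolloutRing` fields simply mirror the region, OS version, and patch-level groupings described above. It is not CrowdStrike’s process, only an illustration of placing a validation gate before every step.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class RolloutRing:
    """Hypothetical deployment group: one region / OS build / patch-level combination."""
    name: str
    region: str
    os_build: str
    patch_level: str


def staged_rollout(
    rings: Sequence[RolloutRing],
    deploy: Callable[[RolloutRing], None],
    is_healthy: Callable[[RolloutRing], bool],
    rollback: Callable[[RolloutRing], None],
) -> List[str]:
    """Deploy ring by ring, gating each step on a health check.

    On the first failed gate, the failing ring is rolled back and the rollout
    halts, so later (usually larger) rings never receive the update.
    Returns the names of rings that remain on the new version.
    """
    updated: List[str] = []
    for ring in rings:
        deploy(ring)
        # Validation/feedback gate, e.g. crash reports, sensor heartbeats, boot success.
        if not is_healthy(ring):
            rollback(ring)
            break
        updated.append(ring.name)
    return updated


if __name__ == "__main__":
    rings = [
        RolloutRing("internal-canary", "us-east", "win10-22H2", "2024-07"),
        RolloutRing("early-adopters", "eu-west", "win11-23H2", "2024-06"),
        RolloutRing("general", "global", "mixed", "mixed"),
    ]
    log: List[str] = []
    result = staged_rollout(
        rings,
        deploy=lambda r: log.append(f"deploy {r.name}"),
        # Simulated telemetry: pretend the update crashes hosts on one OS build.
        is_healthy=lambda r: r.os_build != "win11-23H2",
        rollback=lambda r: log.append(f"rollback {r.name}"),
    )
    print(result)  # ['internal-canary']
    print(log)     # ['deploy internal-canary', 'deploy early-adopters', 'rollback early-adopters']
```

Halting, rather than continuing past a failed gate, is the key design choice: the larger, later rings never receive an update that has already failed validation on a smaller one.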

The Windows Agent Team’s Response

One can only imagine the tension within the Windows agent team at CrowdStrike following the incident. The moment they traced the issue back to the responsible code—likely using a `git blame`—must have been fraught with anxiety. Nevertheless, CrowdStrike acted swiftly to release and roll out a rollback update/fix, aiming to rectify the situation as promptly as possible.

Broader Implications of the CrowdStrike Outage

This outage also highlights several broader implications:

  • Internal vs. External Impact: There is a significant difference between a bug that causes an internal system failure and one that brings down external systems. The latter has far-reaching consequences, affecting multiple organizations and critical services.
  • Individual Responsibility: The potential impact of each engineer within a company, no matter its size, is immense. This incident serves as a powerful reminder of the responsibility that each developer carries.

Technical Breakdown: What Went Wrong

For those interested in the technical details, here is a simplified reverse-engineering analysis of the CrowdStrike agent driver, CSAgent.sys:

  • The Crashing Instruction: The instruction causing the crash (BSOD) was `mov r9d, [r8]`. In x86-64 assembly (Intel syntax), the square brackets denote a memory operand: the instruction reads the 32-bit value stored at the address held in the `r8` register and moves it into `r9d`.
  • Cause of the Crash: The address in `r8` did not point to valid, mapped memory, so the read triggered a page fault that the kernel could not resolve. Because the fault occurred in kernel mode, Windows halted the system with a Blue Screen of Death (BSOD). The root cause was that `r8` held a garbage memory address.
  • How It Happened: The value in `r8` was derived from data in another, newly updated file. An `lea` instruction computed an address based on that data, and after further address arithmetic and memory dereferences, the result was an invalid address. When the driver then dereferenced this address through `r8`, the system crashed.
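The general defect class here, an address derived from data in an updated file and dereferenced without validation, is exactly what defensive parsing guards against. The Python sketch below validates every offset and length in a content blob before following it. The file layout is hypothetical, invented for illustration, and is not the real format of CrowdStrike’s content files. Because Python is memory-safe, the sketch only demonstrates the validation logic; in a C kernel driver, these same checks are what stand between malformed data and an invalid pointer dereference.

```python
import struct
from typing import List


class ContentFileError(ValueError):
    """Raised when a content file fails structural validation."""


def read_entries(blob: bytes) -> List[bytes]:
    """Parse a hypothetical content-file layout, bounds-checking every offset first.

    Assumed layout (illustrative only, little-endian):
      uint32 count, then `count` uint32 offsets; each offset points to a
      uint16 length followed by that many bytes of entry data.
    """
    if len(blob) < 4:
        raise ContentFileError("truncated header")
    (count,) = struct.unpack_from("<I", blob, 0)
    table_end = 4 + 4 * count
    if table_end > len(blob):
        raise ContentFileError("offset table extends past end of file")

    entries: List[bytes] = []
    for i in range(count):
        (offset,) = struct.unpack_from("<I", blob, 4 + 4 * i)
        # Check the offset before reading through it: it must land after the
        # offset table and leave room for the 2-byte length field.
        if offset < table_end or offset + 2 > len(blob):
            raise ContentFileError(f"entry {i}: offset {offset} out of bounds")
        (length,) = struct.unpack_from("<H", blob, offset)
        end = offset + 2 + length
        if end > len(blob):
            raise ContentFileError(f"entry {i}: length {length} out of bounds")
        entries.append(blob[offset + 2:end])
    return entries


if __name__ == "__main__":
    good = struct.pack("<III", 2, 12, 17) + struct.pack("<H", 3) + b"abc" + struct.pack("<H", 2) + b"hi"
    print(read_entries(good))  # [b'abc', b'hi']

    bad = struct.pack("<II", 1, 0xFFFFFF00)  # offset far outside the 8-byte file
    try:
        read_entries(bad)
    except ContentFileError as exc:
        print("rejected:", exc)
```

Rejecting the whole file on the first bad entry, instead of skipping it, keeps a malformed update from being partially applied.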

Implications for Cyber GRC Programs

The CrowdStrike outage provides several key lessons for building robust Cyber Governance, Risk, and Compliance (GRC) programs:

1. Rigorous Testing and Validation

The failure to catch the error during testing highlights the need for rigorous and comprehensive testing protocols. Cyber GRC programs must ensure that all software, especially software that affects critical systems, undergoes extensive validation before deployment as part of the overall SDLC program. Control groups and gradual rollouts help surface issues before they become widespread problems.

2. Incident Response and Recovery Plans

The swift rollback update by CrowdStrike underscores the importance of having well-defined incident response and recovery plans. Cyber GRC programs should establish clear procedures for quickly addressing and mitigating the impact of software failures to minimize disruption.

This is equally crucial for organizations themselves—such as CrowdStrike’s customers—who must also have effective disaster recovery and business continuity plans. A robust plan enables organizations to recover from incidents quickly and efficiently, ensuring minimal impact on their operations.

3. Risk Assessment and Management

Understanding the potential impact of software changes on critical systems is crucial. Cyber GRC programs should incorporate thorough risk assessments into their change management processes, evaluating the possible consequences of updates and ensuring that appropriate safeguards are in place.

4. Training and Accountability

The incident emphasizes the significant responsibility of individual developers. Cyber GRC programs should invest in ongoing training for their technical teams, emphasizing best practices in secure coding and the importance of vigilance. Establishing accountability frameworks can help ensure that all team members understand the impact of their work on the broader organization.

5. Communication and Transparency

Effective communication within the organization and with stakeholders is vital during an incident. CrowdStrike’s response highlights the need for transparency in addressing issues and keeping affected parties informed. Cyber GRC programs should include communication strategies to manage stakeholder expectations and maintain trust.

Conclusion

The CrowdStrike outage serves as a poignant reminder of the critical nature of kernel development and the far-reaching consequences of errors in this domain. For Cyber GRC programs, it underscores the need for rigorous testing, robust incident response plans, thorough risk assessments, business continuity and disaster recovery planning, continuous training, and effective communication. By integrating these lessons, organizations can enhance their resilience and better manage the complex landscape of cybersecurity risks.

As a member of the broader cybersecurity provider community, we offer our support to CrowdStrike and commend their efforts in addressing and resolving the issue swiftly. Together, we can work towards improving practices and strengthening defenses to better safeguard against future challenges.

Understanding the “Rapeflake” Attack: Lessons in Cybersecurity from the Snowflake Breach

Raise your hand if you prefer mitigation over remediation! 🤚🏻

Recent events have highlighted the critical importance of proactive cybersecurity measures, particularly in light of the “rapeflake” attack targeting Snowflake. The Snowflake breach has had a significant impact, affecting several prominent customers, including Ticketmaster and Santander. Let’s delve into the specifics of the attack, the tactics, techniques, and procedures (TTPs) used, and the key takeaways for improving our cybersecurity practices.

The “Rapeflake” Attack: What Happened?

  • Targeted User Credential Theft: The attack was built on stolen user credentials, largely harvested by infostealer malware. A tool dubbed “rapeflake” was then used to access and perform reconnaissance in Snowflake customer environments. Customers such as Ticketmaster and Santander were among the victims.
  • Exploiting MFA Gaps: The stolen credentials included those from users who did not have Multi-Factor Authentication (MFA) configured, highlighting a significant vulnerability.
  • Compromised Demo Account: A former employee’s demo account was hacked, providing attackers with an entry point into the system.
  • Credential Sale on BreachForums: The stolen credentials quickly surfaced on the BreachForums marketplace, sold by a group known as ShinyHunters.
  • Delayed SEC Breach Notifications: Despite the severity of the breach, only some affected companies have filed SEC breach notifications to date.

Key Takeaways: Enhancing Cybersecurity Practices

Continuous Control Monitoring (CCM)

It is essential to maintain continuous visibility and proactively identify potential security risks. Key measures include:

  • Multi-Factor Authentication (MFA): Ensure MFA is enabled for all users to add an extra layer of security.
  • Principle of Least Privilege: Limit user access rights to the minimum necessary for their roles.
  • Segregation of Duties: Divide responsibilities among multiple people to reduce the risk of fraud or error.
  • Employee Termination Procedures: Implement strict procedures for promptly revoking access when employees leave the organization, to prevent the risks posed by orphaned users.
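As a minimal sketch of how such checks can be automated, the Python function below evaluates a list of accounts against MFA, least-privilege, and segregation-of-duties rules. The account fields, role names, and the conflicting role pair are hypothetical; a real implementation would pull this data from your IdP and applications and rerun the checks on a schedule to make the monitoring continuous.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

# Hypothetical role names; adapt to the roles your systems actually define.
PRIVILEGED_ROLES: Set[str] = {"admin"}
# Role pairs that no single user should hold together (segregation of duties).
SOD_CONFLICTS: List[FrozenSet[str]] = [frozenset({"payment-create", "payment-approve"})]


@dataclass
class Account:
    username: str
    mfa_enabled: bool
    roles: Set[str] = field(default_factory=set)
    privileged_approved: bool = False  # documented business need for privileged access


def evaluate_controls(accounts: List[Account]) -> List[Tuple[str, str]]:
    """Return (username, finding) pairs for every control violation found."""
    findings: List[Tuple[str, str]] = []
    for acct in accounts:
        if not acct.mfa_enabled:
            findings.append((acct.username, "MFA not enabled"))
        if acct.roles & PRIVILEGED_ROLES and not acct.privileged_approved:
            findings.append((acct.username, "privileged role without approval"))
        for conflict in SOD_CONFLICTS:
            if conflict <= acct.roles:  # user holds every role in the conflicting set
                findings.append(
                    (acct.username, "segregation-of-duties conflict: " + " + ".join(sorted(conflict)))
                )
    return findings


if __name__ == "__main__":
    accounts = [
        Account("alice", mfa_enabled=True, roles={"viewer"}),
        Account("bob", mfa_enabled=False, roles={"admin"}),
        Account("carol", mfa_enabled=True, roles={"payment-create", "payment-approve"}),
    ]
    for user, finding in evaluate_controls(accounts):
        print(f"{user}: {finding}")
```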

User Access Reviews (UARs)

Conduct continuous reviews to identify and address excessive permissions, dormant accounts, and orphaned users (accounts belonging to terminated employees). These reviews can help surface potential issues before they escalate into breaches.
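One pass of such a review can be sketched by correlating HR records with system accounts. In this hypothetical Python example, an account is flagged as orphaned when its owner is unrecorded, missing from HR data, or not marked active, and as dormant when an active owner has not logged in within a configurable window (90 days here, an arbitrary assumption).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class SystemAccount:
    username: str
    owner_employee_id: Optional[str]  # None when no owner is recorded
    last_login: Optional[date]         # None when the account has never logged in


def access_review(accounts: List[SystemAccount], hr_status: Dict[str, str],
                  today: date, dormant_after_days: int = 90) -> Dict[str, List[str]]:
    """Flag orphaned and dormant accounts by correlating system accounts with HR data.

    `hr_status` maps employee IDs to a status string; only "active" counts as active.
    """
    cutoff = today - timedelta(days=dormant_after_days)
    findings: Dict[str, List[str]] = {"orphaned": [], "dormant": []}
    for acct in accounts:
        owner_active = (acct.owner_employee_id is not None
                        and hr_status.get(acct.owner_employee_id) == "active")
        if not owner_active:
            # Owner unknown, terminated, or missing from HR: revoke or reassign access.
            findings["orphaned"].append(acct.username)
        elif acct.last_login is None or acct.last_login < cutoff:
            findings["dormant"].append(acct.username)
    return findings


if __name__ == "__main__":
    hr = {"E1": "active", "E2": "terminated"}
    accounts = [
        SystemAccount("alice", "E1", date(2024, 6, 1)),
        SystemAccount("old-demo", "E2", date(2023, 1, 15)),
        SystemAccount("svc-export", None, date(2024, 6, 20)),
        SystemAccount("dave", "E1", date(2024, 1, 5)),
    ]
    print(access_review(accounts, hr, today=date(2024, 6, 30)))
    # {'orphaned': ['old-demo', 'svc-export'], 'dormant': ['dave']}
```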

The Moral of the Story

The Snowflake breach underscores the need for automated routines that proactively monitor security controls and mitigate gaps. It is astounding that many highly respected companies still lack these measures. By adopting a proactive approach, we can detect and stop attacks before they cause damage, ensuring a safer and more secure environment for everyone.

For more about taking a proactive approach to cybersecurity, check out our most recent blog on adopting “Shift Left” vs. “Shift Right” practices.

Cypago Panoramic Visibility: Bringing On-Premise Support for a Truly Hybrid & Multi-Cloud Cyber GRC Automation Solution

In today’s complex enterprise environment, data is siloed and distributed across many different environments, including cloud and on-premise. Moreover, mature companies typically run hundreds of SaaS applications. Cypago consolidates your entire business IT environment and guarantees full coverage of it, so you have the full picture across cloud, SaaS, and on-premise. Allow me to introduce Cypago’s panoramic visibility feature: the cornerstone of a unified, tailored Cyber GRC Automation (CGA) solution that provides full coverage of the entire enterprise IT environment by integrating with both cloud and on-premise systems.

Screenshot of Cypago on-premise support feature

A Distinctive Approach to Multi-cloud and Hybrid Environments

Cypago excels in the realm of cyber GRC, bringing a wealth of expertise to the table. We serve enterprise customers who operate in major cloud environments such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, use a wide variety of SaaS applications, or have on-premise infrastructure and tools. Our strength lies in seamless integration: we integrate with a diverse array of environments, tools, and systems. Whether you’ve chosen a hybrid environment or fully embraced cloud solutions, Cypago is there to support you. Our integrations extend across all tools and environments, empowering you to achieve comprehensive cyber GRC throughout your operations. By leveraging Cypago, you not only enhance your cybersecurity posture but also maximize the return on investment in your chosen tools.

Importantly, our support extends beyond cloud-native environments, encompassing cloud, SaaS, and on-premise integrations, as well as various systems. It is important to emphasize that our expertise lies in collecting, analyzing, and correlating data from a wide spectrum of sources, rather than focusing solely on the cloud. Our “cloud support” covers two crucial dimensions:

  1. Cloud Providers: AWS, GCP, and Azure are integral parts of our comprehensive support network.
  2. SaaS Tools: We support an extensive array of SaaS tools, including development tools such as GitHub, Terraform, and Jenkins, along with essential platforms like ticketing systems (e.g., Jira), HRIS, XDR/EPP (e.g., CrowdStrike), vulnerability scanning platforms, IdP solutions like Okta, and numerous others.

Notably, our dedication to on-premise support remains resolute, ensuring that your organization’s on-site systems, data, and configuration are seamlessly integrated into our holistic approach. This comprehensive approach ensures that your enterprise can harness the synergies of various technological dimensions, enabling elevated capabilities and insights across the board.

Setting a New Benchmark

What sets Cypago apart is our steadfast commitment to offering a hybrid and multi-cloud solution that addresses the unique needs of businesses embracing the best features of both cloud and on-premise paradigms. Cypago Connectors allow customers to seamlessly integrate their cloud and on-premise systems into the Cypago platform in order to centrally visualize and enforce policies and controls and achieve a 360-degree view of security and compliance. Unlike vendors that cannot support the complex use cases of enterprise companies, or that offer limited visibility and enforcement, Cypago acts as a steadfast partner in balancing these two paradigms. As a result, Cypago provides a panoramic understanding of our customers’ entire Cyber GRC posture across their hybrid IT and multi-cloud environments.

The Pitfalls of Partial Visibility: The Crucial Role of Comprehensive Security and Compliance

In today’s complex digital landscape, the importance of security and compliance cannot be overstated. As businesses navigate an interconnected web of systems and data sources, the need for holistic visibility has never been more evident. However, relying on partial visibility without comprehensive coverage can not only hinder operational efficiency but also pose significant risks to security and compliance.

Partial visibility, unfortunately, often leads to a cascade of issues. The absence of a complete and unified picture results in manual interventions, leaving security teams grappling with fragmented data and incomplete insights. This, in turn, gives rise to false positives and negatives, which undermines the effectiveness of continuous control monitoring and testing across the business. Furthermore, a mere partial view falls short of fulfilling compliance requirements. For true compliance, a comprehensive overview is imperative, as regulatory standards and other voluntary frameworks demand a holistic understanding and monitoring of an organization’s data landscape.

Cypago emerges as a beacon of innovation in this landscape, intelligently bridging the gaps left by partial visibility. The platform’s strength lies in its ability to analyze and correlate data across diverse systems, from multiple clouds to on-premise to SaaS applications, using proprietary engines designed for analysis and correlation. By combining, cross-checking, and cross-validating data, Cypago breaks down the data silos that inhibit comprehensive insights. This approach not only gives organizations a unified view but also generates unique insights that would otherwise remain hidden in fragmented data.

Take User Access Review (UAR) as an example. To implement this control effectively, you need to inspect both HR records (which may be stored in an on-premise HRIS, for instance) and system users and logs (which can be stored anywhere). Similarly, other controls may require scrutiny of both ticketing systems (which may be managed in a SaaS application, for instance) and code pull requests, which may be stored and managed on-premise. Cypago’s methodical approach ensures that no stone is left unturned, enabling true continuous monitoring for security and compliance. In a landscape where fragmented data can lead to substantial vulnerabilities, Cypago emerges as the solution that shifts visibility from a piecemeal perspective to a holistic vantage point.
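The ticketing-versus-pull-request control mentioned above can be illustrated with a small correlation check: every merged pull request should reference a ticket that was approved. This Python sketch assumes Jira-style issue keys (such as `OPS-12`) in pull request titles or descriptions and an `Approved` ticket status; both the data shapes and the status name are hypothetical stand-ins for whatever your ticketing and source-control systems actually export.

```python
import re
from typing import Dict, List

# Jira-style issue key: uppercase project key, a hyphen, then a number (e.g. "OPS-12").
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def prs_without_approved_ticket(pull_requests: List[Dict[str, str]],
                                ticket_status: Dict[str, str]) -> List[str]:
    """Return IDs of pull requests that reference no ticket in 'Approved' status."""
    violations: List[str] = []
    for pr in pull_requests:
        text = pr.get("title", "") + " " + pr.get("description", "")
        keys = TICKET_KEY.findall(text)
        if not any(ticket_status.get(key) == "Approved" for key in keys):
            violations.append(pr["id"])
    return violations


if __name__ == "__main__":
    prs = [
        {"id": "101", "title": "OPS-12 rotate service credentials"},
        {"id": "102", "title": "Hotfix applied directly"},
        {"id": "103", "title": "Fix login flow", "description": "Refs SEC-7"},
    ]
    tickets = {"OPS-12": "Approved", "SEC-7": "In Review"}
    print(prs_without_approved_ticket(prs, tickets))  # ['102', '103']
```

The same pattern applies wherever the two data sources live: once PR metadata and ticket statuses are collected into one place, the control becomes a simple join.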

An All-Inclusive Hybrid IT Solution with On-Premise Support

In embracing the ever-evolving IT landscape, we not only comprehend but also address distinct and intricate requirements. Organizations often require the agility of cloud solutions while upholding stringent control over sensitive data within on-premise environments. Cypago’s unwavering commitment extends to these enterprises through our provision of adaptable on-premise solutions. This dedication guarantees that the multifaceted advantages of our hybrid IT solution are accessible across all tiers of business operations.

Diverging from traditional vendors who provide off-the-shelf solutions designed only for partial readiness, for addressing specific compliance frameworks, or for basic use cases, Cypago distinguishes itself by delivering a comprehensive Cyber GRC solution meticulously tailored to the unique needs of organizations, regardless of size or complexity. Our groundbreaking Cypago Connectors empower organizations to seamlessly integrate their cloud and on-premise systems, while maintaining an optimal level of control and security aligned with their discerning requirements.

Features

Seamlessly integrate your cloud and on-premise systems while maintaining optimal control and security, ensuring a panoramic understanding of your entire environment. Our innovative connectors facilitate fluid communication and data aggregation, all within a comprehensively tailored CGA solution.

Bridging the Divide Between Cloud, SaaS and On-Premise

It’s important to understand that without Cypago, achieving seamless interconnection among on-premise tools is not possible. In addition, correlating data across clouds, SaaS, and on-premise systems was a capability missing from every other platform. Until now. This is where Cypago Connectors shine. Our connectors offer seamless integrations with on-premise tools like Jira Server, GitLab Enterprise, Splunk, ELK, Jenkins, SQL Server, and MongoDB, ensuring the cohesion and operational efficiency of your hybrid infrastructure.

Flexible Deployment Possibilities

We understand that every organization has its own distinct qualities and needs. To cater to this, we provide a variety of adaptable deployment options for our connectors. Whether you deploy them on Kubernetes or run them as a simple Docker container, our aim is to fit effortlessly into your preferred infrastructure. This ensures a smooth and effective setup process while maintaining a lightweight, agent-free, sensor-free approach.

Strengthening Your Security

Security stands as a bedrock principle of our approach. Cypago Connectors have been meticulously designed to align with the most stringent security best practices. Our connector software operates solely through outbound communication, eliminating the necessity for opening any inbound firewall rules and ensuring your network remains secure from potential threats. Moreover, it does not disrupt your organization’s pre-existing security policies. Outbound communication exclusively traverses your firewalls, overseen by your security teams. This distinctive approach guarantees the impregnability of your data against external threats while enabling controlled interaction with the external world.

Embrace the Future with Cypago

Cypago’s comprehensive platform offers unparalleled visibility and enforcement into an organization’s security and compliance posture across hybrid environments, multi-cloud environments, and on-premises. By actively monitoring security and compliance controls, such as access control, confidentiality, SDLC and business continuity controls, Cypago automatically and continuously identifies security and compliance gaps and empowers Operations teams to swiftly address gaps through alerts, notifications and integrated task and ticket management. This functionality also enables the provision of control status to auditors, serving as evidence of adherence to voluntary standards and industry regulations. The platform’s ability to establish connections throughout the infrastructure and tool landscape enhances its efficacy, facilitating a thorough assessment of control implementation. This evaluation identifies potential security and compliance shortfalls, ensuring that desired controls are not only established but effectively maintained.

Cypago’s hybrid IT coverage alleviates a major concern for CISOs: the fear of undiscovered vulnerabilities that could lead to breaches or audit failures. With Cypago, these apprehensions can be put to rest as organizations proactively safeguard their digital landscapes. We invite you to join us in embracing the forefront of CGA for all IT environments with Cypago; schedule a demo today.

Why is Risk Management important?

Ensuring effective risk management is vital for your business’s smooth operation and success and for maintaining security and compliance with standards such as ISO, SOC, NIST, and many more. Automated risk management can efficiently handle the complexity of risk management processes, saving time and reducing human errors.

What is compliance risk management?

Compliance risk management refers to identifying, assessing, and controlling the potential risks associated with non-compliance with laws, regulations, standards, and policies applicable to a particular business or industry. While this applies to many operational areas, managing cybersecurity risk is one of the most challenging and fastest-evolving fields of risk management. The goal of compliance risk management in this respect is to ensure that an organization operates within boundaries that minimize the potential for negative information security and privacy consequences. A compliance risk management policy should be integrated into an organization’s overall risk management framework to ensure that it is aligned with the organization’s strategic goals and objectives.

What are the main steps in risk management?

  1. Risk Identification
    The initial step in effective risk management is identifying which risks apply to your business. It involves considering both business and IT assets, threats, and vulnerabilities. In essence, risk can be defined as the possibility of harm occurring when a threat exploits a vulnerability. Alternatively, risk can be viewed as the point at which assets, threats, and vulnerabilities intersect.
  2. Risk Analysis/Assessment/Evaluation
    Once risks have been identified, the next crucial step in your compliance risk management plan is to analyze, measure, assess, or score each identified risk. This means giving meaning to each risk by taking into account factors such as its likelihood and impact and the expected loss if it materializes. By analyzing these factors, we can define the characteristics of each risk and produce a risk “bottom line,” such as a score, number, or price. This information serves as crucial input for the risk management expert in making informed decisions and taking appropriate actions in the next step. Different analytical methods can be applied, including qualitative or quantitative risk analysis, which we’ll delve into in the next post, where I’ll explain the differences and guide you through performing a thorough cyber risk analysis.
  3. Risk Treatment
    Once the risks have been identified, analyzed, and fully comprehended, it’s time to take action – this is where risk treatment comes into play. Here are the available options for each risk:

    • Avoid – This approach eliminates the risk altogether, for instance by modifying your plans or implementation so that the risk can no longer occur. The result is no risk whatsoever.
    • Mitigate (reduce) – This method entails taking action to reduce the likelihood or impact of the risk. One effective method is defining and monitoring security controls.
    • Accept – By choosing to accept a risk, you acknowledge that it can happen and take no action to prevent it. You may wonder when this would be advisable. One example is when mitigating the risk would cost more than its likelihood, impact, and loss expectancy justify, as determined by the comprehensive risk analysis you carried out earlier.
    • Transfer – In this approach, you transfer the risk to a third party.
  4. Continuous Risk Monitoring
    Effective risk management is an ongoing, dynamic process that demands consistent attention. Once risks have been reduced through mitigation strategies and controls, they must be monitored regularly. To achieve this, updating the risk register and testing the effectiveness of controls and processes should be regular practice.
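To make the analysis and treatment steps more concrete, here is a hedged Python sketch combining a qualitative score (likelihood × impact on 1 to 5 scales) with the common quantitative formulas SLE = asset value × exposure factor and ALE = SLE × ARO (annualized rate of occurrence). The cost-effectiveness test, which compares a control’s yearly cost with the ALE reduction it buys, is a simplified heuristic for choosing between mitigation and acceptance or transfer, not a complete methodology, and the sample figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int          # qualitative scale: 1 (rare) to 5 (almost certain)
    impact: int              # qualitative scale: 1 (negligible) to 5 (severe)
    asset_value: float       # monetary value of the asset at risk
    exposure_factor: float   # fraction of asset value lost per incident (0 to 1)
    annual_rate: float       # ARO: expected incidents per year

    def qualitative_score(self) -> int:
        """Likelihood x impact, giving a score from 1 to 25."""
        return self.likelihood * self.impact

    def single_loss_expectancy(self) -> float:
        """SLE: expected loss from a single incident."""
        return self.asset_value * self.exposure_factor

    def annualized_loss_expectancy(self) -> float:
        """ALE: expected loss per year (SLE x ARO)."""
        return self.single_loss_expectancy() * self.annual_rate


def mitigation_is_cost_effective(risk: Risk, annual_control_cost: float,
                                 residual_rate: float) -> bool:
    """True if the control's yearly cost is below the ALE reduction it delivers.

    `residual_rate` is the assumed ARO after the control is in place.
    """
    residual_ale = risk.single_loss_expectancy() * residual_rate
    return risk.annualized_loss_expectancy() - residual_ale > annual_control_cost


if __name__ == "__main__":
    vpn = Risk("Unpatched VPN appliance", likelihood=4, impact=5,
               asset_value=400_000, exposure_factor=0.5, annual_rate=0.5)
    print(vpn.qualitative_score())             # 20
    print(vpn.annualized_loss_expectancy())    # 100000.0
    print(mitigation_is_cost_effective(vpn, 30_000, residual_rate=0.0625))   # True
    print(mitigation_is_cost_effective(vpn, 150_000, residual_rate=0.0625))  # False
```

In this example, a control costing 30,000 per year that lowers the ARO from 0.5 to 0.0625 reduces the ALE from 100,000 to 12,500, so mitigation pays off; at 150,000 per year it would not, and accepting or transferring the risk becomes worth considering.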

This article provides an overview of the key steps involved in risk management for businesses. The initial step is to identify risks that are relevant to the business, considering both business and IT assets, threats, and vulnerabilities. Once risks have been identified, a comprehensive analysis should be conducted, measuring factors such as the likelihood and impact of the risk. The next step is risk treatment, where available options include avoiding the risk, reducing the likelihood or impact, accepting the risk, or transferring it to a third party. Finally, ongoing risk monitoring is crucial to ensure that risk management remains effective and dynamic. We emphasize the importance of effective risk management for business success, security, and compliance with industry standards.

If you have any questions or comments about any of the above, please feel free to contact us.