Braze, Inc. Status

Braze

All Systems Operational

US 01 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

US 02 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

US 03 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

US 04 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

US 05 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

US 06 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

US 07 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

US 08 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

EU 01 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

EU 02 Cluster Operational

Dashboard Operational

SDK Data Collection Operational

Data Processing Operational

REST APIs Operational

Outbound Messaging Operational

Currents Operational

Global Support Services Operational

Past Incidents

May 1, 2024

No incidents reported today.

Apr 30, 2024

No incidents reported.

Apr 29, 2024

Issue impacting US clusters

Resolved - The backlogs for the overwhelming majority of customers across US 01 and US 03 have been processed, and those customers are back to real-time data processing and message sending. All services are functioning as expected. We are considering this incident resolved.

We apologize for this incident and will provide a detailed Root Cause Analysis (RCA) report soon.
Apr 29, 23:56 EDT

Update - US01 Data Processing, Outbound Messages, and SDK Data Collection are fully operational.

US03 Data Processing and SDK Data Collection are fully operational.
We are still actively processing a backlog of Outbound Messages for a small subset of customers in US03.
Apr 29, 23:29 EDT

Update - US01 Data Processing and SDK Data Collection are fully operational.
We are still actively processing a backlog of Outbound Messages for a small subset of customers in US01.

US03 SDK Data Collection is fully operational.
We are still actively processing a backlog of Outbound Messages for a small subset of customers in US03.
We are still actively processing a backlog of Data Processing jobs in US03.
Apr 29, 21:50 EDT

Update - US08 has been marked as operational. The messaging and data processing backlogs on that cluster have been fully processed, and all other services are operational. We are keeping that cluster in a "monitoring" status.
Apr 29, 19:57 EDT

Update - Several meaningful updates for US01 and US03:

Dashboards and REST API processing are fully operational in both US01 and US03.
SDK Data Collection is fully operational in US03, and we are scaling it up in US01.

Data Processing and Message Sending are still experiencing sporadic latency as we work through the backlogs, but all health measures are improving rapidly.
Apr 29, 19:25 EDT

Update - US06 has been marked as operational. The messaging and data processing backlogs on that cluster have been fully processed, and all other services are operational. We are keeping that cluster in a "monitoring" status.
Apr 29, 18:32 EDT

Update - We are continuing to work on a fix for this issue.
Apr 29, 18:29 EDT

Update - US04 and US05 have been marked as operational. The messaging and data processing backlogs on those clusters have been fully processed, and all other services are operational. We are keeping those clusters in a "monitoring" status.
Apr 29, 18:13 EDT

Update - We are actively processing backlogs of both messaging and data across all clusters. Our Database, SRE, and Networking teams are continuing to increase overall throughput as the recovery continues and individual clusters catch back up to real-time.

Currents is operational across all clusters and has been processing all events as they are cleared from the backlogs.

At this point, we have completed both backlogs in US02 and US07. We have also completed the full message sending backlog in US04 and are more than 75% through the backlogs in US05 and US06. US01 and US03 are continuing to ramp up their pace of recovery. The next update will report further progress on backlog processing and recovery.
Apr 29, 17:16 EDT

Update - At this point, Dashboard access is available for all clusters.

We are processing through the backlog of messages to send and data to process across all clusters.

We'll continue to provide hourly updates.
Apr 29, 16:00 EDT

Update - US02 and US07 have been marked as operational. The messaging and data processing backlogs on those clusters have been fully processed.

On our larger clusters, backlog processing will take longer. We don't yet have a cluster-by-cluster ETA, but we are tracking toward resolution.
Apr 29, 14:20 EDT

Update - We continue to see service restoration across several clusters:

Data Processing and Messaging have resumed in US05 and US07.
Apr 29, 13:59 EDT

Update - We continue to see service restoration across several clusters:

Dashboard services have resumed on US04, US05, US06, and US07.
Data Processing and Messaging have resumed in US04.
Apr 29, 13:37 EDT

Update - We are seeing Dashboard access, Data Processing, and Messaging resuming in US02. There is a backlog of work to process, and once it is fully caught up, we will update the status to operational.

We are working through the rest of the US clusters and will provide updates in real-time as we have them.
Apr 29, 13:16 EDT

Update - We continue working to resolve a network issue in our US data centers.

We continue to work through checkout, and our remediation steps are showing success across various services.

Our next update will be in 30 minutes or once we have more detailed information about the resolution.
Apr 29, 13:01 EDT

Update - We continue working to resolve a network issue in our US data centers.

Senior leaders in our Engineering organization have implemented code designed to ensure that Quiet Hours are respected where required, to the extent this feature was properly configured by customers in Campaigns and Canvases, before this incident.

We have completed the restoration of services to a pilot customer successfully, and are now working through restoration across all US Clusters.

Our next update will be in 30 minutes or less.
Apr 29, 12:28 EDT

Update - We continue working to resolve a network issue in our US data centers.

We have no material update since our last post. We continue to work through restoring connectivity to those databases.

Our next update will be in 30 minutes or less.
Apr 29, 11:55 EDT

Update - We are continuing to work to resolve a network issue in our US data centers. As mentioned, the rolling restart of our database containers with Rackspace, our database hosting provider, was completed. We are now working through restoring connectivity to those databases. Senior leaders in our engineering organization are working to ensure that Quiet Hours will be respected in the countries where they are required and as configured in campaigns.

We will provide a full RCA and postmortem once this is resolved.

Our next update will be in 30 minutes or less.
Apr 29, 11:27 EDT

Update - We are continuing to work to resolve a network issue in our US data centers. The rolling restart of our database containers with Rackspace, our database hosting provider, is complete. Services are gradually returning online, and we are currently processing the backlog of data and messages accumulated during the incident.

We will provide a full RCA and postmortem once this is resolved.

Our next update will be in 30 minutes or less.
Apr 29, 10:55 EDT

Update - We are continuing to resolve a network issue in our US data centers. The rolling restart of database containers with Rackspace, our database hosting provider, is progressing and we are approximately 75% complete. Once these restarts are complete, we will begin returning services and processing data and messaging backlogs. Our next update will be in 30 minutes or less.
Apr 29, 10:25 EDT

Update - We have identified the root cause and are working to resolve a network issue in our US data centers. We are actively performing a rolling restart of database containers with Rackspace, our database hosting provider. We do not expect data loss, and further expect that all messages will be sent once the services are up and running. Our next update will be in 30 minutes or less.
Apr 29, 09:53 EDT

Update - We are continuing to work on a fix for this issue.
Apr 29, 08:59 EDT

Update - Work is ongoing by Engineers and our database provider to restore service.
Apr 29, 08:05 EDT

Update - Engineers are continuing to work alongside our database provider to restore service.
Apr 29, 06:53 EDT

Update - Engineers are actively working with our database provider to restore service.
Apr 29, 06:18 EDT

Identified - We have identified a third-party networking issue.
Apr 29, 05:48 EDT

Investigating - Engineers are investigating an issue impacting multiple services on all US clusters.
Apr 29, 05:41 EDT

Apr 28, 2024

No incidents reported.

Apr 27, 2024

No incidents reported.

Apr 26, 2024

Elevated errors on EU01 Dashboard and SDK data collection

Resolved - The incident has been resolved. All services are operating normally.
Apr 26, 06:52 EDT

Monitoring - We have identified the issue and services are recovering. Engineers are continuing to monitor the issue.
Apr 26, 06:36 EDT

Update - Engineers are continuing to investigate the issue.
Apr 26, 06:12 EDT

Investigating - Engineering is investigating an issue causing increased errors impacting the EU01 Dashboard and SDK Data Collection.
Apr 26, 05:47 EDT

Apr 25, 2024

REST and Dashboard Increased Errors in EU02

Resolved - REST and Dashboard have remained stable since 17:26 ET, and the root cause has been identified. This incident is resolved.
Apr 25, 17:52 EDT

Monitoring - A fix has been implemented, and Engineers are monitoring the results. REST and Dashboard returned to normal functionality as of 17:26 ET.
Apr 25, 17:33 EDT

Investigating - Engineering is currently investigating an issue causing increased errors when connecting to the EU02 Dashboard and making REST API calls.
Apr 25, 17:26 EDT

Apr 24, 2024

Latency affecting Currents

Resolved - We have processed the backlog of Currents events across all clusters. This incident is now resolved.
Apr 24, 17:27 EDT

Update - Engineering has confirmed that both EU clusters and US02 have processed the backlog of Currents events. All other US-based clusters may continue to experience latency while the backlog is processed.
Apr 24, 16:47 EDT

Monitoring - A fix has been deployed to all clusters. Currents events may continue to experience latency while the backlog is processed.
Apr 24, 15:45 EDT

Identified - Engineering has identified a fix and is currently deploying to all clusters.
Apr 24, 15:34 EDT

Investigating - Engineering is investigating latency affecting Currents. Currents events will be delayed due to this issue.
Apr 24, 14:45 EDT

Apr 23, 2024

Issue impacting Braze Dashboard for users in South Asia

Resolved - This issue has been resolved, and all systems are operating normally.
Apr 23, 19:09 EDT

Update - Engineers continue to monitor the issue. Our networking provider, Fastly, is engaged and actively investigating.
Apr 23, 15:36 EDT

Identified - Engineering teams have identified an external networking issue impacting users accessing the Dashboard from the South Asia region. During this time, users may experience latency and errors when accessing the Dashboard.

The issue has been escalated to our networking provider, Fastly.
Apr 23, 11:46 EDT

Apr 22, 2024

No incidents reported.

Apr 21, 2024

No incidents reported.

Apr 20, 2024

No incidents reported.

Apr 19, 2024

No incidents reported.

Apr 18, 2024

No incidents reported.

Apr 17, 2024

No incidents reported.