Scaling electric transport optimization with Google Cloud Run Jobs

· 9 min read
Kristoffer Wänglund
Software Engineer

This blog post describes our journey scaling Einride’s transport optimization software using Google’s serverless high-performance computing platform, Cloud Run Jobs.

eVRP example

At Einride, we develop technologies to lower the carbon footprint of the transportation sector, by enabling the shift from diesel trucks to electric and autonomous trucks. We know that one-to-one replacement of diesel trucks does not cut it - electric trucks require intelligent route and charge planning to be competitive.

A core component of Einride’s freight platform is our electric vehicle route optimization engine. This engine creates optimized assignments and charging schedules for a fleet of electric trucks and chargers, ensuring that every truck can complete its route without running out of battery.

Electric vehicle routing problems (eVRPs)

An Electric Vehicle Routing Problem (eVRP) is similar to the well-known Vehicle Routing Problem, but with the added complexity of managing the limited energy capacity of electric truck batteries and the limited slots and power available at charging stations. This is a combinatorial optimization problem, belonging to the family of NP-hard problems. Finding optimal solutions is extremely challenging, especially for larger, real-world scenarios.

To give you an idea of how challenging it can be to solve an eVRP, consider this example:

In a transport network with just 10 locations, each requiring a single shipment delivery, there are 3,628,800 different possible delivery sequences. Increase the number of locations to 20, and the number of possible sequences exceeds the number of grains of sand on Earth. How can you efficiently find a cost-effective solution while simultaneously tracking charging sessions, charger availability, and charging power for each truck? This is the challenging and engaging problem we at Einride tackle every day.
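To see how quickly the search space explodes, the delivery-sequence counts above are just factorials, which a few lines of Python can verify:

```python
import math

# The number of possible delivery sequences for n locations is n!
# (each sequence is one permutation of the delivery stops).
for n in (10, 20):
    print(f"{n} locations -> {math.factorial(n):,} possible sequences")
```

10 locations give exactly 3,628,800 sequences; 20 locations give roughly 2.4 × 10^18 - and that is before accounting for charging sessions, charger availability, and charging power.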

Why solving eVRPs is important

Why is it important to model and solve eVRPs? At Einride, we know this is one of the essential enablers for electric transport adoption.

The big question everyone needs to answer when going electric is: “how many trucks and chargers will I need?”. Overestimate, and you pay for costly idle trucks and chargers; underestimate, and you miss deliveries. Answering this question for any non-trivial transportation network boils down to solving and analyzing a set of eVRPs.

How to solve eVRPs

Solving eVRPs requires an eVRP optimization engine which is capable of modeling all the essential parameters of electric truck routing, including managing charging schedules and power.

Using the optimization engine to solve a single eVRP can take a long time and require significant computational resources. In addition, identifying the number of trucks and chargers needed can require running hundreds or thousands of eVRPs, each with slightly different input parameters.

As such, to work effectively with eVRPs, you need a high-performance computing platform capable of orchestrating many demanding optimization jobs in parallel.

Solving eVRPs in Google Cloud

At Einride, we leverage Google Cloud as the underlying platform and infrastructure for most of our machine learning, AI, and optimization models. We generally prefer serverless and managed products, as they minimize infrastructure management overhead, while enabling us to build for scale.

The challenge with using serverless infrastructure is that most applications need to be designed from the ground-up to run on serverless platforms. Our eVRP optimization engine has been in development for the last 7 years, well before most serverless platforms were around - but with some initial effort, we were able to containerize it and make it possible to run serverless.

Attempt 1: Solving eVRPs using Cloud Run and Cloud Tasks

Our initial approach to running our containerized eVRP optimization engine in Google Cloud utilized the standard serverless products available at the time: Cloud Run for computation and Cloud Tasks for job management and orchestration.

We soon learned that using Cloud Run and Cloud Tasks to orchestrate long-running optimization jobs came with some challenges. Cloud Run is primarily designed for HTTP-based workloads, requiring an HTTP request (and subsequent wait) to initiate an optimization job. However, because optimization jobs can be lengthy, we wanted an asynchronous interface for our clients.

We used a Cloud Tasks queue for this purpose: the Orchestrator (see Figure 1 below) added tasks to a Cloud Tasks queue configured to call the Orchestrator again. This allowed us to solve the eVRP asynchronously without blocking the client in a synchronous call.

Figure 1: Cloud Run setup with the Client, Orchestrator (Cloud Run), Cloud Tasks, Optimization Solver (Cloud Run) and Spanner.

Figure 1 illustrates the components and steps involved in creating and asynchronously solving eVRPs for the client. Let's walk through the steps:

  1. A client has an eVRP to be solved and posts it to the Orchestrator.

  2. The Orchestrator persists the eVRP in Cloud Spanner.

  3. It then analyzes the eVRP and determines whether it can be split into smaller sub-eVRPs.

  4. Based on the number of splits, it creates a corresponding number of Cloud Tasks.

  5. At this point, all server-side preparations are complete, and a Long Running Operation (LRO) is returned to the client. The client can use the LRO to poll the Orchestrator for progress updates.

  6. Cloud Tasks invokes the Orchestrator with information about the sub-eVRP to be solved.

  7. The Orchestrator retrieves the necessary information from Spanner.

  8. It then invokes the Optimization Solver with the sub-eVRP and waits for it to be solved in a blocking call. When the Optimization Solver returns, the Orchestrator persists the result in Spanner. The result can either be a Solution to the sub-eVRP or an error, containing information about why it was not possible to find a solution.
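In simplified form, the steps above can be sketched as follows. This is a hypothetical, in-memory stand-in - a dictionary plays the role of Cloud Spanner and a list plays the role of the Cloud Tasks queue, and all names (`Orchestrator`, `submit`, `handle_task`) are illustrative, not Einride's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Orchestrator:
    db: dict = field(default_factory=dict)     # stand-in for Cloud Spanner
    queue: list = field(default_factory=list)  # stand-in for Cloud Tasks

    def submit(self, evrp_id: str, evrp: dict) -> str:
        self.db[evrp_id] = {"problem": evrp, "results": {}}  # step 2: persist
        sub_ids = self.split(evrp_id, evrp)                  # step 3: analyze & split
        self.queue.extend(sub_ids)                           # step 4: enqueue tasks
        return f"operations/{evrp_id}"                       # step 5: LRO for polling

    def split(self, evrp_id: str, evrp: dict) -> list[str]:
        # Step 3: naive split - one sub-eVRP per group of locations.
        return [f"{evrp_id}/sub-{i}" for i in range(len(evrp["location_groups"]))]

    def handle_task(self, sub_id: str, solver) -> None:
        evrp_id = sub_id.split("/")[0]
        problem = self.db[evrp_id]["problem"]                 # step 7: fetch
        self.db[evrp_id]["results"][sub_id] = solver(problem) # step 8: blocking solve

orch = Orchestrator()
lro = orch.submit("evrp-1", {"location_groups": [["a"], ["b"], ["c"]]})
while orch.queue:  # steps 6-8, driven by Cloud Tasks in the real system
    orch.handle_task(orch.queue.pop(0), solver=lambda problem: "solution")
```

The crucial detail is in `handle_task`: the solve in step 8 is a blocking call inside an HTTP request, which is exactly the weak point discussed next.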

The approach based on Cloud Run and Cloud Tasks worked fairly well, and we used it in production for several years. But we gradually saw a major issue with this solution - optimization jobs would occasionally fail without any apparent reason.

Because Cloud Run's interface is HTTP-based and synchronous, network errors or any disruptions to the synchronous HTTP request between Cloud Tasks and Cloud Run would halt the ongoing optimization job. The larger the optimization job, the higher the probability of such disruptions occurring during its runtime.

It wasn't until 2023 and the launch of a new serverless product that we found a solution to these problems.

Attempt 2: Solving eVRPs using Cloud Run Jobs

In 2023, Cloud Run Jobs was launched. It builds upon the existing Cloud Run platform with one key difference: instead of running container workloads in response to HTTP requests, Cloud Run Jobs starts containers asynchronously and runs them to completion.

This simplicity is compelling: within days, we had a proof-of-concept solving eVRPs using Cloud Run Jobs up and running. Instead of relying on synchronous HTTP calls, we could extend the allowed computation time for our eVRP optimization engine, enabling us to solve even larger eVRPs with longer runtimes.

Figure 2: Cloud Run Jobs setup with the Client, Orchestrator (Cloud Run), Optimization Solver (Cloud Run Job), Pub/Sub, Cloud Storage and Spanner.

Figure 2 shows the components and steps used for creating eVRPs and then solving them using Cloud Run Jobs.

  1. A client has an eVRP to be solved and posts it to the Orchestrator.

  2. The Orchestrator analyzes the eVRP and determines whether to split it into smaller sub-eVRPs.

  3. Each individual sub-eVRP is stored in Google Cloud Storage.

  4. Metadata about the eVRP is stored in Cloud Spanner for interactive searching and browsing.

  5. The Orchestrator starts a Cloud Run Job Execution for each sub-eVRP. Cloud Run Job Execution overrides inform the solver where to find the specific problem in Cloud Storage.

  6. A Long Running Operation (LRO) is returned to the client.

  7. The eVRP optimization engine runs to completion, writing either a solution or an error message to Cloud Storage.

  8. The Cloud Storage bucket emits an event for each object created.

  9. Using a Pub/Sub push subscription, the Orchestrator is notified of each created object. It specifically looks for either solutions or errors.

  10. Upon receiving a solution or error object, the Orchestrator persists this information in Spanner, enabling progress tracking, which is then relayed to the client via the LRO.

After trialing this architecture as a proof-of-concept in parallel with our previous architecture, we soon noticed that it was superior in every way.

Free from the constraints of HTTP requests, we were able to extend the container timeout for up to 24 hours. With a high-performing eVRP optimization engine, this is plenty of time to solve most reasonably sized problems.

Furthermore, because our engine uses metaheuristics to continuously search for better solutions, eliminating the HTTP-induced flakiness allowed jobs to run slightly longer, leading to a roughly 4% reduction in total driven distance due to more efficient fleet-level delivery and charging schedules.

Vehicle utilization and total driven distance for 1 min and 60 min timeout on a benchmark problem.

This reduction in driven distance allowed us to transport more shipments with the same number of vehicles, freeing up capacity for new business.
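The effect of a longer time budget on a metaheuristic can be illustrated with a toy example - this is a simple 2-opt tour improvement on random points, not Einride's engine, but it shows the same property: more iterations can only match or improve the best solution found so far.

```python
import math
import random

def tour_length(points, order):
    # Total length of the closed tour visiting points in the given order.
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def local_search(points, budget, seed=0):
    """Toy 2-opt search: propose random segment reversals, keep improvements."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    best = tour_length(points, order)
    for _ in range(budget):
        i, j = sorted(rng.sample(range(len(points)), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cost = tour_length(points, cand)
        if cost < best:
            order, best = cand, cost
    return best

rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(30)]
short = local_search(pts, 100)
long_ = local_search(pts, 10_000)
assert long_ <= short  # a larger budget never yields a worse best tour
```

The same monotonic relationship between compute budget and solution quality is what made the extended Cloud Run Jobs timeouts directly translate into shorter driven distances.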

In some cases we kept the computation times short, for rapid prototyping of delivery schedules, and to manage cloud costs. Cloud Run Jobs provided the flexibility to tailor computation times to specific business needs. Overall, migrating to Cloud Run Jobs resulted in a 20% reduction in our eVRP optimization engine's cloud costs.

Learnings

This blog post demonstrated how we leveraged Cloud Run Jobs, a serverless high-performance computing platform, to scale and improve the performance of our eVRP optimization engine while simplifying architecture and reducing cloud costs.

Adopting a serverless approach and fully utilizing managed cloud services requires containerization and sometimes re-architecting your application. Combining serverless products like Cloud Run Jobs, Cloud Run, Cloud Pub/Sub, and Cloud Spanner into an effective solution might not always be straightforward. For us, some trial and error was necessary, but getting together with some colleagues in front of a whiteboard, iterating on the architecture until we got it right, was time well spent.

Range predictions for electric trucks through data standardization

· 11 min read
Jenny Eriksson
Data Scientist

The key to reliable and cost-efficient electric freight lies in accurate range modeling, also known as energy consumption modeling. In this blog post, we outline Einride's approach to collecting and processing electric truck data to implement range models that consistently achieve over 90% accuracy.

To reach international targets of reduced global warming, the transport industry needs to accelerate its transition to electric. To go electric at scale, you need to allow for a mixed-brand fleet setup, and you need accurate range predictions for each of those brands. The fact that different truck manufacturers use different data models makes this challenging. What if we could define a standard data format that all truck data could be mapped to, regardless of brand? Read on to learn how Einride is standardizing data to enable accurate and unbiased range models.

Examples of a range model. The data used when training range-predictive models has a significant impact on transport planning.

Building a UI component library: How we balance brand identity and speed

· 9 min read
Filip Tammergård
Software Engineer

Whether or not to roll your own UI component library is a common dilemma when developing products with a unique brand identity. In this blog post, we shed light on how we're solving this at Einride – and lessons learned from along the way.

At Einride, our goal is to eliminate the 7% of global carbon emissions that come from road freight. We have learned that making an impact is not just about developing cutting-edge technology - you also need a unique brand to make your products stand out in the market.

At Einride, we aim to convey our unique brand identity across software and hardware products.

It can be challenging to convey a unique brand identity in digital products, especially when going through a scale-up phase where experiences and experiments are perpetually developed and launched. Development speed is vital, which means that building brand-accurate UIs must be quick and easy.

In the early days of Einride, way too much time was spent on creating the most basic UI components – such as buttons and input fields – time and time again. This considerably impacted our development speed. Buttons can cost any company a lot of money.

We knew that having a component library of ready-made and brand-aligned components would significantly speed up development. That's why we went ahead and created Einride UI – our own component library. But this led us to another challenge: How do we strike a balance between investing time in developing a unique component library, versus investing time in developing the actual product?

How do you convey a unique brand without building components from scratch?

One of Einride's development principles is: "Build what we need, not what we might need". Considering the vast amount of excellent component libraries out there we could use, is building our own component library from scratch really the most effective approach? Or is there a better way?

It's tech radar review season!

· 10 min read

It's tech radar review season at Einride! That means each engineering guild facilitates a workshop to ensure that their tech radar is up to date, relevant and useful.

In this post we explain what a tech radar is, why having one or more tech radar is useful, and how we work with tech radars at Einride.

Want to know more? Read on!

A glimpse of the tech radar review workshop board.

What is a tech radar?

Tech radar is a concept originating from ThoughtWorks, originally for visualizing emerging and declining technologies in the software industry as a whole.

It has since been adopted by product and technology companies as a way to visualize technologies that are emerging and declining locally within their organization.

Examples of other organizations that have adopted tech radars are Spotify, Zalando and the CNCF.

Blips, quadrants, rings and moves

A tech radar visualizes technologies as blips on a radar. The blips are categorized into quadrants based on the type of technology, and into rings based on the phase of adoption of the technology.

Anatomy of a tech radar.

Blips

A blip represents a single, specific technology. It can be a specific programming language, such as Go and Rust, a specific SDK, or an entire framework, such as Ruby on Rails.

A blip can also be a technique, a process, or a way of working with technology, such as CI/CD or SRE.

Quadrants

A quadrant is a pie-slice of the radar that collects blips of the same kind. Most tech radars have a quadrant for programming languages, and many tech radars have a quadrant for processes and ways of working.

Rings

A ring on the radar collects blips from all quadrants in the same phase of adoption. Most tech radars have the innermost ring contain technologies that should be strongly considered for adoption, while the outermost ring contains technologies that should not be adopted.

Moves

The shape of a blip signifies the direction the blip moved in the latest review.

A round blip means that the blip is either new on the radar, or has remained in the same place since the latest review.

A downward-facing triangle blip means that the blip moved down in the latest review - i.e. it was moved from an inner ring to an outer ring.

Conversely, an upward-facing triangle blip means that the blip moved up in the latest review - i.e. it was moved from an outer ring to an inner ring.
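Putting the pieces together, a blip can be modeled with a handful of fields. The ring names below (Adopt/Trial/Assess/Hold) are the classic ThoughtWorks names, used here as an assumption - the post doesn't prescribe specific ring names, and the quadrant labels are likewise illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class Ring(Enum):
    # Phases of adoption, innermost ring first.
    ADOPT = 1
    TRIAL = 2
    ASSESS = 3
    HOLD = 4

class Move(Enum):
    # Shape of the blip, signifying its direction in the latest review.
    NONE = "round"          # new on the radar, or unchanged
    UP = "triangle-up"      # moved to an inner ring
    DOWN = "triangle-down"  # moved to an outer ring

@dataclass
class Blip:
    name: str       # a single, specific technology
    quadrant: str   # e.g. "Languages" or "Processes & Ways of Working"
    ring: Ring
    move: Move

go = Blip(name="Go", quadrant="Languages", ring=Ring.ADOPT, move=Move.NONE)
```

A full radar is then just a collection of such blips, grouped by quadrant and ring for rendering.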

Why use tech radar?

Tech radar serves as a visual tool for organizing technology choices and the reasoning behind them. When used effectively, it provides a combination of quick overview and in-depth documentation of context and history.

Alignment, onboarding and transparency

Some benefits of using tech radar we've observed at Einride are: improved alignment, faster onboarding times, and increased transparency.

Alignment

In agile organizations, reorgs are bound to happen. This applies doubly to rapidly growing organizations where few teams frequently turn into many. This also means that system ownership is bound to change over time.

Aligning on core technologies and ways of working across the organization ensures that systems can change ownership between teams, without requiring teams to learn new technologies.

One of the most obvious things to align are programming languages, but in many cases there are also benefits to aligning databases, SDKs and surrounding infrastructure.

Alignment also improves knowledge sharing and solution sharing across an organization. When team A encounters a new problem and finds a solution, the probability of team B being able to re-use elements of the solution increases if they use the same underlying languages, tools and techniques.

Onboarding

New team members can speed up their onboarding by using the tech radars to quickly get an overview of the key technologies related to their discipline, and why they were adopted.

This also applies to current team members who want to improve their knowledge outside of their core discipline, such as frontend developers using the backend tech radar to master full-stack development, and backend developers using the data & ML tech radar to learn data and ML ops.

Transparency

All technologies were adopted in a context, and context changes over time. Eventually, all technology choices need to be re-evaluated and challenged, to make sure they are still effective in the current context.

By writing a short article about every blip, similar to Lightweight Architecture Decision Records, tech radar provides transparency on the context and reasoning of why a technology was adopted, and why blips have moved up and down the radar.

Transparency lowers the bar for identifying when the context of a blip has changed, and better solutions that are more fit for purpose need to be considered.

How does Einride use tech radars?

Einride's mission is to design and develop intelligent technologies for movement, to reduce the emissions from road transport and offer shipper customers access to the cleanest, safest and most efficient way to ship.

Einride achieves these goals by developing technologies in several separate product areas; the three main ones being the Saga platform and the Electric Freight and Autonomous Freight products it enables.

To align and knowledge share between these separate areas, Einride uses the organizational concept of "engineering guilds". Tech radars have gradually become the primary tool for recording and visualizing technology decisions made in the guilds.

Engineering guilds at Einride

What is an engineering guild?

The common definition of an "engineering guild" in a technology organization is an organization-wide forum where developers from the same discipline share knowledge and best practices, and where technology choices and ways of working are discussed and aligned.

Frontend Guild

The first engineering guild founded at Einride was the Frontend Guild. It originally started as a Slack channel, where some of the early topics included GraphQL vs. REST for API technology, and aligning on which date and time library to prefer across the whole organization.

Backend Guild

The first guild to start having formal guild meetings was the Backend Guild. This coincided with Einride's global expansion, and some of the early topics covered were multi-region cloud architecture, and aligning on Terraform as the preferred technology for infrastructure-as-code.

Data Guild

The Data Guild soon followed, where one of the big initial topics was data mesh architecture and how best to implement it. This was followed by alignment on which programming languages and SDKs to prefer for building data pipelines.

Autonomous Platform Guild

The Autonomous Platform Guild at Einride was founded in an effort to create a forum where the aspects of software engineering that are specific to autonomous vehicles could be the main focus.

This guild started later than the other guilds, and the initial focus of the meetings was to map out all the already existing best practices and ongoing technology adoption initiatives going on across the teams developing software for the Pod.

Pairing radars with engineering guilds

Each major engineering guild at Einride maintains their own tech radar. This is different from how most organizations work with tech radar, since it means that Einride has several tech radars, as opposed to a single one.

The motivation for structuring tech radars in this way is simple; it's practical! Making a tech radar is a big undertaking, and maintaining one over time even more so. But when combining tech radar with guilds, building and keeping the radar up to date can be part of the guild's way of working with recording, formalizing, and visualizing technology decisions.

Review workshops

Most guilds keep their radars up to date as part of their ongoing work, adding blips to the radar, or updating existing ones as conclusions and decisions are made in guild meetings.

However, at least a few times per year, we've found it useful to review the radar against the organization's current business strategy, as a way to identify technology gaps, as well as existing technologies where the underlying context and reasons for choosing them may have changed.

Overview of the tech radar review workshop board.

Review current business goals and challenges

To ensure that structure follows strategy, always start by looking at the business strategy and the goals the organization is trying to achieve.

This shouldn't take too much time from the workshop, but it's important to have these goals top of mind and use them to guide discussions on what to put on the tech radar and why.

Pitch new entries on the radar

The first part of the workshop focuses on identifying gaps and proposing new technologies to put on the radar. Each proposed blip is accompanied by a brief pitch of what gap the technology fills and why it's an appropriate choice to fill that gap.

Strong pitches for new technologies to put on the radar point to successful experiments from hack-days, or existing practical examples of how the technology can be effectively used to solve key use cases within the organization.

Depending on the level of attendance and engagement in the guild, there will likely be more potential new blips than there is time to discuss; in that case, we've found that some variant of the Lean Coffee format tends to work well. Einride uses Miro to conduct remote-first workshops - we've found it has all the tooling we need to effectively organize Lean Coffee voting sessions.

Move existing entries on the radar

The second part of the workshop focuses on reviewing existing radar blips, to identify ones that should be moved up or down. This part starts with a voting session, where participants vote on which existing entries to discuss moving.

Strong pitches for moving existing technologies up on the radar point to how the technology has increased in adoption since the last review, and reaffirm that it remains relevant to the current business strategy and goals.

Once again, a lean coffee format is used to discuss the blips in order of most votes.

Governance and tiebreaking

Some topics may be divisive, and there are always many ways of achieving the same goal. Each tech radar has associated roles of responsibility, which act as tiebreakers and ensure that the content of the radar is consistent and business-aligned.

Summary

This post has explained the concept of tech radars and why they are a useful tool for a technology organization, with specific examples of how Einride has integrated tech radar together with the concept of guilds.

If your organization has not yet adopted tech radar or guilds, we hope that this post, and our own tech radar, can serve as inspiration on how these ideas can be applied in your organization!

note

If you want to join our team and work on technologies for sustainable freight, check out our careers page. We are a global, hybrid workplace organization with engineering offices in Austin, Stockholm and Gothenburg as of May 2022.

Introducing: einride.engineering

· 2 min read

This is einride.engineering - a technology-focused, public site with in-depth content written by engineers at Einride.

This site serves as a porthole into Einride's engineering organization, culture, values and ways of working.

Content

This site is home to Einride's engineering blog, and public Tech Radar.

Blog

The blog contains stories from Einride engineers. It differs from the insights feed on the main einride.tech site in that it has a much narrower engineering audience, so that it can cover engineering-focused topics more in-depth.

For example, blog posts about open source software can assume that the reader is familiar with the basics of software engineering, and blog posts about ways of working can assume that the reader is already familiar with agile methods.

Tech Radar

The Tech Radar contains the latest edition of Einride's public Tech Radar. It documents Einride's current technology choices within several different engineering disciplines, as well as its overall technology strategy.

Every blip on the radar has an accompanying article, documenting at a high level why the entry is on the radar and in which context it's meant to be used. Since these articles are public, they naturally avoid referencing internal details about Einride's infrastructure.

Summary

This post marks the beginning of this site. Stay tuned for stories from engineers across Einride's engineering organization, and check in from time to time to keep an eye on Einride's public Tech Radar!