From refactor to data race in Swift
While refactoring sequential code to run concurrently, I ran into a classic problem: guaranteeing record uniqueness when multiple tasks try to create the same event at the same time. The goal was to improve performance by processing thousands of events in parallel in Swift, but the change exposed a typical concurrency challenge: avoiding duplicates and preserving data integrity under simultaneous access.
Actor with array
private actor EventsActor {
    private var events: [EventModel] = []

    func getOrCreate(
        name: String,
        cityId: UUID,
        build: () async throws -> EventModel
    ) async throws -> EventModel {
        // Linear scan for an event with the same name and city.
        if let event = events.first(
            where: {
                $0.normalizedName == name
                    && $0.$city.id == cityId
            }
        ) { return event }
        let event = try await build()
        events.append(event)
        return event
    }
}
Advantages:
- Simplicity.
- No low-level data races: the actor serializes access to the array.
Disadvantage:
- Linear-time lookup, O(n), which becomes slow for large volumes.
Actor with dictionary
private actor EventsActor {
    private var events: [String: EventModel] = [:]

    func getOrCreate(
        name: String,
        cityId: UUID,
        build: () async throws -> EventModel
    ) async throws -> EventModel {
        let key = "\(name)\(cityId.uuidString)"
        if let event = events[key] {
            return event
        }
        let event = try await build()
        events[key] = event
        return event
    }
}
Advantages:
- Fast lookup and insertion, O(1) on average.
- Ideal for large data volumes.
Bug:
- Logical data race condition
Logical data race condition
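When multiple concurrent tasks attempt to create the same event, they can all check that it doesn't exist and proceed to create it simultaneously. The actor does not prevent this: `try await build()` is a suspension point, and while one call is suspended there the actor is free to run other calls to `getOrCreate`, which still find no entry for the key. The array-based version has the same check, await, store shape, so it is exposed to the same problem.
Only one of the resulting instances gets stored, while the rest become "orphaned," causing inconsistencies and broken references in other data structures.
For example: two concurrent tasks check whether an event exists. Both see that it doesn't and both create it, but only the last one written survives in the dictionary; the other instance is still held by its caller and is never stored.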
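The scenario above can be reproduced with a minimal sketch. Everything named here (`DemoEvent`, `BuildCounter`, `NaiveCache`, `runDemo`, the key string and the 50 ms delay) is hypothetical and only stands in for the real model and build step; the actor keeps the same check, await, store shape as the dictionary version.

```swift
import Foundation

// Hypothetical stand-in for the article's EventModel.
final class DemoEvent: Sendable {
    let name: String
    init(name: String) { self.name = name }
}

// Counts how many times the build closure actually ran.
actor BuildCounter {
    private(set) var count = 0
    func increment() { count += 1 }
}

// Same shape as the dictionary-based actor: check, await, then store.
actor NaiveCache {
    private var events: [String: DemoEvent] = [:]

    func getOrCreate(
        key: String,
        build: @Sendable () async throws -> DemoEvent
    ) async throws -> DemoEvent {
        if let event = events[key] { return event }
        // Suspension point: other callers may run on the actor here
        // and will also find `events[key]` empty.
        let event = try await build()
        events[key] = event // Last writer wins; earlier instances are orphaned.
        return event
    }
}

// Fires `callers` concurrent requests for the same key and reports
// how many builds ran and how many distinct instances came back.
func runDemo(callers: Int) async throws -> (builds: Int, distinct: Int) {
    let cache = NaiveCache()
    let counter = BuildCounter()

    let results = try await withThrowingTaskGroup(
        of: DemoEvent.self,
        returning: [DemoEvent].self
    ) { group in
        for _ in 0..<callers {
            group.addTask {
                try await cache.getOrCreate(key: "final-madrid") {
                    await counter.increment()
                    // Simulate slow construction (for example, a database insert).
                    try await Task.sleep(nanoseconds: 50_000_000)
                    return DemoEvent(name: "final")
                }
            }
        }
        var collected: [DemoEvent] = []
        for try await event in group { collected.append(event) }
        return collected
    }

    let distinct = Set(results.map { ObjectIdentifier($0) }).count
    let builds = await counter.count
    return (builds, distinct)
}
```

Calling `try await runDemo(callers: 8)` will typically report more than one build, although the exact number depends on scheduling. Every build produces its own instance, so the number of distinct instances returned always equals the number of builds.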
Solution
To prevent this, it's necessary to serialize not only access but also creation by key.
If there's already a creation in progress for that key, concurrent tasks must wait for the result of the first one, ensuring that all of them share exactly the same resource.
private actor EventsActor {
    private var events: [String: EventModel] = [:]
    // In-flight builds, keyed like `events`, so concurrent callers can await the same task.
    private var builds: [String: Task<EventModel, Error>] = [:]

    func getOrCreate(
        name: String,
        cityId: UUID,
        build: @Sendable @escaping () async throws -> EventModel
    ) async throws -> EventModel {
        let key = "\(name)\(cityId.uuidString)"
        if let event = events[key] {
            return event
        }
        // A build for this key is already running: wait for its result.
        if let building = builds[key] {
            return try await building.value
        }
        let buildTask = Task {
            try await build()
        }
        builds[key] = buildTask
        // Always clear the in-flight entry, so a failed build can be retried.
        defer { builds.removeValue(forKey: key) }
        let event = try await buildTask.value
        events[key] = event
        return event
    }
}
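To check that in-flight builds really are shared, the same kind of concurrent call can be run against a cache built on this idea. This is a minimal sketch under stated assumptions: `DemoEvent`, `BuildCounter`, `CoalescingCache`, `runCheck`, the key string and the 50 ms delay are all hypothetical, and the cache is a simplified version keyed by a plain string rather than the actor above.

```swift
import Foundation

// Hypothetical stand-in for the article's EventModel.
final class DemoEvent: Sendable {
    let name: String
    init(name: String) { self.name = name }
}

// Counts how many times the build closure actually ran.
actor BuildCounter {
    private(set) var count = 0
    func increment() { count += 1 }
}

// Caches finished values and in-flight build tasks per key.
actor CoalescingCache {
    private var events: [String: DemoEvent] = [:]
    private var builds: [String: Task<DemoEvent, Error>] = [:]

    func getOrCreate(
        key: String,
        build: @Sendable @escaping () async throws -> DemoEvent
    ) async throws -> DemoEvent {
        if let event = events[key] { return event }
        // Join a build that is already running for this key.
        if let building = builds[key] { return try await building.value }

        let task = Task { try await build() }
        builds[key] = task
        // Clear the in-flight entry whether the build succeeds or throws.
        defer { builds[key] = nil }

        let event = try await task.value
        events[key] = event
        return event
    }
}

// Fires `callers` concurrent requests for the same key and reports
// how many builds ran and how many distinct instances came back.
func runCheck(callers: Int) async throws -> (builds: Int, distinct: Int) {
    let cache = CoalescingCache()
    let counter = BuildCounter()

    let results = try await withThrowingTaskGroup(
        of: DemoEvent.self,
        returning: [DemoEvent].self
    ) { group in
        for _ in 0..<callers {
            group.addTask {
                try await cache.getOrCreate(key: "final-madrid") {
                    await counter.increment()
                    // Simulate slow construction (for example, a database insert).
                    try await Task.sleep(nanoseconds: 50_000_000)
                    return DemoEvent(name: "final")
                }
            }
        }
        var collected: [DemoEvent] = []
        for try await event in group { collected.append(event) }
        return collected
    }

    let distinct = Set(results.map { ObjectIdentifier($0) }).count
    let builds = await counter.count
    return (builds, distinct)
}
```

With `try await runCheck(callers: 100)`, the build closure runs once and every caller receives the same instance: the first caller registers its task before suspending, so any later caller finds either the in-flight task or the stored event.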
Lessons learned
- An actor alone doesn't prevent logical data race conditions of the "check-then-act" type.
- In concurrent scenarios, serializing resource construction by key is fundamental to maintaining data integrity.
- It's essential to test under load and concurrent scenarios, not just in sequential mode.
Keep coding, keep running 🏃