Choosing List/Set/Map in Production
Collections are performance and correctness decisions
In production, choosing a collection is not a style preference. It defines:
- time complexity under load
- memory footprint and GC pressure
- ordering guarantees
- correctness when duplicates, equality, or concurrency are involved
Start with access patterns (not with habits)
Before choosing a collection, answer:
- Do you need duplicates?
- Do you need fast contains()?
- Do you need key-based lookup?
- Do you rely on stable iteration order?
- Will this be accessed concurrently?
- What is the expected size (10, 1k, 1M)?
List: ordered, duplicates allowed
Use a List when you care about order, duplicates are allowed, and you mostly iterate or index by position.
Production tips for List
- ArrayList is the default for most cases.
- LinkedList is rarely a win in real systems (poor locality, high overhead).
- Random access is fast for ArrayList (O(1)).
- contains() is O(n) for both ArrayList and LinkedList.
Set: uniqueness by equals/hashCode
Use a Set when you need uniqueness and fast membership checks. Keep in mind that hash-based sets decide uniqueness through equals/hashCode (TreeSet uses its comparator or compareTo instead).
Production tip: equality defines uniqueness
If your elements have broken equals/hashCode or mutable equality fields, your Set will behave incorrectly: duplicates can slip in, and contains() or remove() can miss elements that are actually present.
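The failure mode above can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical Tag class (not from any library) whose equality depends on a mutable field:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class MutableKeyDemo {
    // Hypothetical element type: equality depends on a mutable field.
    static final class Tag {
        String name; // mutable on purpose, to show the failure

        Tag(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof Tag && Objects.equals(name, ((Tag) o).name);
        }

        @Override public int hashCode() { return Objects.hashCode(name); }
    }

    // Returns whether the set still finds the element after it was mutated.
    static boolean foundAfterMutation() {
        Set<Tag> tags = new HashSet<>();
        Tag tag = new Tag("alpha");
        tags.add(tag);              // stored under the hash of "alpha"
        tag.name = "beta";          // hash code changes while the element is in the set
        return tags.contains(tag);  // lookup now uses the hash of "beta"
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```

The element is still physically in the set, but lookups can no longer reach it. Making equality fields final (or using records) avoids this class of bug.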
Common Set choices
- HashSet: fast membership, no guaranteed iteration order.
- LinkedHashSet: preserves insertion order (extra memory cost).
- TreeSet: sorted order, O(log n) per add/remove/contains, requires a Comparator or Comparable elements.
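To make the ordering differences concrete, here is a small sketch that deduplicates the same input with LinkedHashSet and TreeSet. The class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

final class SetOrderDemo {
    // Deduplicates while keeping first-seen (insertion) order.
    static List<String> dedupKeepingOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // Deduplicates and sorts by natural (Comparable) order.
    static List<String> dedupSorted(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(dedupKeepingOrder(input)); // [pear, apple, fig]
        System.out.println(dedupSorted(input));       // [apple, fig, pear]
    }
}
```

A HashSet built from the same input would contain the same three elements, but its iteration order is unspecified, so it is deliberately left out of the printed results.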
Map: key-based lookup
Use a Map when you want to look up values by key. In production, Map is often the most important collection because it becomes:
- in-memory cache
- dedup index
- aggregation structure
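As an illustration of the aggregation use above, the following sketch counts occurrences per key with Map.merge. The KeyCounts class is a hypothetical helper, not part of any library:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyCounts {
    // Aggregates a stream of keys into key -> occurrence count.
    static Map<String, Integer> countByKey(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            // Inserts 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countByKey(List.of("a", "b", "a"));
        System.out.println(counts.get("a")); // 2
        System.out.println(counts.get("b")); // 1
    }
}
```

The same map doubles as a dedup index: counts.containsKey(key) answers "have we seen this key?" in O(1) on average.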
Map choices
- HashMap: default, fast average-case lookup.
- LinkedHashMap: stable iteration order (insertion order by default, access order optionally), useful for LRU-like patterns.
- TreeMap: sorted keys, O(log n) per operation.
- ConcurrentHashMap: concurrent access without locking the whole map.
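The LRU-like pattern mentioned for LinkedHashMap can be sketched with its access-order constructor and the removeEldestEntry hook. LruCache is a hypothetical name, and this version is not thread-safe; treat it as a starting point rather than a production cache:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put(); returning true evicts the least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" becomes most recently used
        cache.put("c", 3);  // evicts "b", the least recently used
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

For concurrent use, this would need external synchronization or a purpose-built caching library.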
Ordering and determinism
Many production bugs are caused by assuming order where there is none.
- HashSet and HashMap do not guarantee iteration order.
- Ordering may appear stable in dev and change in prod, because it depends on hash codes, table capacity, and insertion history, and the implementation can differ between JVM versions.
Example: do not depend on HashMap iteration order
import java.util.HashMap;
import java.util.Map;

Map<String, Integer> m = new HashMap<>();
m.put("b", 2);
m.put("a", 1);
// Iteration order is unspecified: never assume which key prints first
for (var e : m.entrySet()) {
    System.out.println(e.getKey());
}
When to use LinkedHashMap/LinkedHashSet
If order matters for output stability (e.g., generating deterministic JSON for caching or tests), use LinkedHashMap/LinkedHashSet.
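As a sketch of output stability, the following renders a map into a string in insertion order. The StableRender class and its simple key=value format are illustrative assumptions, not a JSON serializer:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class StableRender {
    // Renders entries in the map's iteration order; with a LinkedHashMap
    // that is insertion order, so the output is the same on every run.
    static String render(Map<String, Integer> fields) {
        return fields.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("b", 2);
        fields.put("a", 1);
        System.out.println(render(fields)); // {b=2,a=1}
    }
}
```

Passing a HashMap to the same method would still work, but the rendered order would then be unspecified, which is exactly what breaks cache keys and snapshot tests.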
Memory and GC costs
Hash-based collections have overhead:
- buckets/arrays
- node objects
- references
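In large in-memory workloads, every HashMap entry is a separate node object that references (possibly boxed) keys and values, so choosing HashMap over more specialized structures can noticeably increase heap usage and GC work. Always measure when sizes become large.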
Concurrency: choose explicit concurrent collections
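Do not share a plain HashMap between threads and guard it with ad-hoc synchronization; one unguarded access is enough to cause lost updates or corrupted state. Prefer standard concurrent structures:
- ConcurrentHashMap for concurrent maps
- CopyOnWriteArrayList for mostly-read lists (every write copies the whole backing array, so avoid it for write-heavy workloads)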
- BlockingQueue for producer-consumer pipelines
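For the concurrent map case, a minimal sketch of an atomic counter built on ConcurrentHashMap.merge follows. The class name and the "requests" key are hypothetical, and a parallel stream stands in for real worker threads:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

final class ConcurrentCounter {
    // Increments one counter from many threads and returns the final value.
    static int countInParallel(int increments) {
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();
        IntStream.range(0, increments)
                .parallel()
                // merge() is an atomic read-modify-write on ConcurrentHashMap,
                // so no increments are lost even under contention.
                .forEach(i -> hits.merge("requests", 1, Integer::sum));
        return hits.getOrDefault("requests", 0);
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(100_000)); // 100000
    }
}
```

The same loop with a plain HashMap and get()/put() would typically lose updates, because the read and the write are separate steps.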
Production failure scenario
A service uses ArrayList and repeatedly calls contains() on a list of 100k elements for membership checks. Each call scans the list (O(n)), so under load CPU spikes and latency increases. Fix: use HashSet for membership checks.
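The fix in this scenario can be sketched as follows: build the HashSet once, then run every membership check against it. MembershipFix and countKnown are illustrative names:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MembershipFix {
    // Counts how many candidates are present in `known`.
    static long countKnown(List<Integer> known, List<Integer> candidates) {
        Set<Integer> index = new HashSet<>(known); // one O(n) pass to build
        long hits = 0;
        for (Integer candidate : candidates) {
            // O(1) on average, instead of an O(n) scan with known.contains(...)
            if (index.contains(candidate)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Integer> known = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            known.add(i);
        }
        System.out.println(countKnown(known, List.of(5, 100_000, 42))); // 2
    }
}
```

The set must be built once and reused; rebuilding it per request would reintroduce the O(n) cost on every call.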
Practical decision table
- Need duplicates + order: ArrayList
- Need uniqueness + fast contains: HashSet
- Need deterministic iteration order: LinkedHashSet/LinkedHashMap
- Need sorted order: TreeSet/TreeMap
- Need key-based lookup: HashMap
- Need concurrent key-based lookup: ConcurrentHashMap
Checklist
- Pick based on access patterns and size, not habit.
- Do not rely on HashMap/HashSet order.
- Understand equals/hashCode semantics for Set/Map keys.
- Use LinkedHash* when deterministic iteration matters.
- Use concurrent collections for multi-threaded access.
- Measure memory and CPU when collections become large.
Final principle
In production, the wrong collection choice is a hidden performance bug waiting to surface. Choose explicitly.