Jonathan Boccara's blog

How to (std::)find something efficiently with the STL

Published January 16, 2017 - 5 Comments

Daily C++

This series of posts aims at covering all there is to know in the STL (and even sligthly beyond) about searching.

Even though the need for searching for something in a collection is quite a simple concept to comprehend, there are many things to say to cover the topic thoroughly. Even if we’ll remain focused on how to practically accomplish what you need in C++, and won’t dive into pure algorithmics too much.

For this reason, we’ll break this topic into 3 posts:

  • How to (std::)find something efficiently with the STL: covering classical STL algorithms for performing searches on ranges of elements,
  • Searching in an STL container: how to perform efficient and correct searches when you directly have access to an STL container, as opposed to a simple range,

This post shows how to search in a range. We’ll stick with the standard version of the STL and consider a range as represented by 2 iterators. However, all the following also applies to range libraries.

As we will see in more details in a dedicated post (scheduled Feb 07), the STL can be viewed as split into 2 parts: the part that operates on SORTED elements, and the one that operates on elements that are NOT SORTED.

This difference has 2 consequences for searching:

  • A look up in a SORTED collection is very fast, typically in logarithmic time, while a look up in a NOT SORTED collection is typically in linear time.
  • All methods shown on SORTED ranges compare values according to equivalence (comparing with <), and those on NOT SORTED ranges compare values according to equality (comparing with ==).

This post will show how to express the 3 following questions in C++, for a given value searched a range:

  • Is it there?
  • Where is it?
  • Where should it be (for a sorted range)?

Is it there?

On elements that are NOT SORTED

This question can be expressed with std::find, combined with a comparison with the end of the range:

vector<int> v = ... // v filled with values
if (std::find(v.begin(), v.end(), 42) != v.end())
{
    ...

Note that the question “Is it there?” can also be expressed by std::count:

vector<int> v = ... // v filled with values
if (std::count(v.begin(), v.end(), 42))
{
    ...

The returned value is implicitly converted to a bool in the if statement: here it evaluates to true if there is at least one element equal to 42 in the range.

The std::count method has advantages and drawbacks compared to std::find:

Advantages of std::count:

  • std::count avoids the comparison with the end operator.

Drawbacks of std::count:

  • std::count traverses the whole collection, while std::find stops at the first element equal to the searched value,
  • std::find arguably better expresses that you are looking for something.

For these reasons, std::find is more generally used for this need.

Note
To check for the presence of an element satisfying a predicate instead of being equal to a value, use std::count_if, std::find_if and std::find_if_not, that should be self-explanatory. This holds for all the other usages of std::count and std::find throughout this post.

On SORTED elements

The algorithm to use is std::binary_search, that directly returns a bool representing whether the searched value has equivalent elements in the collection.

std::set<int> numbers = // sorted elements
bool is42InThere = std::binary_search(numbers.begin(), numbers.end(), 42);

Where is it?

More precisely, we want to obtain iterators pointing to the occurences of the searched elements.

On elements that are NOT SORTED

Use std::find. It will return the iterator pointing to the first element equal to the searched value, or the end of the collection if the value has not been found.

std::vector<int> numbers = ...
auto searchResult = std::find(numbers.begin(), numbers.end(), 42);

if (searchResult != numbers.end())
{
    ...

On SORTED elements

Note on std::find for SORTED elements:
The STL has no algorithm as straightforward as std::find for sorted collections. But std::find is not really made for sorted collections because it uses equality and not equivalence, and it operates in linear time and not logarithmic time.
Now for a given collection, if you are sure that for the type of your elements equality is the same as equivalence, now and in the future, and that you are prepared to pay the linear time, std::find will get you the correct result, and you’ll benefit from its straightforward interface. But in the general case, keep in mind that it is not designed for operating on a sorted range.

The algorithm to use here is rather std::equal_range (you thought it was std::lower_bound? Read on to the next section to see why it isn’t). Here is its prototype:

template< class ForwardIt, class T >
std::pair<ForwardIt,ForwardIt> equal_range( ForwardIt first, ForwardIt last, const T& value );

std::equal_range returns the range of elements equivalent to the searched value. The range represented by an std::pair of iterators pointing inside the collection. The 2 iterators of the pair represent the first and the past-the-end elements of the subrange of elements in the range that are equivalent to the searched value.

However its interface is somewhat clumsy to use:

std::vector<int> v = {3, 7, 3, 11, 3, 3, 2};
sort(v.begin(), v.end());

// equal_range, attempt 1: natively clumsy
std::pair<std::vector<int>::iterator, std::vector<int>::iterator> range1 = equal_range(v.begin(), v.end(), 3);
std::for_each(range1.first, range1.second, doSomething);

A typedef or using is typically used to make it lighter:

std::vector<int> v = {3, 7, 3, 11, 3, 3, 2};
sort(v.begin(), v.end());

using IteratorPair = std::pair<std::vector<int>::iterator, std::vector<int>::iterator>;

// equal_range, attempt 2: with the classical typedef
IteratorPair range2 = equal_range(v.begin(), v.end(), 3);
std::for_each(range2.first, range2.second, doSomething);

Attempt 2 is indeed less of a mouthful, but there is still a fundamental problem left: levels of abstractions are not respected, which is contrary to the this important principle seen in a dedicated post. Indeed the pair forces us to write code in term of “first” and “second” when manipulating something returned by equal_range, whereas it should be a range. And a range should be expressed in terms of “begin” and “end”. On top on making code less natural, this becomes a real problem when you want to use this range in generic code.

To fix this, we can use a class to wrap the pair of iterators returned by std::equal_range into an object that has the semantics of a range:

template<typename Container>
class Range
{
public:
    Range(std::pair<typename Container::iterator, typename Container::iterator> range)
    : m_begin(range.first), m_end(range.second)
    {}
    typename Container::iterator begin() { return m_begin; }
    typename Container::iterator end() { return m_end; }
 
private:
    typename Container::iterator m_begin;
    typename Container::iterator m_end;
};

This sort of class exists in ranges libraries such as Boost.Ranges or range-v3. If you go see their implementation code (here for boost and here for range-v3) you’ll see they contain way more than the simple wrapper above, that is here just to get the point across rather than be used in production code).

This effectively lifts a pair of iterators to the level of abstraction of a range.

Note that without the wrapper, std::begin and std::end cannot be used on the result of std::equal_range, even if it is … a range! The wrapper fixes this problem.

It can be used the following way:

std::vector<int> v = {3, 7, 3, 11, 3, 3, 2};
sort(v.begin(), v.end());
 
// equal_range, attempt 3: natural al last
Range<std::vector<int>> range3 = equal_range(v.begin(), v.end(), 3);
std::for_each(range3.begin(), range3.end(), doSomething);

Whichever of the above methods you use, equal_range returns a range, so you can check its emptiness by comparing if the 2 iterators, and check its size with std::distance:

bool noElementFound = range3.begin() == range3.end();
size_t numberOfElementFound = std::distance(range3.begin(), range3.end())

Where should it be?

This question only makes sense for a sorted range, because for a not sorted range the element could be … anywhere in the range.

For a sorted range, the question is more precisely: “If it is there then where is it, and if it is not then where should it be ?”

The question can be expressed with 2 algorithms: std::lower_bound and std::upper_bound.

It is easy to understand them once you understand std::equal_range: std::lower_bound and std::upper_bound return respectively the first and the second iterator that would have been returned by std::equal_range.

So to insert a value in the range so that is before the elements equivalent to this value, use std::lower_bound to get an iterator designating the location to insert to.
And to insert a value in the range so that is after the elements equivalent to this value, use std::upper_bound to get an iterator designating the location to insert to.

Note that you generally don’t want to use std::lower_boud to simply search for an element:

Contrary to std::find, you can’t simply check if the iterator returned by std::lower_bound is different from the end to know whether the element is in the collection. Indeed, if the element is not present, std::lower_bound returns the location where it should have been, not the end of the collection.
So you need to check that the returned iterator is not the end of the range AND to check that it points to an element whose value is equivalent to the one you search.

Careful: equivalent, not equal (if you don’t know yet the difference don’t worry: we’ll see it in details in a dedicated post). But if (now or in the future) this does not mean the same thing for your type, you need to write an equivalence test, typically in the form of !(a < b) && !(b < a).
And if the sorting comparator is not operator< but a custom one, you need to use the custom one. And update your code if the comparator happens to change. Clumsy. Just use std::equal_range instead.

Conclusion

Here is a table that summarizes which algorithm to use when searching for something in a range:

 

Question to express in C++ NOT SORTED SORTED
Is it there? std::find != end std::binary_search
Where is it ? std::find std::equal_range
Where should it be ? std::lower_bound
std::upper_bound

 

In the next post in this series we will see know how to search directly in a standard container, and not on a range.

 

Related articles:

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin

Comments are closed