Prometheus Storage (v1.4.1)

Recording Metrics

This is a typical set of series identifiers that are part of metric counting requests:

```
requests_total{path="/status", method="GET", instance="10.0.0.1:80"}
requests_total{path="/status", method="POST", instance="10.0.0.3:80"}
requests_total{path="/", method="GET", instance="10.0.0.2:80"}
```

Let's simplify this representation right away: a metric name can be treated as just another label dimension — __name__ in our case. At the query level, it might be treated specially, but that doesn't concern our way of storing it, as we will see later.

```
{__name__="requests_total", path="/status", method="GET", instance="10.0.0.1:80"}
{__name__="requests_total", path="/status", method="POST", instance="10.0.0.3:80"}
{__name__="requests_total", path="/", method="GET", instance="10.0.0.2:80"}
```

The data structure for recording a sample is as follows:

```go
// Sample is a sample pair associated with a metric.
type Sample struct {
	Metric    Metric      `json:"metric"`
	Value     SampleValue `json:"value"`
	Timestamp Time        `json:"timestamp"`
}

// A Metric is similar to a LabelSet, but the key difference is that a Metric is
// a singleton and refers to one and only one stream of samples.
type Metric LabelSet
```

> Each metric name plus a unique set of labels is its own time series that has a value stream associated with it.
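These types live in the github.com/prometheus/common/model package. As a minimal, self-contained illustration (not storage internals), here is how a sample for one of the series above could be constructed, with the metric name stored under model.MetricNameLabel ("__name__") and the label-set fingerprint serving as the series identity:

```go
package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

func main() {
	// The metric name is just another label, stored under
	// model.MetricNameLabel ("__name__").
	m := model.Metric{
		model.MetricNameLabel: "requests_total",
		"path":                "/status",
		"method":              "GET",
		"instance":            "10.0.0.1:80",
	}

	s := model.Sample{
		Metric:    m,
		Value:     model.SampleValue(1),
		Timestamp: model.Now(),
	}

	// Fingerprint hashes the full label set; samples carrying the same
	// labels always map to the same series.
	fmt.Println(m.Fingerprint(), s.Value, s.Timestamp)
}
```

Two samples whose metrics hash to the same fingerprint belong to the same series — exactly the "metric name plus unique label set" identity from the quote above.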

The Two-Dimensional Model

The horizontal axis is time and the vertical axis is the series. When Prometheus collects data, the only batches it can write are vertical ones across different series (samples for all series are collected and written at the same instant). This gives rise to the following two problems (the sketch after the list shows the usual fix):

- We know that we want to write in batches, but the only batches we get are vertical sets of data points across series.

- When querying data points for a series over a time window, not only would it be hard to figure out where the individual points can be found, we』d also have to read from a lot of random places on disk.
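The usual fix — and in spirit what the v1.x storage does with the per-series chunks shown below — is to buffer incoming samples per series in memory and flush only whole chunks sequentially. Here is a standalone sketch of that batching idea; all names and the chunk size are invented for illustration:

```go
package main

import "fmt"

// sample is one (timestamp, value) pair. All types here are
// illustrative, not Prometheus internals.
type sample struct {
	t int64
	v float64
}

const chunkSize = 1024 // samples per chunk, arbitrary for this sketch

// buffered turns vertical write batches into horizontal, per-series
// chunks that can be flushed to disk sequentially.
type buffered struct {
	chunks map[string][]sample // series identifier -> open chunk
}

func (b *buffered) add(series string, s sample) {
	b.chunks[series] = append(b.chunks[series], s)
	if len(b.chunks[series]) == chunkSize {
		// A full chunk is written in one sequential write, instead
		// of one random write per sample.
		fmt.Printf("flush %s: %d samples\n", series, chunkSize)
		b.chunks[series] = b.chunks[series][:0]
	}
}

func main() {
	b := &buffered{chunks: make(map[string][]sample)}
	// Each scrape produces a vertical batch: one sample per series,
	// all at the same timestamp.
	for i := int64(0); i < 3000; i++ {
		b.add(`requests_total{path="/status"}`, sample{t: i, v: float64(i)})
		b.add(`requests_total{path="/"}`, sample{t: i, v: float64(i * 2)})
	}
}
```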

Data is stored in chunks of 1024 bytes each. Prometheus applies a compression algorithm, so the per-sample size in the best case is ≈ 0.066 bits/sample.
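The compression works so well because scrape timestamps are almost perfectly regular, so a delta-of-delta encoding reduces most timestamp entries to zero, which can be stored in very few bits. The sketch below shows only that core idea; Prometheus's actual chunk encodings are considerably more involved:

```go
package main

import "fmt"

// dod computes the delta-of-delta stream that timestamp compression
// builds on: scrape intervals are nearly constant, so most entries
// come out as zero.
func dod(ts []int64) []int64 {
	out := make([]int64, 0, len(ts))
	var prev, prevDelta int64
	for i, t := range ts {
		switch i {
		case 0:
			out = append(out, t) // first timestamp stored raw
		case 1:
			prevDelta = t - prev
			out = append(out, prevDelta) // first delta stored raw
		default:
			delta := t - prev
			out = append(out, delta-prevDelta) // usually 0
			prevDelta = delta
		}
		prev = t
	}
	return out
}

func main() {
	// Samples scraped every 15s: everything after the first two
	// entries is zero, which compresses to almost nothing.
	fmt.Println(dod([]int64{1000, 1015, 1030, 1045, 1060}))
	// Output: [1000 15 0 0 0]
}
```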

The in-memory time series for a metric:

```go
type memorySeries struct {
	metric model.Metric

	// Sorted by start time, overlapping chunk ranges are forbidden.
	chunkDescs []*chunk.Desc

	// The index (within chunkDescs above) of the first chunk.Desc that
	// points to a non-persisted chunk. If all chunks are persisted, then
	// persistWatermark == len(chunkDescs).
	persistWatermark int

	// The modification time of the series file. The zero value of time.Time
	// is used to mark an unknown modification time.
	modTime time.Time

	// The chunkDescs in memory might not have all the chunkDescs for the
	// chunks that are persisted to disk. The missing chunkDescs are all
	// contiguous and at the tail end. chunkDescsOffset is the index of the
	// chunk on disk that corresponds to the first chunk.Desc in memory. If
	// it is 0, the chunkDescs are all loaded. A value of -1 denotes a
	// special case: There are chunks on disk, but the offset to the
	// chunkDescs in memory is unknown. Also, in this special case, there is
	// no overlap between chunks on disk and chunks in memory (implying that
	// upon first persisting of a chunk in memory, the offset has to be
	// set).
	chunkDescsOffset int

	// The savedFirstTime field is used as a fallback when the
	// chunkDescsOffset is not 0. It can be used to save the FirstTime of the
	// first chunk before its chunk desc is evicted. In doubt, this field is
	// just set to the oldest possible timestamp.
	savedFirstTime model.Time

	// The timestamp of the last sample in this series. Needed for fast
	// access for federation and to ensure timestamp monotonicity during
	// ingestion.
	lastTime model.Time

	// The last ingested sample value. Needed for fast access for
	// federation.
	lastSampleValue model.SampleValue

	// Whether lastSampleValue has been set already.
	lastSampleValueSet bool

	// Whether the current head chunk has already been finished. If true,
	// the current head chunk must not be modified anymore.
	headChunkClosed bool

	// Whether the current head chunk is used by an iterator. In that case,
	// a non-closed head chunk has to be cloned before more samples are
	// appended.
	headChunkUsedByIterator bool

	// Whether the series is inconsistent with the last checkpoint in a way
	// that would require a disk seek during crash recovery.
	dirty bool
}
```

Writing sample data:

```go
func (s *memorySeries) add(v model.SamplePair) (int, error) {
	if len(s.chunkDescs) == 0 || s.headChunkClosed {
		newHead := chunk.NewDesc(chunk.New(), v.Timestamp)
		s.chunkDescs = append(s.chunkDescs, newHead)
		s.headChunkClosed = false
	} else if s.headChunkUsedByIterator && s.head().RefCount() > 1 {
		// We only need to clone the head chunk if the current head
		// chunk was used in an iterator at all and if the refCount is
		// still greater than the 1 we always have because the head
		// chunk is not yet persisted. The latter is just an
		// approximation. We will still clone unnecessarily if an older
		// iterator using a previous version of the head chunk is still
		// around and keep the head chunk pinned. We needed to track
		// pins by version of the head chunk, which is probably not
		// worth the effort.
		chunk.Ops.WithLabelValues(chunk.Clone).Inc()
		// No locking needed here because a non-persisted head chunk can
		// not get evicted concurrently.
		s.head().C = s.head().C.Clone()
		s.headChunkUsedByIterator = false
	}

	chunks, err := s.head().Add(v)
	if err != nil {
		return 0, err
	}
	s.head().C = chunks[0]

	for _, c := range chunks[1:] {
		s.chunkDescs = append(s.chunkDescs, chunk.NewDesc(c, c.FirstTime()))
	}

	// Populate lastTime of now-closed chunks.
	for _, cd := range s.chunkDescs[len(s.chunkDescs)-len(chunks) : len(s.chunkDescs)-1] {
		cd.MaybePopulateLastTime()
	}

	s.lastTime = v.Timestamp
	s.lastSampleValue = v.Value
	s.lastSampleValueSet = true
	return len(chunks) - 1, nil
}
```
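Note the return value: head().Add may split a full head chunk, returning the still-open head as chunks[0] plus any newly closed chunks, so len(chunks) - 1 is the number of chunks this append completed and made eligible for persistence.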

The Current Solution

> Time to take a look at how Prometheus's current storage, let's call it "V2", addresses this problem.

We create one file per time series that contains all of its samples in sequential order.
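With one file per series, the file path has to be derivable from the series identity. Below is a sketch of how such a path could be built from the label-set fingerprint; the sharding scheme shown is an assumption for illustration, not necessarily the exact v1.x on-disk layout:

```go
package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

// seriesFileName sketches how a per-series file path could be derived
// from the series fingerprint.
func seriesFileName(base string, fp model.Fingerprint) string {
	name := fp.String() // 16 hex characters
	// Shard into subdirectories by the first byte so no single
	// directory has to hold millions of files.
	return fmt.Sprintf("%s/%s/%s.db", base, name[:2], name[2:])
}

func main() {
	m := model.Metric{model.MetricNameLabel: "requests_total"}
	fmt.Println(seriesFileName("/var/lib/prometheus", m.Fingerprint()))
}
```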

About the Index

LevelDB is only used for indexes to look up series files by sets of dimensions, not for time-based lookups. We just need to find the right time series that are relevant for your query. As a simple example, we have one LevelDB index which has single label=value pairs as the keys and, as the LevelDB value, the identifiers of the time series which have those label=value dimensions.

If you now query for e.g. all time series with labels foo="biz" AND bar="baz", we will do two lookups in that index: one for the key foo="biz", and one for bar="baz".

We will now have two sets of time series identifiers which we intersect (AND-style matching) to arrive at the set of time series you're interested in querying.

Only then do we actually start loading any actual time series data (not from LevelDB this time).
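Here is a minimal sketch of that AND-style matching over two posting lists of series identifiers; the sorted-slice representation and all names are illustrative, not the LevelDB-backed implementation:

```go
package main

import "fmt"

// intersect returns the series IDs present in both sorted posting
// lists — the AND-style matching described above.
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	fooBiz := []uint64{1, 3, 5, 9}  // series with foo="biz"
	barBaz := []uint64{3, 4, 9, 12} // series with bar="baz"
	fmt.Println(intersect(fooBiz, barBaz)) // [3 9]
}
```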

## References

- fabxc.org/tsdb/

- youtube.com/watch?

- youtube.com/watch?

- slideshare.net/FabianRe

- I don't understand why Prometheus uses leveldb to begin with though, why do you ...

- docs.google.com/present

