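Deduplication
With watermark - duplicates are removed up to the watermark (note that records with earlier dates may not be deduplicated).
Without watermark - duplicates are removed as far back as the state reaches; state for all past records is kept.
Arbitrary State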
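A minimal Scala sketch of the two deduplication variants noted in this section. The column names `guid` and `eventTime`, the 10 second delay, and the object and method names are assumptions for illustration; `events` is assumed to be a streaming DataFrame created elsewhere.

```scala
import org.apache.spark.sql.DataFrame

object DedupSketch {
  // With watermark: state older than the watermark is dropped, so a
  // duplicate that arrives later than the allowed delay may slip through.
  // Assumed columns: "guid" (unique id) and "eventTime" (event timestamp).
  def dedupWithWatermark(events: DataFrame): DataFrame =
    events
      .withWatermark("eventTime", "10 seconds")
      .dropDuplicates("guid", "eventTime")

  // Without watermark: every guid seen so far stays in state, so the
  // state grows for as long as the query runs.
  def dedupWithoutWatermark(events: DataFrame): DataFrame =
    events.dropDuplicates("guid")
}
```

Both methods only describe the transformation; nothing runs until the result is written out with `writeStream` and the query is started.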
Since Spark 2.2, arbitrary stateful processing can be done using the operation mapGroupsWithState and the more powerful operation flatMapGroupsWithState. Both operations allow you to apply user-defined code on grouped Datasets to update user-defined state. For more concrete details, take a look at the API documentation (Scala/Java) and the examples (Scala/Java).
The operations above build whatever (arbitrary) state the user wants. However, the guide says little about how that state is then used.
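To make the idea of user-defined state more concrete, here is a small Scala sketch that keeps a running total per user with mapGroupsWithState. The case classes `Event` and `RunningTotal`, the function `updateTotal`, and the choice of a `Long` as the state are all hypothetical; only `groupByKey`, `mapGroupsWithState`, `GroupState`, and `GroupStateTimeout` come from the Spark API.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Hypothetical input and output records for this sketch.
case class Event(user: String, value: Long)
case class RunningTotal(user: String, total: Long)

object RunningTotals {
  // Called per key in each trigger with the new events for that key.
  // The state (a Long here) is whatever the user decides to keep.
  def updateTotal(
      user: String,
      events: Iterator[Event],
      state: GroupState[Long]): RunningTotal = {
    val previous = state.getOption.getOrElse(0L)
    val total = previous + events.map(_.value).sum
    state.update(total) // persist the new state for the next trigger
    RunningTotal(user, total)
  }

  def runningTotals(events: Dataset[Event]): Dataset[RunningTotal] = {
    val spark = events.sparkSession
    import spark.implicits._ // encoders for String, Long and the case classes
    events
      .groupByKey(_.user)
      .mapGroupsWithState[Long, RunningTotal](GroupStateTimeout.NoTimeout)(updateTotal _)
  }
}
```

The state is used only inside the update function: it is read with `getOption`, combined with the new events, and written back with `update`. A query built on mapGroupsWithState has to be started in Update output mode.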
3. Unsupported!
Unsupported Operations
There are a few DataFrame/Dataset operations that are not supported with streaming DataFrames/Datasets. Some of them are as follows.
Multiple streaming aggregations (i.e. a chain of aggregations on a streaming DF) are not yet supported on streaming Datasets.
Limit and take first N rows are not supported on streaming Datasets.
Distinct operations on streaming Datasets are not supported.
Sorting operations are supported on streaming Datasets only after an aggregation and in Complete Output Mode.
A few types of outer joins on streaming Datasets are not supported. See the support matrix in the Join Operations section for more details.
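As an illustration of the sorting rule in the list above, the sketch below sorts only after an aggregation and starts the query in Complete output mode. The column name `word` and the object and method names are assumptions; `words` is assumed to be a streaming DataFrame.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.streaming.StreamingQuery

object SortedCountsSketch {
  // Sorting comes after the aggregation (groupBy + count), and the
  // query is started in Complete output mode.
  def start(words: DataFrame): StreamingQuery =
    words
      .groupBy("word") // assumed column name
      .count()
      .orderBy(col("count").desc)
      .writeStream
      .outputMode("complete")
      .format("console")
      .start()
}
```

Calling `orderBy` directly on `words`, without the aggregation, would be one of the unsupported cases.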
In addition, there are some Dataset methods that will not work on streaming Datasets. They are actions that will immediately run queries and return results, which does not make sense on a streaming Dataset. Rather, those functionalities can be done by explicitly starting a streaming query (see the next section regarding that).
count() - Cannot return a single count from a streaming Dataset. Instead, use ds.groupBy().count() which returns a streaming Dataset containing a running count.
foreach() - Instead use ds.writeStream.foreach(...) (see next section).
show() - Instead use the console sink (see next section).
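The replacements listed above can be sketched in Scala roughly as follows. The object and method names are made up for this example, and `ds` is assumed to be a streaming DataFrame; the ForeachWriter here simply prints each row.

```scala
import org.apache.spark.sql.{DataFrame, ForeachWriter, Row}
import org.apache.spark.sql.streaming.StreamingQuery

object StreamingAlternatives {
  // Instead of ds.count() and show(): a running count written to the
  // console sink. The aggregation allows Complete output mode.
  def runningCountToConsole(ds: DataFrame): StreamingQuery =
    ds.groupBy().count()
      .writeStream
      .outputMode("complete")
      .format("console")
      .start()

  // Instead of ds.foreach(...): process rows through writeStream.foreach.
  def printEachRow(ds: DataFrame): StreamingQuery =
    ds.writeStream
      .foreach(new ForeachWriter[Row] {
        def open(partitionId: Long, epochId: Long): Boolean = true // accept every partition
        def process(row: Row): Unit = println(row)
        def close(errorOrNull: Throwable): Unit = ()
      })
      .start()
}
```

In each case the work happens only after `start()` is called, which is why the immediate actions count(), foreach() and show() do not apply to a streaming Dataset.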