Beryl Chen
03/28/2024, 2:23 PM
GitHub
03/29/2024, 11:25 PM
• When null values in JSON data are evaluated with the IS NULL operator, they are treated as NULL values, following SQL semantics. For example, SELECT parse_json('{"a": null}') -> 'a' IS NULL returns true (before this behavior change, it returned false). #42815
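A minimal Python model of how the -> operator and IS NULL behave before and after this change. This is not StarRocks code. The JSON_NULL sentinel and the function names are hypothetical, and only keys that exist in the JSON object are modeled.

```python
import json

# Hypothetical sentinel for a JSON 'null' that is still a non-NULL SQL value.
JSON_NULL = object()


def arrow(doc: str, key: str, json_null_is_sql_null: bool):
    """Model of `doc -> key` for a key that exists in the JSON object."""
    value = json.loads(doc)[key]  # missing keys raise KeyError; not modeled here
    if value is None:
        # New behavior: SQL NULL (None). Old behavior: a JSON null value.
        return None if json_null_is_sql_null else JSON_NULL
    return value


def is_null(value) -> bool:
    """SQL IS NULL: true only for SQL NULL, which this model represents as None."""
    return value is None


print(is_null(arrow('{"a": null}', "a", json_null_is_sql_null=True)))   # True (after the change)
print(is_null(arrow('{"a": null}', "a", json_null_is_sql_null=False)))  # False (before the change)
```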
Improvements
• When Broker Load is used to load data from ORC files that contain TIMESTAMP-type data, StarRocks supports retaining microseconds in the timestamps when converting the timestamps to match its own DATETIME data type. #42348
Bug Fixes
Fixed the following issues:
• In shared-data mode, the garbage collection and thread eviction mechanisms for handling persistent indexes created on Primary Key tables cannot take effect on CN nodes. As a result, obsolete data cannot be deleted. #42241
• When users query ORC files by using Hive catalogs, the query results may be incorrect because StarRocks used to read ORC files from Hive by mapping columns by position. To resolve this issue, users can set the session variable orc_use_column_names to true, which makes StarRocks read ORC files from Hive by mapping columns by name. #42905
• When LDAP authentication for the AD system is adopted, logins without passwords are allowed. #42476
• When disk device names end with digits, the values of monitoring metrics remain 0, because the device names may become invalid after the trailing digits are removed. #42741
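The difference between position-based and name-based mapping for ORC files (#42905) can be illustrated with a small Python sketch. It only models the idea behind orc_use_column_names; the schemas, values, and helper functions are hypothetical, and a real ORC reader also deals with types, nesting, and schema evolution.

```python
# Hypothetical example: the Hive table schema and the column order inside one
# ORC file differ (for example, the file was written with a different column order).
table_columns = ["id", "name", "city"]
orc_file_columns = ["id", "city", "name"]
orc_row = [1, "Paris", "Alice"]  # values in the file's own column order


def map_by_position(table_cols, row):
    """Assign the i-th value in the file to the i-th table column."""
    return dict(zip(table_cols, row))


def map_by_name(table_cols, file_cols, row):
    """Look up each table column by name; columns absent from the file become None (NULL)."""
    by_name = dict(zip(file_cols, row))
    return {col: by_name.get(col) for col in table_cols}


print(map_by_position(table_columns, orc_row))
# {'id': 1, 'name': 'Paris', 'city': 'Alice'}  <- values land in the wrong columns
print(map_by_name(table_columns, orc_file_columns, orc_row))
# {'id': 1, 'name': 'Alice', 'city': 'Paris'}
```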
StarRocks/starrocks
Sida Shen
04/09/2024, 6:07 PM
Beryl Chen
04/11/2024, 1:30 PM
Beryl Chen
04/17/2024, 6:25 PM
GitHub
04/19/2024, 8:29 AM
TIP
This version has been taken offline due to privilege issues in querying external tables in external catalogs such as Hive and Iceberg.
Problem: When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege.
Impact scope: This problem only affects queries on external tables in external catalogs. Other queries are not affected.
Temporary workaround: The query succeeds after the SELECT privilege on this table is granted to the user again. However, SHOW GRANTS then returns duplicate privilege entries. After an upgrade to v3.2.6, users can run REVOKE to remove one of the privilege entries.
New Features
• Supports the dict_mapping column property, which significantly simplifies loading during the construction of a global dictionary and accelerates exact COUNT DISTINCT calculation.
Behavior Changes
• When null values in JSON data are evaluated with the IS NULL operator, they are treated as NULL values, following SQL semantics. For example, SELECT parse_json('{"a": null}') -> 'a' IS NULL returns true (before this behavior change, it returned false). #42765
Improvements
• Optimized the column type unionization rules for automatic schema detection in the FILES table function. When columns with the same name but different types exist in separate files, FILES attempts to merge them by selecting the type with the larger granularity as the final type. For example, if same-named columns are of types FLOAT and INT respectively, FILES returns DOUBLE as the final type. #40959
• Primary Key tables support Size-tiered Compaction to reduce I/O amplification. #41130
• When Broker Load is used to load data from ORC files that contain TIMESTAMP-type data, StarRocks supports retaining microseconds in the timestamps when converting the timestamps to match its own DATETIME data type. #42179
• Optimized the error messages for Routine Load. #41306
• Optimized the error messages when the FILES table function is used to convert invalid data types. #42717
Bug Fixes
Fixed the following issues:
• FEs fail to start after system-defined views are dropped. Dropping system-defined views is now prohibited. #43552
• BEs crash when duplicate sort key columns exist in Primary Key tables. Duplicate sort key columns are now prohibited. #43206
• An error, instead of NULL, is returned when the input value of the to_json() function is NULL. #42171
• In shared-data mode, the garbage collection and thread eviction mechanisms for handling persistent indexes created on Primary Key tables cannot take effect on CN nodes. As a result, obsolete data cannot be deleted. #41955
• In shared-data mode, an error is returned when users modify the enable_persistent_index property of a Primary Key table. #42890
• In shared-data mode, columns that are not supposed to be changed are set to NULL when users update a Primary Key table with partial updates in column mode. #42355
• Queries cannot be rewritten with asynchronous materialized views created on logical views. #42173
• CNs crash when the Cross-cluster Data Migration Tool is used to migrate Primary Key tables to a shared-data cluster. #42260
• The partition ranges of the external catalog-based asynchronous materialized views are not consecutive. #41957
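The FILES() type unionization rule described above (#40959) can be sketched as a pairwise merge folded over the types seen in each file. Only the FLOAT + INT to DOUBLE case comes from the release note; the integer widening order and the decision to reject unrelated types are assumptions made for this illustration, not StarRocks behavior.

```python
from functools import reduce

# Hypothetical widening order. The release note only states FLOAT + INT -> DOUBLE.
INTEGER_TYPES = ["TINYINT", "SMALLINT", "INT", "BIGINT"]
FLOAT_TYPES = ["FLOAT", "DOUBLE"]


def unify(a: str, b: str) -> str:
    """Pick one type for two same-named columns detected in different files."""
    if a == b:
        return a
    if a in INTEGER_TYPES and b in INTEGER_TYPES:
        # The wider integer type wins.
        return max(a, b, key=INTEGER_TYPES.index)
    numeric = set(INTEGER_TYPES + FLOAT_TYPES)
    if a in numeric and b in numeric:
        # At least one side is floating-point here; widen to DOUBLE,
        # which reproduces the FLOAT + INT example.
        return "DOUBLE"
    raise ValueError(f"combination not modeled: {a}, {b}")


# One column seen as INT in one file and FLOAT in another.
print(reduce(unify, ["INT", "FLOAT"]))  # DOUBLE
```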
GitHub
04/19/2024, 8:31 AM
GitHub
04/28/2024, 11:55 AM
information_schema using DROP TABLE. #43556
• Users are not allowed to specify duplicate keys in the ORDER BY clause when creating a Primary Key table. #43374
Improvements
• Queries on Parquet-formatted Iceberg v2 tables support equality deletes.
Bug Fixes
Fixed the following issues:
• When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege. #44061
• str_to_map may cause BEs to crash. #43930
• While a Routine Load job is running, show proc '/routine_loads' gets stuck due to a deadlock. #44249
• Persistent Index of Primary Key tables may cause BEs to crash due to issues in concurrency control. #43720
• The pending_task_run_count displayed on the leaderFE_IP:8030 page is incorrect: it shows the sum of Pending and Running tasks instead of only Pending tasks. In addition, the refresh_pending metric cannot be displayed through followerFE_IP:8030. #43052
• Some SQL queries that contain CTEs may encounter the Invalid plan: PhysicalTopNOperator error. #44185
GitHub
06/21/2024, 1:50 AM
GitHub
06/21/2024, 12:11 PM
GitHub
06/26/2024, 4:50 AM
Beryl Chen
06/27/2024, 1:30 PM
Beryl Chen
07/08/2024, 8:24 PM
GitHub
07/11/2024, 12:23 PM
Beryl Chen
07/18/2024, 4:00 PM
GitHub
07/19/2024, 3:31 AM
csv.trim_space parameter in the FILES() function, checking for illegal characters and providing reasonable prompts. #44740
• Stream Load supports using \t and \n as row and column delimiters. Users do not need to convert them to their hexadecimal ASCII codes. #47302
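For context, the hexadecimal form that #47302 makes unnecessary can be computed with a tiny helper. This is an illustrative Python sketch, not part of StarRocks, and the \xNN notation it produces is an assumption about how the hexadecimal ASCII codes were written.

```python
def to_hex_escape(ch: str) -> str:
    """Return the \\xNN hexadecimal form of a single ASCII character."""
    if len(ch) != 1 or ord(ch) > 0x7F:
        raise ValueError("expected exactly one ASCII character")
    return "\\x{:02x}".format(ord(ch))


# The conversion users previously had to do by hand for tab and newline.
print(to_hex_escape("\t"))  # \x09
print(to_hex_escape("\n"))  # \x0a
```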
Bug Fixes
Fixed the following issues:
• Schema Change failures due to file location changes caused by Tablet migration during the Schema Change process. #45517
• The Cross-cluster Data Migration Tool fails to create tables in the target cluster due to control characters such as \n and \r in the default values of fields. #47861
• Persistent bRPC failures after BE restarts. #40229
• The user_admin role can change the root password using the ALTER USER command. #47801
• Primary key index write failures cause data write errors. #48045
Behavior Changes
• Intermediate result spilling is enabled by default when sinking data to Hive and Iceberg. #47118
• Changed the default value of the BE configuration item max_cumulative_compaction_num_singleton_deltas to 500. #47621
• When users create a partitioned table without specifying the bucket number and the number of partitions exceeds 5, the bucket count is now set to max(2*BE or CN count, bucket number calculated based on the largest historical partition data volume). Previously, the bucket number was calculated only based on the largest historical partition data volume. #47949
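The arithmetic of the old and new rules (#47949) can be compared with a short Python sketch. The notes do not describe how the estimate from the largest historical partition is computed, so it is passed in as a given input; the function names are hypothetical.

```python
def default_buckets_old(buckets_from_largest_partition: int) -> int:
    """Previous rule: use the estimate derived from the largest historical partition."""
    return buckets_from_largest_partition


def default_buckets_new(be_or_cn_count: int, buckets_from_largest_partition: int) -> int:
    """New rule, applied when the table has more than 5 partitions."""
    return max(2 * be_or_cn_count, buckets_from_largest_partition)


# Hypothetical cluster with 3 BEs whose historical-volume estimate is only 4 buckets.
print(default_buckets_old(4))      # 4
print(default_buckets_new(3, 4))   # 6, because 2 * 3 = 6 > 4
# When the estimate is already larger, it is kept: max(6, 12) = 12.
print(default_buckets_new(3, 12))  # 12
```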
Downgrade notes
To downgrade a cluster from v3.3.1 or later to v3.2, users must clean up all temporary tables in the cluster by following these steps:
1. Prevent users from creating new temporary tables:
ADMIN SET FRONTEND CONFIG("enable_experimental_temporary_table"="false");
2. Check whether there are any temporary tables in the cluster:
SELECT * FROM information_schema.temp_tables;
3. If there are temporary tables in the system, clean them up using the following command, where 'session' is the ID of the session that owns them (the SYSTEM-level OPERATE privilege is required):
CLEAN TEMPORARY TABLE ON SESSION 'session';
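Because step 3 cleans one session at a time, it has to be repeated for every session that still owns temporary tables. As a sketch, the following Python helper turns the rows returned in step 2 into one CLEAN statement per session. It assumes the rows are available as dicts and that the session ID is stored in a column named SESSION; both are assumptions, so check the actual column names returned by your cluster.

```python
def clean_statements(rows, session_key="SESSION"):
    """Build one CLEAN TEMPORARY TABLE statement per distinct session.

    rows: result of SELECT * FROM information_schema.temp_tables as a list of
    dicts. The session column name (session_key) is an assumption.
    """
    sessions = sorted({str(row[session_key]) for row in rows})
    statements = []
    for session in sessions:
        # Refuse values that would break the quoted SQL literal.
        if "'" in session:
            raise ValueError(f"unexpected quote in session id: {session!r}")
        statements.append(f"CLEAN TEMPORARY TABLE ON SESSION '{session}';")
    return statements


# Hypothetical rows: two temporary tables in session s2, one in session s1.
rows = [
    {"SESSION": "s2", "TABLE_NAME": "t1"},
    {"SESSION": "s1", "TABLE_NAME": "t2"},
    {"SESSION": "s2", "TABLE_NAME": "t3"},
]
for stmt in clean_statements(rows):
    print(stmt)
```

Review the generated statements before running them; an empty result means step 2 found no temporary tables.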
Beryl Chen
07/25/2024, 1:30 PM
Beryl Chen
07/26/2024, 5:05 PM
GitHub
07/30/2024, 3:20 AM
• Stream Load supports using \t and \n as row and column delimiters. Users do not need to convert them to their hexadecimal ASCII codes. #47302
Bug Fixes
Fixed the following issues:
• Frequent INSERT and UPDATE operations on Primary Key tables may cause write and query delays in the database. #47838
• When a Primary Key table encounters data persistence failures, the persistent index may fail to capture the error, leading to data loss and reporting the error "Insert found duplicate key". #48045
• Materialized views may report insufficient permissions when refreshed. #47561
• Materialized views report the error "For input string" when refreshed. #46131
• During materialized view refresh, a lock is held for too long, causing the Leader FE to be restarted by the deadlock detection script. #48256
• Queries against views with the IN clause in their definitions may return inaccurate results. #47484
• Global Runtime Filter causes incorrect results. #48496
• The MySQL protocol command COM_CHANGE_USER does not support conn_attr. #47796
Behavior Changes
• When users create a non-partitioned table without specifying the bucket number, the minimum bucket number the system sets for the table is now 16, instead of the value given by the formula 2*BE or CN count (which can be as low as 2). If users want a smaller bucket number for a small table, they must set it explicitly. #47005
Beryl Chen
08/02/2024, 9:52 PM
GitHub
08/08/2024, 8:14 AM
• auto_partition_max_creation_number_per_load
• max_partition_number_per_table
• max_bucket_number_per_partition
• max_column_number_per_table
• Supports runtime optimization of table data distribution, ensuring optimization tasks do not conflict with DML operations on the table. #43747
• Added an observability interface for the global hit rate of Data Cache. #48450
• Added the SQL function array_repeat. #47862
Improvements
• Optimized the error messages for Routine Load failures due to Kafka authentication failures. #46136 #47649
• Stream Load supports using \t and \n as row and column delimiters. Users do not need to convert them to their hexadecimal ASCII codes. #47302
• Optimized the asynchronous statistics collection method for write operators, addressing the issue of increased latency when there are many import tasks. #48162
• Added the following BE dynamic parameters to control hard resource limits during loading, reducing the impact on BE stability when a large number of tablets are written. #48495
  • load_process_max_memory_hard_limit_ratio
  • enable_new_load_on_memory_limit_exceeded
• Added consistency checks for Column IDs within the same table to prevent Compaction errors. #48498
• Supports persisting PIPE metadata to prevent metadata loss due to FE restarts. #48852
Bug Fixes
• The process could not end when creating a dictionary from an FE Follower. #47802
• Inconsistent information returned by the SHOW PARTITIONS command in shared-data clusters and shared-nothing clusters. #48647
• Data errors caused by incorrect type handling when loading data from JSON fields to ARRAY<BOOLEAN> columns. #48387
• The query_id column in information_schema.task_runs cannot be queried. #48876
• During Backup, multiple requests for the same operation are submitted to different Brokers, causing request errors. #48856
• Downgrading to versions earlier than v3.1.11 or v3.2.4 causes Primary Key table index decompression failures, leading to query errors. #48659
Downgrade Notes
If you have used the column renaming feature, you must rename the columns back to their original names before downgrading your cluster to an earlier version. You can check the audit log of your cluster for the period after the upgrade to identify any ALTER TABLE RENAME COLUMN operations and the original names of the columns.
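The audit-log check can be partly automated. The Python sketch below scans log lines for ALTER TABLE ... RENAME COLUMN statements and produces the statements that would restore the original names, newest first, so that chained renames unwind correctly. The sample log lines and their "|Stmt=" layout are hypothetical, and the regular expression is an assumption that covers only the plain form of the statement; review the output before running it against your cluster.

```python
import re

# Matches "ALTER TABLE <table> RENAME COLUMN <old> TO <new>" in any letter case.
RENAME_RE = re.compile(
    r"ALTER\s+TABLE\s+(?P<table>[`\w.]+)\s+RENAME\s+COLUMN\s+"
    r"(?P<old>[`\w]+)\s+TO\s+(?P<new>[`\w]+)",
    re.IGNORECASE,
)


def find_column_renames(lines):
    """Return (table, old_name, new_name) tuples in the order they appear."""
    renames = []
    for line in lines:
        for m in RENAME_RE.finditer(line):
            # Drop backtick quoting so names compare consistently.
            table, old, new = (m.group(g).replace("`", "") for g in ("table", "old", "new"))
            renames.append((table, old, new))
    return renames


def revert_statements(renames):
    """Build statements that undo the renames, newest first."""
    return [
        f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
        for table, old, new in reversed(renames)
    ]


# Hypothetical audit-log lines; the real log format may differ.
sample = [
    "... |Stmt=ALTER TABLE db1.orders RENAME COLUMN amt TO amount",
    "... |Stmt=SELECT 1",
    "... |Stmt=alter table `db1`.`orders` rename column amount to total",
]
for stmt in revert_statements(find_column_renames(sample)):
    print(stmt)
```

With a real cluster, the lines would come from the FE audit log file instead of the sample list.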
Beryl Chen
08/15/2024, 1:30 PM
Beryl Chen
08/19/2024, 10:54 PM
GitHub
08/23/2024, 6:13 AM
BYTE_ARRAY data with a logical_type of JSON in Parquet files to the JSON type in StarRocks. #49385
• Optimized error messages for Files() when Access Key ID and Secret Access Key are missing. #49090
• information_schema.columns supports the GENERATION_EXPRESSION field. #49734
Bug Fixes
Fixed the following issues:
• Downgrading a v3.3 shared-data cluster to v3.2 after setting the Primary Key table property "persistent_index_type" = "CLOUD_NATIVE" causes a crash. #48149
• Exporting data to CSV files using SELECT INTO OUTFILE may cause data inconsistency. #48052
• Queries encounter failures during concurrent query execution. #48180
• Queries hang without exiting after a timeout in the Plan phase. #48405
• After index compression for Primary Key tables is disabled in an older version and the cluster is then upgraded to v3.2.9, accessing page_off information causes an array out-of-bounds crash. #48230
• BE crash caused by concurrent execution of ADD/DROP COLUMN operations. #49355
• Queries against negative TINYINT values in ORC files return None on the aarch64 architecture. #49517
• If a disk write operation fails, failed l0 snapshots of the Primary Key persistent index may cause data loss. #48045
• Partial Update in Column mode for Primary Key tables fails under scenarios with large-volume data updates. #49054
• BE crash caused by Fast Schema Evolution when downgrading a v3.3.0 shared-data cluster to v3.2.9. #42737
• partition_live_number does not take effect. #49213
• The conflict between index persistence and compaction in Primary Key tables could cause clone failures. #49341
• Modifications of partition_live_number using ALTER TABLE do not take effect. #49437
• Rewrite of CTE distinct grouping sets generates an invalid plan. #48765
• RPC failures polluted the thread pool. #49619
• Authentication failures when loading files from AWS S3 via PIPE. #49837
Behavior Changes
• Added a check for the meta directory in the FE startup script. If the directory does not exist, it will be automatically created. #48940
• Added a memory limit parameter load_process_max_memory_hard_limit_ratio for data loading. If memory usage exceeds the limit, subsequent loading tasks will fail. #48495
GitHub
09/04/2024, 9:04 AM
count(*) on certain tables returns NULL. #49288
• partition_live_number does not take effect. #49213
• FE throws a tablet exception: BE disk offline, and cannot migrate tablets. #47833
GitHub
09/05/2024, 5:55 AM
max(partition_column). #49391
• Partition pruning is used to optimize query performance when the partition column is a generated column (a column that is calculated based on a native column in the table), and the query predicate filter condition includes the native column. #48692
• Supports masking authentication information for Files() and PIPE. #47629
• Introduced a new statement show proc '/global_current_queries' to view queries running on all FE nodes, whereas show proc '/current_queries' only shows queries running on the current FE node. #49826
Bug Fixes
Fixed the following issues:
• The source cluster's BE nodes were mistakenly added to the current cluster when exporting data to the destination cluster via StarRocks external tables. #49323
• TINYINT values are returned as NULL when StarRocks reads ORC files using select * from files on clusters deployed on aarch64 machines. #49517
• Stream Load fails when loading JSON files that contain large integer values. #49927
• Incorrect schema is returned due to improper handling of invisible characters when users load CSV files with Files(). #49718
• An issue with temporary partition replacement in tables with multiple partition columns. #49764
Behavior Changes
• Introduced a new parameter object_storage_rename_file_request_timeout_ms to better accommodate backup scenarios with cloud object storage. This parameter is used as the backup timeout, with a default value of 30 seconds. #49706
• By default, to_json, CAST(AS MAP), and STRUCT AS JSON now return NULL instead of throwing an error when the conversion fails. To have errors thrown instead, set the system variable sql_mode to ALLOW_THROW_EXCEPTION. #50157
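The two failure policies in #50157 can be contrasted with a minimal Python model. Here json.dumps stands in for the real conversion, to_json_model is a hypothetical name, and the allow_throw_exception flag only mimics the effect of ALLOW_THROW_EXCEPTION in sql_mode; none of this is StarRocks code.

```python
import json


def to_json_model(value, allow_throw_exception: bool = False):
    """Return the JSON text, or None (SQL NULL) on failure unless throwing is allowed."""
    try:
        return json.dumps(value)
    except (TypeError, ValueError):
        if allow_throw_exception:
            raise  # ALLOW_THROW_EXCEPTION-like behavior: surface the error
        return None  # new default: conversion failure yields NULL


print(to_json_model({"k": 1}))  # {"k": 1}
print(to_json_model({1, 2}))    # None, because a Python set is not JSON-serializable
```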
GitHub
09/09/2024, 8:28 AM
Beryl Chen
09/09/2024, 8:53 PM
Beryl Chen
09/12/2024, 1:30 PM