boundless-bear-68728
03/08/2024, 10:13 PM
ripe-machine-72145
03/09/2024, 2:03 PM
worried-agent-2446
03/10/2024, 2:42 PM
clean-magazine-98135
03/11/2024, 2:42 AM
rich-barista-93413
03/11/2024, 9:24 AM
breezy-honey-91751
03/11/2024, 9:29 AM
tall-answer-76571
03/11/2024, 9:53 AM
fancy-barista-51991
03/11/2024, 4:34 PM
resource,subresource,glossary_terms,tags,owners,ownership_type,description,domain,ownership_type_urn
"urn:li:dataset:(urn:li:dataPlatform:snowflake,datahub.growth.users,PROD)",,[urn:li:glossaryTerm:Users],[urn:li:tag:HighQuality],[urn:li:corpuser:lfoe|urn:li:corpuser:jdoe],CUSTOM,"description for users table",urn:li:domain:Engineering,urn:li:ownershipType:a0e9176c-d8cf-4b11-963b-f7a1bc2333c9
"urn:li:dataset:(urn:li:dataPlatform:hive,datahub.growth.users,PROD)",first_name,[urn:li:glossaryTerm:FirstName],,,,"first_name description",
"urn:li:dataset:(urn:li:dataPlatform:hive,datahub.growth.users,PROD)",last_name,[urn:li:glossaryTerm:LastName],,,,"last_name description",
nice-dog-12741
03/11/2024, 8:23 PM
salmon-nail-53998
03/12/2024, 1:37 AM
version: 1
source: DataHub
owners:
  users:
    - useremail@email.com
url: "https://github.com/datahub-project/datahub/"
nodes:
  - name: Classification_demo
    description: A set of terms related to Data Classification
    terms:
      - name: Sensitive
        description: Sensitive Data
      - name: Confidential
        description: Confidential Data
      - name: HighlyConfidential
        description: Highly Confidential Data
ERROR {datahub.ingestion.run.pipeline:69} - failed to write record with workunit urn:li:glossaryTerm:Classification_demo.Confidential/mce with ('Unable to emit metadata to DataHub GMS: com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.GlossaryTermSnapshot/aspects/1/com.linkedin.common.Ownership/ownerTypes :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.GlossaryTermSnapshot/aspects/1/com.linkedin.common.Ownership/ownerTypes :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:glossaryTerm:Classification_demo.Confidential'}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.GlossaryTermSnapshot/aspects/1/com.linkedin.common.Ownership/ownerTypes :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:glossaryTerm:Classification_demo.Confidential'
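The rejected field is buried deep in the Rest.li message. A small sketch for pulling the aspect and field names out of such a line, using only the standard library, is shown here. `rejected_fields` is a hypothetical helper, not part of the DataHub CLI, and the regex assumes the `ERROR :: <path> :: <reason>` layout seen in the log above.

```python
import re
from typing import List, Tuple

# Shortened copy of the message text from the log above. Raw strings keep the
# "\n" as the two literal characters that appear in the printed log line.
LOG = (
    r"com.linkedin.metadata.entity.validation.ValidationException: "
    r"Failed to validate record with class com.linkedin.entity.Entity: "
    r"ERROR :: /value/com.linkedin.metadata.snapshot.GlossaryTermSnapshot"
    r"/aspects/1/com.linkedin.common.Ownership/ownerTypes "
    r":: unrecognized field found but not allowed\n"
)

# The reason ends at a literal backslash-n, a real newline, or end of string.
ERROR_RE = re.compile(r"ERROR :: (?P<path>\S+) :: (?P<reason>.*?)(?:\\n|\n|$)")


def rejected_fields(message: str) -> List[Tuple[str, str, str]]:
    """Return (aspect, field, reason) for each 'ERROR :: path :: reason' entry."""
    results = []
    for match in ERROR_RE.finditer(message):
        parts = match.group("path").strip("/").split("/")
        # The last two path segments are the aspect class and the field name.
        results.append((parts[-2], parts[-1], match.group("reason").strip()))
    return results


print(rejected_fields(LOG))
```

Here the output points at the `ownerTypes` field of the `Ownership` aspect, which is the part of the payload the server refused.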
millions-byte-1976
03/12/2024, 2:35 AM
early-oil-14918
03/12/2024, 4:55 AM
hallowed-helicopter-80392
03/12/2024, 6:15 AM
blue-cartoon-10359
03/12/2024, 9:31 AM
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import ???
graph = DataHubGraph(DatahubClientConfig(server=endpoint))
res = graph.get_aspect(dataset_urn, aspect=???)
# get summary stats for each column
col_stats = res["some key"]
red-scientist-36390
03/12/2024, 9:45 AM
damp-computer-24317
03/13/2024, 4:22 AM
This version of datahub supports report-to functionality
+ exec datahub ingest run -c /tmp/datahub/ingest/96342606-9c11-45af-be9b-a7fdbcc6f2e6/recipe.yml --report-to /tmp/datahub/ingest/96342606-9c11-45af-be9b-a7fdbcc6f2e6/ingestion_report.json
[2024-03-13 04:17:40,407] INFO {datahub.cli.ingest_cli:147} - DataHub CLI version: 0.13.0
[2024-03-13 04:17:40,507] INFO {datahub.ingestion.run.pipeline:238} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
Failed to configure the source (redshift): 1 validation error for RedshiftConfig
is_serverless
  extra fields not permitted (type=value_error.extra)
rich-barista-93413
03/13/2024, 8:34 AM
bland-orange-13353
03/13/2024, 8:35 AM
high-area-68604
03/13/2024, 9:03 AM
high-area-68604
03/13/2024, 9:05 AM
high-area-68604
03/13/2024, 9:06 AM
purple-addition-48342
03/13/2024, 9:34 AM
UpstreamClass.
This UpstreamClass type supports setting the `type` (VIEW, TRANSFORM, COPY) and `properties`, which are not shown in the UI.
Is the type somehow reflected in the UI?
I saw there is properties = {"source": "UI"}, which results in "Added manually" in the UI.
I am wondering whether these properties can be used to store custom information, like "source file" or any other information.
Is there any way to display them, or is there a plan for a future implementation?
Or is that field only used internally and should not be used?
Thx in advance
incalculable-sundown-8765
03/13/2024, 10:50 AM
pipeline_name: my_glossary
source:
  type: datahub-business-glossary
  config:
    file: datahub/resources/glossary/my_glossary.yaml
    enable_auto_id: False
This is my my_glossary.yaml:
version: 1
source: DataHub
owners:
  users:
    - my.name
nodes:
  - id: "urn:li:glossaryNode:customer"
    name: Customer
    description: "Customer Glossary"
    terms:
      - id: "urn:li:glossaryTerm:created_at"
        name: Created At
        description: "Timestamp when customer first being created."
I get this error:
failed to write record with workunit urn:li:glossaryNode:customer/mce with ('Unable to emit metadata to DataHub GMS: com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.GlossaryNodeSnapshot/aspects/0/com.linkedin.glossary.GlossaryNodeInfo/customProperties :: unrecognized field found but not allowed\nERROR :: /value/com.linkedin.metadata.snapshot.GlossaryNodeSnapshot/aspects/1/com.linkedin.common.Ownership/ownerTypes :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.GlossaryNodeSnapshot/aspects/0/com.linkedin.glossary.GlossaryNodeInfo/customProperties :: unrecognized field found but not allowed\nERROR :: /value/com.linkedin.metadata.snapshot.GlossaryNodeSnapshot/aspects/1/com.linkedin.common.Ownership/ownerTypes :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:glossaryNode:customer'}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.GlossaryNodeSnapshot/aspects/0/com.linkedin.glossary.GlossaryNodeInfo/customProperties :: unrecognized field found but not allowed\nERROR :: /value/com.linkedin.metadata.snapshot.GlossaryNodeSnapshot/aspects/1/com.linkedin.common.Ownership/ownerTypes :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:glossaryNode:customer'}
I believe this is coming from the owners. I'm seeing this issue with the csv enricher as well whenever I add ownership.
DataHub version: v0.12.1
cuddly-dinner-641
03/13/2024, 1:10 PM
include_metastore flag is deprecated and will always be "false" in the future.
Isn't metastore necessary to guarantee dataset URNs are unique?
quiet-computer-34771
03/13/2024, 4:29 PM
little-painter-30105
03/13/2024, 6:45 PM
ldap user name (the DataHub signed-in user) in DataHub. How can I update just the Airflow owner name in DataHub (while keeping a different Airflow owner in the Airflow UI)? Is there a way to update it using API calls or ingestion in DataHub?
sparse-arm-36740
03/13/2024, 8:19 PM
microscopic-twilight-7661
03/14/2024, 10:37 AM
victorious-lizard-36455
03/14/2024, 11:10 AM
glamorous-area-45109
03/14/2024, 4:09 PM
transformers:
  - type: "pattern_add_dataset_tags"
    config:
      replace_existing: true
      tag_pattern:
        rules:
          ".*common.*": ["urn:li:tag:tag:layer:common"]
          ".*core.*": ["urn:li:tag:layer:core"]
          ".*consumption.*": ["urn:li:tag:layer:consumption"]
However, this only tags the tables and views inside the dataset. Is there any way to tag only the datasets and not the tables and views?
Thanks!
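For reference, a minimal sketch of how regex rules like the ones in the recipe above select entities, assuming each rule key is matched against the dataset URN. In DataHub's model every table or view is itself a dataset entity, while a schema or BigQuery-style dataset is a container, which may be why only tables and views pick up the tags. `tags_for_urn` and the sample URNs are hypothetical and are not DataHub code.

```python
import re
from typing import Dict, List

# Two of the rules from the recipe above (regex -> tag URNs).
RULES: Dict[str, List[str]] = {
    ".*core.*": ["urn:li:tag:layer:core"],
    ".*consumption.*": ["urn:li:tag:layer:consumption"],
}


def tags_for_urn(urn: str, rules: Dict[str, List[str]]) -> List[str]:
    """Collect the tags of every rule whose regex matches the URN."""
    tags: List[str] = []
    for pattern, rule_tags in rules.items():
        if re.match(pattern, urn):
            tags.extend(rule_tags)
    return tags


# Hypothetical table URNs; the schema name is part of the table's URN, so a
# rule written for the schema ends up matching every table inside it.
core_table = "urn:li:dataset:(urn:li:dataPlatform:bigquery,my_project.core_sales.orders,PROD)"
other_table = "urn:li:dataset:(urn:li:dataPlatform:bigquery,my_project.staging.orders,PROD)"

print(tags_for_urn(core_table, RULES))
print(tags_for_urn(other_table, RULES))
```

Under this assumption a pattern cannot single out the enclosing schema, because the transformer only ever sees table and view URNs.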